hkzhang-git / hinas


Python 100.00%

hinas's Issues

NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 4D) for the modes: nearest | linear | bilinear | trilinear (got bicubic)

I searched for this error but could not find a solution. The following error appears during training; I hope you can help.

2021-12-04 15:50:36,768 one_stage_nas INFO: Namespace(config_file='/data/yy/hinas/configs/sr/DIV2K_2c3n/03_x2_train_CR.yaml', device='3', opts=[])
2021-12-04 15:50:36,779 one_stage_nas INFO: Loaded configuration file /data/yy/hinas/configs/sr/DIV2K_2c3n/03_x2_train_CR.yaml
2021-12-04 15:50:36,780 one_stage_nas INFO:
DATASET:
DATA_ROOT: /data/yy/hinas/data

DATA_ROOT: /data/data2/zhk218/data/nas_data

DATA_NAME: DIV2K_800

DATA_NAME: Set14
CROP_SIZE: 64
TASK: "sr"
LOAD_ALL: False
SEARCH:
TIE_CELL: False
INPUT:
CROP_SIZE_TRAIN: 64
SOLVER:
TRAIN:

MAX_ITER: 600000

MAX_ITER: 100

CHECKPOINT_PERIOD: 10

CHECKPOINT_PERIOD: 1000

VALIDATE_PERIOD: 10
LOSS: ['l1', 'log_ssim']
LOSS_WEIGHT: [1.0, 0.6]
DATALOADER:
NUM_WORKERS: 4
BATCH_SIZE_TRAIN: 16
BATCH_SIZE_TEST: 16
S_FACTOR: 2
R_CROP: 4
DATA_LIST_DIR: ../preprocess/dataset_json
MODEL:
FILTER_MULTIPLIER: 16
META_ARCHITECTURE: Sr_compnet

META_ARCHITECTURE: Sr_supernet

META_MODE: Width
NUM_STRIDES: 3
NUM_LAYERS: 2
NUM_BLOCKS: 3
IN_CHANNEL: 3
PRIMITIVES: "NO_DEF_L"
ACTIVATION_F: "Leaky"
USE_ASPP: True
USE_RES: True

OUTPUT_DIR: output

2021-12-04 15:50:36,782 one_stage_nas INFO: Running with config:
DATALOADER:
BATCH_SIZE_TEST: 16
BATCH_SIZE_TRAIN: 16
DATA_AUG: 1
DATA_LIST_DIR: ../preprocess/dataset_json
NUM_WORKERS: 4
R_CROP: 4
SIGMA: []
S_FACTOR: 2
DATASET:
CROP_SIZE: 64
DATA_NAME: Set14
DATA_ROOT: /data/yy/hinas/data
LOAD_ALL: False
TASK: sr
TEST_DATASETS: []
TO_GRAY: False
TRAIN_DATASETS: []
TRAIN_DATASETS_WEIGHT: []
INPUT:
CROP_SIZE_TRAIN: 64
MAX_SIZE_TEST: 1024
MAX_SIZE_TRAIN: 1024
MIN_SIZE_TEST: -1
MIN_SIZE_TRAIN: -1
MODEL:
ACTIVATION_F: Leaky
AFFINE: True
ASPP_RATES: (2, 4, 6)
FILTER_MULTIPLIER: 16
IN_CHANNEL: 3
META_ARCHITECTURE: Sr_compnet
META_MODE: Width
NUM_BLOCKS: 3
NUM_LAYERS: 2
NUM_STRIDES: 3
PRIMITIVES: NO_DEF_L
RES: add
USE_ASPP: True
USE_RES: True
WEIGHT:
WS_FACTORS: [1, 1.5, 2]
OUTPUT_DIR: output
RESULT_DIR: .
SEARCH:
ARCH_START_EPOCH: 20
PORTION: 0.5
R_SEED: 0
SEARCH_ON: False
TIE_CELL: False
VAL_PORTION: 0.02
SOLVER:
BIAS_LR_FACTOR: 2
CHECKPOINT_PERIOD: 10
LOSS: ['l1', 'log_ssim']
LOSS_WEIGHT: [1.0, 0.6]
MAX_EPOCH: 30
MOMENTUM: 0.9
SCHEDULER: poly
SEARCH:
LR_A: 0.001
LR_END: 0.001
LR_START: 0.025
MOMENTUM: 0.9
T_MAX: 10
WD_A: 0.001
WEIGHT_DECAY: 0.0003
TRAIN:
INIT_LR: 0.05
MAX_ITER: 100
POWER: 0.9
VAL_PORTION: 0.01
VALIDATE_PERIOD: 10
WEIGHT_DECAY: 4e-05
WEIGHT_DECAY_BIAS: 0
Loading genotype from output/sr/Set14/Outline-2c3n_TC-False_ASPP-True_Res-True_Prim-NO_DEF_L/search/models/model_best.geno
2021-12-04 15:50:58,502 one_stage_nas.utils.checkpoint INFO: No checkpoint found. Initializing model from scratch
2021-12-04 15:50:58,549 one_stage_nas.trainer INFO: Model Params: 0.26M
2021-12-04 15:50:58,556 one_stage_nas.trainer INFO: Start training
/home/yy/miniconda/conda/envs/hinas/lib/python3.7/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bicubic is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "train.py", line 134, in <module>
main()
File "train.py", line 130, in main
train(cfg, output_dir)
File "train.py", line 68, in train
cfg
File "../one_stage_nas/engine/trainer.py", line 87, in do_train
pred, loss_dict = model(images, targets)
File "/home/yy/miniconda/conda/envs/hinas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/yy/miniconda/conda/envs/hinas/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/yy/miniconda/conda/envs/hinas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "../one_stage_nas/modeling/sr_compnet.py", line 214, in forward
pred = F.interpolate(images, size=pred.size()[-2:], mode='bicubic') + pred
File "/home/yy/miniconda/conda/envs/hinas/lib/python3.7/site-packages/torch/nn/functional.py", line 2459, in interpolate
" (got {})".format(input.dim(), mode))
NotImplementedError: Input Error: Only 3D, 4D and 5D input Tensors supported (got 4D) for the modes: nearest | linear | bilinear | trilinear (got bicubic)

Also, model_best.geno is only 354 bytes. Could this be related to the "No checkpoint found. Initializing model from scratch" message?

Loss is NAN

After running 03_search_CR_R0.yaml, the genotype with the best supernet accuracy gives Val: SSIM: 0.615, PSNR: 23.8382. However, when I train the network from this genotype, the loss does not converge and is sometimes NaN. I have not changed the code.

The following is the train.yaml file:
DATASET:
DATA_ROOT:
DATA_NAME: BSD500_300
CROP_SIZE: 64
TASK: "dn"
LOAD_ALL: True
TO_GRAY: True
SEARCH:
TIE_CELL: False
INPUT:
CROP_SIZE_TRAIN: 64
SOLVER:
TRAIN:
MAX_ITER: 600000
CHECKPOINT_PERIOD: 1000
VALIDATE_PERIOD: 1000
LOSS: ['mse', 'log_ssim']
LOSS_WEIGHT: [1.0, 0.6]
DATALOADER:
NUM_WORKERS: 2
BATCH_SIZE_TRAIN: 24
BATCH_SIZE_TEST: 24
SIGMA: [30]
DATA_AUG: 5
MODEL:
FILTER_MULTIPLIER: 20
META_ARCHITECTURE: Dn_compnet
META_MODE: Width
NUM_STRIDES: 3
NUM_LAYERS: 3
NUM_BLOCKS: 4
IN_CHANNEL: 1
PRIMITIVES: "NO_DEF_L"
ACTIVATION_F: "Leaky"
USE_ASPP: False
USE_RES: True

OUTPUT_DIR: output_R0

I have tried a learning rate of 0.01. The loss is small but still does not converge.
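
One thing that might be worth checking is the `log_ssim` term. Assuming it is computed as the negative logarithm of the SSIM value (an assumption; the source does not show how loss.py defines it), it becomes undefined when SSIM is zero or negative, which can happen early in training. The sketch below uses hypothetical helper names and plain floats to show a clamped version and a finiteness check that can be used to skip a bad update step. It is not the repository's implementation.

```python
import math


def clamped_neg_log_ssim(ssim, eps=1e-6):
    """Return -log(ssim), with ssim clamped to [eps, 1] so the log stays finite."""
    return -math.log(min(max(ssim, eps), 1.0))


def should_apply_step(loss_value):
    """Return True only if the loss is a finite number (not NaN or inf)."""
    return math.isfinite(loss_value)


print(clamped_neg_log_ssim(0.615))
# A negative SSIM is clamped to eps; an unclamped torch.log would yield NaN here.
print(clamped_neg_log_ssim(-0.2))
print(should_apply_step(float("nan")))
```

In a training loop, the same idea would mean checking the loss before calling `backward()` and skipping the iteration when the check fails.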

sr_eval.py consumes a large amount of GPU memory

My GPU has 16 GB of memory, yet sr_eval.py can only process 5 images. With more than 5 images, it fails with a CUDA out-of-memory error. Have you run into this problem?
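
If evaluation does not already run under `torch.no_grad()`, wrapping it may reduce memory use, because activations are then not kept for backpropagation. Another common workaround for large test images is tiled inference: process the image in fixed-size patches and stitch the outputs. The sketch below only computes the patch coordinates; `tile_boxes` is a hypothetical helper, not something sr_eval.py provides, and the tile size is an arbitrary example value.

```python
def tile_boxes(height, width, tile, overlap=0):
    """Return (top, left, bottom, right) boxes that cover a height x width image.

    Boxes are at most `tile` pixels on a side. The last box along each axis is
    shifted back so it ends exactly at the image border.
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    step = tile - overlap

    def starts(length):
        if length <= tile:
            return [0]
        positions = list(range(0, length - tile, step))
        positions.append(length - tile)  # final tile is aligned to the border
        return positions

    return [(top, left, min(top + tile, height), min(left + tile, width))
            for top in starts(height)
            for left in starts(width)]


for box in tile_boxes(100, 100, tile=64):
    print(box)
```

Each box would be cropped from the low-resolution input, passed through the model on its own, and written into the output at the coordinates multiplied by the scale factor.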

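Question about parameter settings in build_dataset.py

Hello, and thank you for open-sourcing this; it is great work.
I would like to know how cfg.SEARCH.PORTION is set. Is it 0.5, as described in the paper?
Also, I cannot seem to find where cfg.DATALOADER.R_CROP is set in the engineer folder. These parameters should be crucial for dataset processing.
I look forward to your reply. Thank you!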
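
The config dump in the first issue on this page lists `SEARCH.PORTION: 0.5`, `SEARCH.R_SEED: 0` and `DATALOADER.R_CROP: 4`, so those appear to be the values in use there. How build_dataset.py consumes them is not shown in the source. One plausible reading, stated here purely as an assumption, is that `PORTION` is the fraction of training samples placed in one split, with the remainder going to the other. A minimal sketch of such a split, using the hypothetical helper `split_by_portion`:

```python
import random


def split_by_portion(items, portion, seed=0):
    """Shuffle `items` deterministically and split them at the given fraction."""
    if not 0.0 <= portion <= 1.0:
        raise ValueError("portion must be between 0 and 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * portion)
    return shuffled[:cut], shuffled[cut:]


# 800 is only an example size, chosen to match the DIV2K_800 name in the config.
first_split, second_split = split_by_portion(range(800), portion=0.5)
print(len(first_split), len(second_split))
```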

Sr_supernet

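What does this code mean? I have studied it for a long time without understanding it, and I could not find a specific explanation in the paper either.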

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'weight'

I did not modify the source code, but the following error occurred during training
Loading genotype from output/sr/Set14/Outline-2c3n_TC-False_ASPP-True_Res-False_Prim-NO_DEF_L/search/models/model_best.geno
2021-12-14 04:24:08,598 one_stage_nas.utils.checkpoint INFO: No checkpoint found. Initializing model from scratch
2021-12-14 04:24:08,600 one_stage_nas.trainer INFO: Model Params: 0.25M
2021-12-14 04:24:08,601 one_stage_nas.trainer INFO: Start training
Traceback (most recent call last):
File "train.py", line 137, in <module>
main()
File "train.py", line 133, in main
train(cfg, output_dir)
File "train.py", line 71, in train
cfg
File "../one_stage_nas/engine/trainer.py", line 87, in do_train
pred, loss_dict = model(images, targets)
File "/home/data2/yt_data/anaconda3/envs/hinas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/data2/yt_data/anaconda3/envs/hinas/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 141, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/data2/yt_data/anaconda3/envs/hinas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "../one_stage_nas/modeling/sr_compnet.py", line 239, in forward
loss.append(loss_item(pred, targets) * weight)
File "/home/data2/yt_data/anaconda3/envs/hinas/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "../one_stage_nas/modeling/loss.py", line 34, in forward
mu1 = F.conv2d(img1, self.window, padding=self.window_size // 2, groups=self.channel)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'weight'

03_infe.yaml missing

Hello,

It seems that the inference config file for denoising is missing.
FileNotFoundError: [Errno 2] No such file or directory: '../configs/dn/BSD500_3c4n/03_train_CR_R0/03_infe.yaml'

Can you please upload the inference file?

Thank you.
