menghaoguo / autodeeplab
PyTorch implementation of the paper Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation
Home Page: https://arxiv.org/abs/1901.02985
self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
Has anyone trained the model? I used DeepLabV3+, but its size is too big (roughly 450M) for me, so I want to know AutoDeeplab's size after training.
Maybe I need to train it by myself.
Why should we multiply the softmax result by 5? Also, there isn't any code that uses the "decode" part. I will try it later, but if someone has already done this part, please share.
I am training my model on the VOC2012 dataset without any pretraining. After training for 50 epochs, the mIoU I'm getting is very low (0.09). Why am I getting such a low mIoU value?
Does anyone have an idea about inference code?
Hi, could I ask what's wrong with this?
When I train the model, memory usage keeps growing with each epoch until it runs out of memory, even when I use only 200 images. Does anyone have a solution?
Hello. Since we are from the same country, there is no need for English. Could you please finish the data input interface?
What value should the parameter arch_lr (architecture learning rate) take?
In aspp.py, ABN is imported from operations. However, there is no ABN in operations.
Would be very helpful if someone could provide a clarification.
Hello, I get this error too when I run it. Are any additional packages or dependencies required?
Thank you very much for sharing. However, I could not see in the code how to obtain the final searched network architecture, and I do not know how to train the searched target architecture. Thank you!
Thanks for your contributions! When I tried to run train_voc.sh on the Cityscapes dataset, changing only '--dataset pascal' to '--dataset cityscapes', 4 GPUs were assigned but only the first one was actually used. I checked the code but couldn't find the reason. Could you please help? Thanks!
Namespace(arch_lr=None, arch_weight_decay=0.001, backbone='resnet', base_size=224, batch_size=4, checkname='deeplab-resnet', crop_size=224, cuda=True, dataset='pascal', epochs=50, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=[0], loss_type='ce', lr=0.007, lr_scheduler='poly', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resume=None, seed=1, start_epoch=0, sync_bn=False, test_batch_size=4, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
Traceback (most recent call last):
File "train_autodeeplab.py", line 321, in <module>
main()
File "train_autodeeplab.py", line 310, in main
trainer = Trainer(args)
File "train_autodeeplab.py", line 32, in __init__
self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
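The traceback shows the trainer expecting five values while make_data_loader returned four. A minimal sketch of a defensive unpack follows. It assumes (this is a guess, not confirmed by the repository) that the four-value form is (train, val, test, nclass) and that reusing the single training loader for both roles is acceptable as a stopgap, which is not the disjoint split the paper describes. unpack_loaders is a hypothetical helper.

```python
def unpack_loaders(result):
    """Normalize a make_data_loader-style return value to five items.

    Assumption: a 4-tuple is (train, val, test, nclass) and a 5-tuple is
    (train1, train2, val, test, nclass). Neither layout is confirmed here.
    """
    if len(result) == 5:
        return tuple(result)
    if len(result) == 4:
        train, val, test, nclass = result
        # Stopgap: both training roles share one loader.
        return (train, train, val, test, nclass)
    raise ValueError(f"expected 4 or 5 values, got {len(result)}")


# Stand-in strings instead of real DataLoader objects.
loaders = unpack_loaders(("train", "val", "test", 21))
```

In the trainer, the five names would then be unpacked from the helper's return value instead of directly from make_data_loader.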
train_voc.sh: line 2: --workers: command not found
train_voc.sh: line 3: deeplab-resnet: command not found
Thanks for your attention to this project. At present, this project cannot reproduce the paper's results; it only provides some ideas for reproducing the paper. If you have good ideas, both pull requests and discussion are welcome.
@MenghaoGuo Thank you for your hard work.
Is your code compatible with PyTorch 1.0? I ask because the newer graphics cards are compatible with it.
I am wondering how can I visualise the best inner cell and outer network after search for my custom dataset. Something similar to the original paper in Figure 3.
I really appreciate any comment or help in advance.
From my point of view, the indices of self.alphas_network, with size (12, 4, 3), should correspond to (number of layers (12), network depth (4, 8, 16, 32), actions (up, level, down)). For simplicity, each index could be given a fixed meaning, e.g. for actions: 0 means go up, 1 means stay level, 2 means go down. But the code clearly doesn't work this way.
The intent of this code, I guess, is:
For depth index 0 (depth 4): 0 means level, 1 means down;
For depth indices 1 and 2 (depth 8, 16): 0 means up, 1 means level, 2 means down;
For depth index 3 (depth 32): 0 means up, 1 means level.
Even with this interpretation, I found there is still a logical problem in the forward pass.
Can anyone give me a reasonable explanation? Or is it just an overlooked bug?
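The per-level interpretation guessed above can be written down as a lookup table. This is a sketch of that reading only, not of what the repository's forward pass actually does. ACTIONS_BY_LEVEL and describe are hypothetical names.

```python
# Downsample rates for the four levels of the search space.
STRIDES = (4, 8, 16, 32)

# Meaning of each action index per level, under the interpretation above.
ACTIONS_BY_LEVEL = {
    0: ("level", "down"),        # depth 4: only two actions
    1: ("up", "level", "down"),  # depth 8
    2: ("up", "level", "down"),  # depth 16
    3: ("up", "level"),          # depth 32: only two actions
}


def describe(level, action):
    """Return (stride, action name) or raise if the slot is unused."""
    names = ACTIONS_BY_LEVEL[level]
    if action >= len(names):
        raise IndexError(f"level {level} has no action index {action}")
    return STRIDES[level], names[action]


print(describe(0, 1))  # (4, 'down')
print(describe(3, 0))  # (32, 'up')
```

Under this reading the same action index means different things at different levels, which is the inconsistency the question points at.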
After I finish 1 epoch and save the checkpoint, the following error occurs:
File "train_autodeeplab.py", line 185, in validation
'state_dict': self.model.module.state_dict(),
AttributeError: 'AutoDeeplab' object has no attribute 'module'
Can you tell me what is wrong, please? Thanks.
I use a single GPU (NVIDIA GTX 1080 Ti) and set workers to 1, as suggested in other issues. However, when I run train_autodeeplab.py, after some images I still hit this error: cuda runtime error (2): at c:\programdata\miniconda3\conda-bld\pytorch_1524549877902\work\aten\src\thc\generic/THCStorage.cu:58. Could somebody tell me how to solve this if I only have a single GPU?
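The .module attribute only exists once a model has been wrapped in torch.nn.DataParallel, so when the model is used unwrapped the lookup fails. A minimal sketch of a lookup that works in both cases follows. The two classes are stand-ins so the example runs without PyTorch, and unwrap is a hypothetical helper.

```python
class PlainModel:
    """Stand-in for an unwrapped model such as AutoDeeplab."""

    def state_dict(self):
        return {"w": 1}


class Wrapper:
    """Stand-in for torch.nn.DataParallel, which stores the model as .module."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return the underlying model whether or not it is wrapped."""
    return getattr(model, "module", model)


plain = PlainModel()
wrapped = Wrapper(plain)
state_a = unwrap(plain).state_dict()
state_b = unwrap(wrapped).state_dict()
```

In the checkpoint code, the 'state_dict' entry would then be built from unwrap(self.model).state_dict().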
done
In the original code, the crop size is set to 224, but with that setting I get an "out of memory" error on my GPU, a Tesla V100 32G. Has anyone else met this problem? I have now set the crop size to 128, and it works.
But for semantic segmentation, shouldn't we avoid cropping the images too small? I look forward to good answers.
In train_autodeeplab.py, line 32:
self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
Going into the function make_data_loader, which is in dataloaders/__init__.py,
we can see that train_loader1 = train_loader and train_loader2 = train_loader. If I am right, this does not match the original paper. Has anyone corrected this? If so, please upload your code. Thanks.
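If the two loaders are meant to see disjoint data (one part for the network weights, the other for the architecture parameters), the split can be done on indices before the loaders are built. A minimal sketch, assuming an even random split; split_indices is a hypothetical helper, and in PyTorch each half could then be wrapped with torch.utils.data.Subset.

```python
import random


def split_indices(n, seed=0):
    """Shuffle range(n) and return two disjoint halves."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    half = n // 2
    return indices[:half], indices[half:]


# 1464 is the train size reported in the logs on this page.
part_a, part_b = split_indices(1464)
```

Fixing the seed keeps the two halves stable across runs, so a resumed search sees the same partition.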
When I run train_autodeeplab.py, the following error occurs:
File "train_autodeeplab.py", line 8, in <module>
from modeling.sync_batchnorm.replicate import patch_replication_callback
ModuleNotFoundError: No module named 'modeling'
Hi, can you provide the test and visualization code?
Hi, thanks for the great project. Could you please report the searched architecture's performance on VOC? More details on the training strategy would be helpful. Thanks a lot.
And please tell me your dataset and the highest mIoU you reached.
does anyone have ideas?
Hey, thanks for releasing your code!
OS: Ubuntu 16.04
CUDA: 8.0.44
GPU: TITAN X Pascal (11.2GB as the memory) X 4
I intend to train the model on PASCAL VOC 2012, and I run CUDA_VISIBLE_DEVICES=0,1,2,3 python train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 40 --batch_size 1 --eval_interval 1 --dataset pascal
and the error message is shown below:
Namespace(arch_lr=0.003, arch_weight_decay=0.001, backbone='resnet', base_size=320, batch_size=1, checkname='deeplab-resnet', crop_size=320, cuda=True, dataset='pascal', epochs=40, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=0, loss_type='ce', lr=0.007, lr_scheduler='cos', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resize=512, resume=None, seed=1, start_epoch=0, sync_bn=True, test_batch_size=1, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
cuda finished
Using cos LR Scheduler!
Starting Epoch: 0 Total Epoches: 40 0%| | 0/1464 [00:00<?, ?it/s]=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
File "train_autodeeplab.py", line 324, in <module>
main()
File "train_autodeeplab.py", line 317, in main
trainer.training(epoch)
File "train_autodeeplab.py", line 116, in training
output = self.model(image)
File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/auto_deeplab.py", line 214, in forward
level4_new_2 = self.cells[count] (self.level_4[-2], self.level_8[-1], weight_cells)
File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in forward
s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in <genexpr>
s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in forward
return sum(w * op(x) for w, op in zip(weights, self._ops))
File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in <genexpr>
return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: CUDA error: out of memory
I guessed that I was failing to use multiple GPUs, so I even changed a line of the code to self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1, 2, 3]), but the same error message appears again.
What can I do to resolve it, please?
Thanks in advance.
RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS
anaconda install pytorch=0.4
cudatoolkit=9.0
cudnn=7.3.1
pytorch=0.4.1
weight_cells = F.softmax(self.alphas_cell, dim=-1) at lines 168 and 172 of auto_deeplab.py
weight_network = F.softmax(self.alphas_network, dim=-1) at lines 169 and 171 of auto_deeplab.py
The softmax is applied to the architecture parameters twice. Is this what Auto-DeepLab meant to do, or is it a bug?
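Whether the double application is intended is not settled here, but its numerical effect is easy to check. The output of a softmax lies in [0, 1], so a second softmax over it can separate any two entries by at most a factor of e, which flattens the distribution. A small sketch with made-up logits (the values in alphas are illustrative, not taken from the repository):

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


alphas = [2.0, 0.0, -1.0]
once = softmax(alphas)
twice = softmax(once)

print([round(p, 3) for p in once])
print([round(p, 3) for p in twice])
```

If the second call is unintended, the searched architecture would be read from weights that are much closer to uniform than the underlying parameters suggest.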
Please add a LICENSE file
The network architecture weights alphas_network are initialized to non-zero values even for connections that do not exist. For example, the first layer has only two connections, but three weights are initialized. This causes problems when taking the softmax, because the unused entries are always non-zero and still receive probability mass.
Therefore one should set the values to zero for all connections that are not used and only consider the non-zero values when calculating the softmax. I would suggest adding some kind of masking to the alphas_network parameters.
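The masking suggested above can be sketched as a softmax that normalizes only over the connections that exist. This is an illustration in plain Python under assumed inputs: which slots are masked is hypothetical, and masked_softmax is not a function from the repository. In tensor code, a common equivalent is to fill the masked logits with negative infinity before calling softmax.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over the entries where mask is True; others get weight 0."""
    kept = [x for x, keep in zip(logits, mask) if keep]
    if not kept:
        raise ValueError("mask removes every entry")
    m = max(kept)
    exps = [math.exp(x - m) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# One row of three logits where, hypothetically, only two connections exist.
weights = masked_softmax([0.3, -0.1, 0.7], [False, True, True])
```

The masked slot gets exactly zero weight, and the remaining weights sum to one regardless of how the unused parameter was initialized.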
Why does train_autodeeplab.py not run self.architect.step(image_search, target_search) until epoch > 19?
It seems a lot of code is missing, such as utils. When will you release the complete code?
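As far as I recall, the Auto-DeepLab paper describes starting the architecture updates only after 20 epochs, on the grounds that optimizing the architecture parameters before the weights are reasonably trained tends to end in bad local optima. The epoch > 19 guard appears to implement that warm-up. A minimal sketch of such a schedule, with hypothetical function names and no real optimizer calls:

```python
def should_update_architecture(epoch, warmup_epochs=20):
    """True once the weight-only warm-up phase is over (epochs are 0-based)."""
    return epoch >= warmup_epochs


def run_schedule(total_epochs, warmup_epochs=20):
    """Return, per epoch, which parameter groups would be stepped."""
    plan = []
    for epoch in range(total_epochs):
        steps = ["weights"]
        if should_update_architecture(epoch, warmup_epochs):
            steps.append("architecture")
        plan.append(steps)
    return plan


plan = run_schedule(40)
```

With 0-based epochs, epoch >= 20 is the same condition as epoch > 19, so a 40-epoch run spends its first half on weights only.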