autodeeplab's Introduction

Hi, I’m Meng-Hao Guo

autodeeplab's People

Contributors

hankkung, lgyoung, menghaoguo, randylcy

autodeeplab's Issues

What are train_loader1 and train_loader2?

self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
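
A minimal workaround, assuming the released make_data_loader really returns only four values (which is what the error says): unpack four and alias the single training loader for both slots. This only unblocks the script; the split the paper describes is sketched under a later issue.

    # Workaround sketch: unpack the four values actually returned and reuse
    # the one training loader for both slots (attribute names follow the repo).
    train_loader, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
    self.train_loader1 = train_loader  # meant for the network weights w
    self.train_loader2 = train_loader  # meant for the architecture parameters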

About the model size

Has anyone trained the model? I used DeepLabV3+, but its size is too big for me (roughly 450 MB), so I want to know AutoDeeplab's size after training.
Maybe I need to train it myself.

Why should we multiply the softmax result by 5?

Why should we multiply the softmax result by 5? Also, there is no code for the "decode" part. I will try it later, but if someone has already done this part, please share.

Very low mIOU value

I am training the model on the VOC2012 dataset without any pretraining. After training for 50 epochs, the mIOU I get is very low (0.09). Why am I getting such a low mIOU value?

memory consumption too large

When I train the model, the memory keeps growing with each epoch and eventually runs out, even though I only use 200 images. Does anyone have a solution?
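
One frequent cause of this in PyTorch 0.4-era training loops is accumulating the loss tensor itself, which keeps every iteration's autograd graph alive. A minimal sketch of the usual fix; the variable names are illustrative, not taken from this repo:

    train_loss = 0.0
    for image, target in train_loader:
        output = model(image.cuda())
        loss = criterion(output, target.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # .item() extracts a Python float; 'train_loss += loss' would retain
        # the whole graph and grow memory on every iteration.
        train_loss += loss.item()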

Using GPUs '0,1,2,3', but only card 0 is actually used

Thanks for your contributions! When I tried to run train_voc.sh on the Cityscapes dataset, changing only '--dataset pascal' to '--dataset cityscapes', it turned out that 4 GPUs were assigned but only the first one was actually used. I checked the code but couldn't find the reason. Could you please help? Thanks!
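
For what it's worth, CUDA_VISIBLE_DEVICES (or --gpu_ids) only controls which cards are visible; the model must also be wrapped in DataParallel, and DataParallel splits along the batch dimension, so a batch size smaller than the number of GPUs leaves the extra cards idle. A sketch, assuming args.gpu_ids is a parsed list of ids:

    import torch

    # Scatter batches across the visible cards; with batch_size < len(gpu_ids)
    # only the first card receives work.
    if args.cuda and len(args.gpu_ids) > 1:
        model = torch.nn.DataParallel(model, device_ids=args.gpu_ids)
    model = model.cuda()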

ValueError: not enough values to unpack (expected 5, got 4)

Namespace(arch_lr=None, arch_weight_decay=0.001, backbone='resnet', base_size=224, batch_size=4, checkname='deeplab-resnet', crop_size=224, cuda=True, dataset='pascal', epochs=50, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=[0], loss_type='ce', lr=0.007, lr_scheduler='poly', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resume=None, seed=1, start_epoch=0, sync_bn=False, test_batch_size=4, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
Traceback (most recent call last):
  File "train_autodeeplab.py", line 321, in <module>
    main()
  File "train_autodeeplab.py", line 310, in main
    trainer = Trainer(args)
  File "train_autodeeplab.py", line 32, in __init__
    self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
train_voc.sh: line 2: --workers: command not found
train_voc.sh: line 3: deeplab-resnet: command not found
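
The two 'command not found' lines at the end suggest that the multi-line command in train_voc.sh is missing its line continuations, so bash runs each flag line as a separate command. A guessed fix, with the flags taken from the Namespace dump above:

    # train_voc.sh -- every continued line must end with a backslash
    CUDA_VISIBLE_DEVICES=0 python train_autodeeplab.py --backbone resnet --lr 0.007 \
        --workers 4 --epochs 50 --batch_size 4 --eval_interval 1 --crop_size 224 \
        --checkname deeplab-resnet --dataset pascal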

Statement

Thanks for your attention to this project. At the moment, this project cannot reproduce the paper's results; it only provides some ideas for reproducing the paper. If you have good ideas, both merges and discussion are welcome.

PyTorch 1

@MenghaoGuo Thank you for your hard work.

Is your code compatible with PyTorch 1.x? The newer graphics cards need it.

What does the index of alphas_network mean in auto_deeplab.py?

From my point of view, the index of self.alphas_network with size (12, 4, 3) should correspond to (layer number (12), network depth (4, 8, 16, 32), action (up, level, down)). For simplicity, we could give each index a specific meaning, e.g. for actions: 0 means go up, 1 means stay level, 2 means go down, but the code clearly doesn't work this way.
The intent of this code, I guess, is:
For depth index 0 (depth 4): 0 means level, 1 means down;
For depth indices 1, 2 (depths 8, 16): 0 means up, 1 means level, 2 means down;
For depth index 3 (depth 32): 0 means up, 1 means level.
Reading it this way, I still find a logical problem in the forward part.
Can anyone give a reasonable explanation? Or is it just an overlooked bug?
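
Written out concretely, the reading inferred above would look like this; it is only the issue's guess, not confirmed behavior of the code:

    # alphas_network has size (12, 4, 3): (layer, depth level, transition).
    # Inferred meaning of the last index at each depth (unverified):
    transition_meaning = {
        0: {0: 'stay at depth 4',   1: 'go down to depth 8'},
        1: {0: 'go up to depth 4',  1: 'stay at depth 8',  2: 'go down to depth 16'},
        2: {0: 'go up to depth 8',  1: 'stay at depth 16', 2: 'go down to depth 32'},
        3: {0: 'go up to depth 16', 1: 'stay at depth 32'},
    }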

Where is utils?

I am very interested in this code. Could you please release the complete code?
(I found that someone asked this question before, but some scripts such as utils are still missing.)

AttributeError

When I finish 1 epoch and save the checkpoint, the following error occurs:
File "train_autodeeplab.py", line 185, in validation
'state_dict': self.model.module.state_dict(),
AttributeError: 'AutoDeeplab' object has no attribute 'module'
Can you tell me what is wrong, please? Thanks.
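
self.model only has a .module attribute while it is wrapped in torch.nn.DataParallel; on a single GPU the wrapper may be absent. A defensive pattern that is a common workaround, not code from this repo:

    # Unwrap DataParallel only when it is actually present.
    model_to_save = self.model.module if hasattr(self.model, 'module') else self.model
    torch.save({'epoch': epoch + 1,  # checkpoint fields are illustrative
                'state_dict': model_to_save.state_dict()},
               'checkpoint.pth.tar')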

Single GPU with workers=1, but I still met cuda runtime error.

I use a single GPU (NVIDIA GTX 1080 Ti) and set workers to 1 according to other issues. However, when I ran train_autodeeplab.py, after some images I still hit cuda runtime error (2): at c:\programdata\miniconda3\conda-bld\pytorch_1524549877902\work\aten\src\thc\generic/THCStorage.cu:58. Could somebody tell me how to solve this if I only have a single GPU card?

about the crop image size for input

In the original code, the crop size is set to 224, but "out of memory" occurs on my GPU device (Tesla V100, 32 GB). Has anyone met this problem? For now, I set the crop size to 128 and it works.
Also, for semantic segmentation, shouldn't we avoid cropping the image too small? Looking forward to good answers.

self.train_loader1, self.train_loader2 in train_autodeeplab.py

In train_autodeeplab.py, line 32:
self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
Going into the function make_data_loader, whose path is dataloaders/__init__.py,
we can see train_loader1 = train_loader and train_loader2 = train_loader. If that is true, this code does not match the original paper. Has anyone corrected this? Please upload your code. Thanks.
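
For reference, the paper (like DARTS) optimizes the network weights and the architecture parameters on two disjoint halves of the training set. A minimal sketch of such a split, assuming a train_set Dataset object and the repo's args:

    from torch.utils.data import DataLoader
    from torch.utils.data.sampler import SubsetRandomSampler

    # One half of the training images updates the weights w, the other half
    # updates the architecture parameters alpha.
    indices = list(range(len(train_set)))
    split = len(indices) // 2
    train_loader1 = DataLoader(train_set, batch_size=args.batch_size,
                               sampler=SubsetRandomSampler(indices[:split]),
                               num_workers=args.workers, pin_memory=True)
    train_loader2 = DataLoader(train_set, batch_size=args.batch_size,
                               sampler=SubsetRandomSampler(indices[split:]),
                               num_workers=args.workers, pin_memory=True)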

ModuleNotFoundError

When I run train_autodeeplab.py, the following error occurs:
File "train_autodeeplab.py", line 8, in <module>
from modeling.sync_batchnorm.replicate import patch_replication_callback
ModuleNotFoundError: No module named 'modeling'
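
modeling is a package inside the repository, so the interpreter has to be launched from the repository root for the import to resolve. Two hedged options:

    # Option 1: run from the repository root
    #   cd AutoDeeplab && python train_autodeeplab.py ...
    # Option 2: put the script's directory on sys.path before the imports
    import os, sys
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))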

test

Hi, can you provide the test and visualization code?

performance report

Hi, thanks for the great project. Could you please report the searched architecture's performance on VOC? More details on the training strategy would be helpful. Thanks a lot.

Still 'CUDA out of memory' with 4 TITAN X (pascal) when training model in PASCAL VOC dataset

Hey, thanks for the released code!

OS: Ubuntu 16.04
CUDA: 8.0.44
GPU: TITAN X Pascal (11.2 GB memory) × 4

I intend to train the model on PASCAL VOC 2012, and I run CUDA_VISIBLE_DEVICES=0,1,2,3 python train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 40 --batch_size 1 --eval_interval 1 --dataset pascal

and the error message shows as below:

Namespace(arch_lr=0.003, arch_weight_decay=0.001, backbone='resnet', base_size=320, batch_size=1, checkname='deeplab-resnet', crop_size=320, cuda=True, dataset='pascal', epochs=40, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=0, loss_type='ce', lr=0.007, lr_scheduler='cos', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resize=512, resume=None, seed=1, start_epoch=0, sync_bn=True, test_batch_size=1, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
cuda finished
Using cos LR Scheduler!
Starting Epoch: 0 Total Epoches: 40
0%| | 0/1464 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train_autodeeplab.py", line 324, in <module>
    main()
  File "train_autodeeplab.py", line 317, in main
    trainer.training(epoch)
  File "train_autodeeplab.py", line 116, in training
    output = self.model(image)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/auto_deeplab.py", line 214, in forward
    level4_new_2 = self.cells[count] (self.level_4[-2], self.level_8[-1], weight_cells)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: CUDA error: out of memory

I guessed I might have failed to use multiple GPUs, so I even changed a line of the code to self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1, 2, 3]), but the same error message shows again.

What can I do to resolve it, please?

Thanks in advance.

RuntimeError

RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

Environment installed via anaconda:

cudatoolkit=9.0
cudnn=7.3.1
pytorch=0.4.1

Softmax applied twice to the architecture parameters?

    weight_cells = F.softmax(self.alphas_cell, dim=-1) at line 168, 172, auto_deeplab.py
    weight_network = F.softmax (self.alphas_network, dim = -1) at line 169, 171, auto_deeplab.py

The softmax over the architecture parameters is applied twice; I wonder whether this is what Auto-DeepLab is meant to do or whether it is a bug.
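
Recomputing F.softmax from the raw alphas twice is merely redundant, but if the second call were ever applied to the output of the first, the weights would be renormalized and change. A quick self-contained demonstration:

    import torch
    import torch.nn.functional as F

    alphas = torch.randn(3)
    once = F.softmax(alphas, dim=-1)
    twice = F.softmax(once, dim=-1)
    print(torch.allclose(once, twice))  # False: softmax of a softmax flattens the weights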

Network Architecture Weights are non zero for connections that do not exist

The network architecture weights alphas_network are initialized non-zero even for connections that do not exist. For example, in the first layer we only have two connections, but three weights are initialized. This causes problems when taking the softmax over values that are always non-zero even though the weights are never used.

Therefore, one should set the values of all unused connections to zero and only consider the valid entries when calculating the softmax. I would suggest adding some kind of masking to the alphas_network parameters.
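
A minimal sketch of such masking; the mask layout below is an assumption chosen for illustration, not the repo's actual connectivity. Setting invalid entries to -inf before the softmax gives them exactly zero weight and renormalizes the valid ones among themselves:

    import torch
    import torch.nn.functional as F

    alphas = torch.randn(12, 4, 3)                 # (layer, level, transition)
    mask = torch.ones(12, 4, 3, dtype=torch.bool)  # True = connection exists
    mask[:, 0, 0] = False  # illustrative: top level has no input from above
    mask[:, 3, 2] = False  # illustrative: bottom level cannot go further down
    # -inf entries receive probability exactly 0 and no longer distort the softmax.
    weights = F.softmax(alphas.masked_fill(~mask, float('-inf')), dim=-1)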

Is this the complete code?

It seems many files are missing, such as utils. When will you release the complete code?
