autodeeplab's Introduction

Hi, I’m Meng-Hao Guo

autodeeplab's People

Contributors

hankkung, lgyoung, menghaoguo, randylcy

autodeeplab's Issues

What are train_loader1 and train_loader2?

self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
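
A minimal workaround, assuming the released make_data_loader really returns only four values (which is what the error says): unpack four and alias the single training loader for both slots. This only unblocks the script; the split the paper describes is sketched under a later issue.

    # Workaround sketch: unpack the four values actually returned and reuse
    # the one training loader for both slots (attribute names follow the repo).
    train_loader, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
    self.train_loader1 = train_loader  # meant for the network weights w
    self.train_loader2 = train_loader  # meant for the architecture parameters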

About the model size

Has anyone trained the model? I used DeepLabV3+, but its size is too big for me (roughly 450 MB), so I want to know AutoDeeplab's size after training.
Maybe I need to train it myself.

Why should we multiply the softmax result by 5?

Why should we multiply the softmax result by 5? Also, there is no code for the "decode" part. I will try it later, but if someone has already done this part, please share.

Very low mIOU value

I am training the model on the VOC2012 dataset without any pretraining. After training for 50 epochs, the mIOU I get is very low (0.09). Why am I getting such a low mIOU value?

memory consumption too large

When I train the model, the memory keeps growing with each epoch and eventually runs out, even though I only use 200 images. Does anyone have a solution?
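
One frequent cause of this in PyTorch 0.4-era training loops is accumulating the loss tensor itself, which keeps every iteration's autograd graph alive. A minimal sketch of the usual fix; the variable names are illustrative, not taken from this repo:

    train_loss = 0.0
    for image, target in train_loader:
        output = model(image.cuda())
        loss = criterion(output, target.cuda())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # .item() extracts a Python float; 'train_loss += loss' would retain
        # the whole graph and grow memory on every iteration.
        train_loss += loss.item()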

Using GPUs '0,1,2,3', but only card 0 is actually used

Thanks for your contributions! When I tried to run train_voc.sh on the Cityscapes dataset, changing only '--dataset pascal' to '--dataset cityscapes', it turned out that 4 GPUs were assigned but only the first one was actually used. I checked the code but couldn't find the reason. Could you please help? Thanks!
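
For what it's worth, CUDA_VISIBLE_DEVICES (or --gpu_ids) only controls which cards are visible; the model must also be wrapped in DataParallel, and DataParallel splits along the batch dimension, so a batch size smaller than the number of GPUs leaves the extra cards idle. A sketch, assuming args.gpu_ids is a parsed list of ids:

    import torch

    # Scatter batches across the visible cards; with batch_size < len(gpu_ids)
    # only the first card receives work.
    if args.cuda and len(args.gpu_ids) > 1:
        model = torch.nn.DataParallel(model, device_ids=args.gpu_ids)
    model = model.cuda()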

ValueError: not enough values to unpack (expected 5, got 4)

Namespace(arch_lr=None, arch_weight_decay=0.001, backbone='resnet', base_size=224, batch_size=4, checkname='deeplab-resnet', crop_size=224, cuda=True, dataset='pascal', epochs=50, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=[0], loss_type='ce', lr=0.007, lr_scheduler='poly', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resume=None, seed=1, start_epoch=0, sync_bn=False, test_batch_size=4, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
Traceback (most recent call last):
  File "train_autodeeplab.py", line 321, in <module>
    main()
  File "train_autodeeplab.py", line 310, in main
    trainer = Trainer(args)
  File "train_autodeeplab.py", line 32, in __init__
    self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
ValueError: not enough values to unpack (expected 5, got 4)
train_voc.sh: line 2: --workers: command not found
train_voc.sh: line 3: deeplab-resnet: command not found
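
The two 'command not found' lines at the end suggest that the multi-line command in train_voc.sh is missing its line continuations, so bash runs each flag line as a separate command. A guessed fix, with the flags taken from the Namespace dump above:

    # train_voc.sh -- every continued line must end with a backslash
    CUDA_VISIBLE_DEVICES=0 python train_autodeeplab.py --backbone resnet --lr 0.007 \
        --workers 4 --epochs 50 --batch_size 4 --eval_interval 1 --crop_size 224 \
        --checkname deeplab-resnet --dataset pascal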

Statement

Thanks for your attention to this project. At the moment, this project cannot reproduce the paper's results; it only provides some ideas for reproducing the paper. If you have good ideas, both merges and discussion are welcome.

PyTorch 1

@MenghaoGuo Thank you for your hard work.

Is your code compatible with PyTorch 1.x? The newer graphics cards need it.

What does the index of alphas_network mean in auto_deeplab.py?

From my point of view, the index of self.alphas_network with size (12, 4, 3) should correspond to (layer number (12), network depth (4, 8, 16, 32), action (up, level, down)). For simplicity, we could give each index a specific meaning, e.g. for actions: 0 means go up, 1 means stay level, 2 means go down, but the code clearly doesn't work this way.
The intent of this code, I guess, is:
For depth index 0 (depth 4): 0 means level, 1 means down;
For depth indices 1, 2 (depths 8, 16): 0 means up, 1 means level, 2 means down;
For depth index 3 (depth 32): 0 means up, 1 means level.
Reading it this way, I still find a logical problem in the forward part.
Can anyone give a reasonable explanation? Or is it just an overlooked bug?
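
Written out concretely, the reading inferred above would look like this; it is only the issue's guess, not confirmed behavior of the code:

    # alphas_network has size (12, 4, 3): (layer, depth level, transition).
    # Inferred meaning of the last index at each depth (unverified):
    transition_meaning = {
        0: {0: 'stay at depth 4',   1: 'go down to depth 8'},
        1: {0: 'go up to depth 4',  1: 'stay at depth 8',  2: 'go down to depth 16'},
        2: {0: 'go up to depth 8',  1: 'stay at depth 16', 2: 'go down to depth 32'},
        3: {0: 'go up to depth 16', 1: 'stay at depth 32'},
    }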

Where is utils?

I am very interested in this code. Could you please release the complete code?
(I found that someone asked this question before, but some scripts such as utils are still missing.)

AttributeError

When I finish 1 epoch and save the checkpoint, the following error occurs:
File "train_autodeeplab.py", line 185, in validation
'state_dict': self.model.module.state_dict(),
AttributeError: 'AutoDeeplab' object has no attribute 'module'
Can you tell me what is wrong, please? Thanks.
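
self.model only has a .module attribute while it is wrapped in torch.nn.DataParallel; on a single GPU the wrapper may be absent. A defensive pattern that is a common workaround, not code from this repo:

    # Unwrap DataParallel only when it is actually present.
    model_to_save = self.model.module if hasattr(self.model, 'module') else self.model
    torch.save({'epoch': epoch + 1,  # checkpoint fields are illustrative
                'state_dict': model_to_save.state_dict()},
               'checkpoint.pth.tar')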

Single GPU with workers=1, but I still met cuda runtime error.

I use a single GPU (NVIDIA GTX 1080 Ti) and set workers to 1 according to other issues. However, when I ran train_autodeeplab.py, after some images I still hit cuda runtime error (2): at c:\programdata\miniconda3\conda-bld\pytorch_1524549877902\work\aten\src\thc\generic/THCStorage.cu:58. Could somebody tell me how to solve this if I only have a single GPU card?

about the crop image size for input

In the original code, the crop size is set to 224, but "out of memory" occurs on my GPU device (Tesla V100, 32 GB). Has anyone met this problem? For now, I set the crop size to 128 and it works.
Also, for semantic segmentation, shouldn't we avoid cropping the image too small? Looking forward to good answers.

self.train_loader1, self.train_loader2 in train_autodeeplab.py

In train_autodeeplab.py, line 32:
self.train_loader1, self.train_loader2, self.val_loader, self.test_loader, self.nclass = make_data_loader(args, **kwargs)
Going into the function make_data_loader, whose path is dataloaders/__init__.py,
we can see train_loader1 = train_loader and train_loader2 = train_loader. If that is true, this code does not match the original paper. Has anyone corrected this? Please upload your code. Thanks.
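
For reference, the paper (like DARTS) optimizes the network weights and the architecture parameters on two disjoint halves of the training set. A minimal sketch of such a split, assuming a train_set Dataset object and the repo's args:

    from torch.utils.data import DataLoader
    from torch.utils.data.sampler import SubsetRandomSampler

    # One half of the training images updates the weights w, the other half
    # updates the architecture parameters alpha.
    indices = list(range(len(train_set)))
    split = len(indices) // 2
    train_loader1 = DataLoader(train_set, batch_size=args.batch_size,
                               sampler=SubsetRandomSampler(indices[:split]),
                               num_workers=args.workers, pin_memory=True)
    train_loader2 = DataLoader(train_set, batch_size=args.batch_size,
                               sampler=SubsetRandomSampler(indices[split:]),
                               num_workers=args.workers, pin_memory=True)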

ModuleNotFoundError

When I run train_autodeeplab.py, the following error occurs:
File "train_autodeeplab.py", line 8, in <module>
from modeling.sync_batchnorm.replicate import patch_replication_callback
ModuleNotFoundError: No module named 'modeling'
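
modeling is a package inside the repository, so the interpreter has to be launched from the repository root for the import to resolve. Two hedged options:

    # Option 1: run from the repository root
    #   cd AutoDeeplab && python train_autodeeplab.py ...
    # Option 2: put the script's directory on sys.path before the imports
    import os, sys
    sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))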

test

Hi, can you provide the test and visualization code?

performance report

Hi, thanks for the great project. Could you please report the searched architecture's performance on VOC? More details on the training strategy would be helpful. Thanks a lot.

Still 'CUDA out of memory' with 4 TITAN X (pascal) when training model in PASCAL VOC dataset

Hey, thanks for the released code!

OS: Ubuntu 16.04
CUDA: 8.0.44
GPU: TITAN X Pascal (11.2 GB memory) × 4

I intend to train the model on PASCAL VOC 2012, and I run CUDA_VISIBLE_DEVICES=0,1,2,3 python train_autodeeplab.py --backbone resnet --lr 0.007 --workers 4 --epochs 40 --batch_size 1 --eval_interval 1 --dataset pascal

and the error message shows as below:

Namespace(arch_lr=0.003, arch_weight_decay=0.001, backbone='resnet', base_size=320, batch_size=1, checkname='deeplab-resnet', crop_size=320, cuda=True, dataset='pascal', epochs=40, eval_interval=1, freeze_bn=False, ft=False, gpu_ids=0, loss_type='ce', lr=0.007, lr_scheduler='cos', momentum=0.9, nesterov=False, no_cuda=False, no_val=False, out_stride=16, resize=512, resume=None, seed=1, start_epoch=0, sync_bn=True, test_batch_size=1, use_balanced_weights=False, use_sbd=False, weight_decay=0.0003, workers=4)
Number of images in train: 1464
Number of images in val: 1449
cuda finished
Using cos LR Scheduler!
Starting Epoch: 0 Total Epoches: 40
0%| | 0/1464 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/upsampling.py:122: UserWarning: nn.Upsampling is deprecated. Use nn.functional.interpolate instead. warnings.warn("nn.Upsampling is deprecated. Use nn.functional.interpolate instead.")
/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details. "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train_autodeeplab.py", line 324, in <module>
    main()
  File "train_autodeeplab.py", line 317, in main
    trainer.training(epoch)
  File "train_autodeeplab.py", line 116, in training
    output = self.model(image)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/auto_deeplab.py", line 214, in forward
    level4_new_2 = self.cells[count] (self.level_4[-2], self.level_8[-1], weight_cells)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in forward
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 68, in <genexpr>
    s = sum(self._ops[offset+j](h, weights[offset+j]) for j, h in enumerate(states) if h is not None)
  File "/home/ljy/anaconda3/envs/p36c8t041ljy/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in forward
    return sum(w * op(x) for w, op in zip(weights, self._ops))
  File "/home/ljy/li-codes/lwz/codes/AutoDeeplab/model_search.py", line 22, in <genexpr>
    return sum(w * op(x) for w, op in zip(weights, self._ops))
RuntimeError: CUDA error: out of memory

I guessed I might have failed to use multiple GPUs, so I even changed a line of the code to self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1, 2, 3]), but the same error message shows again.

What can I do to resolve it, please?

Thanks in advance.

RuntimeError

RuntimeError: CuDNN error: CUDNN_STATUS_SUCCESS

Environment installed via anaconda:

cudatoolkit=9.0
cudnn=7.3.1
pytorch=0.4.1

Softmax applied twice to the architecture parameters?

    weight_cells = F.softmax(self.alphas_cell, dim=-1) at line 168, 172, auto_deeplab.py
    weight_network = F.softmax (self.alphas_network, dim = -1) at line 169, 171, auto_deeplab.py

The softmax over the architecture parameters is applied twice; I wonder whether this is what Auto-DeepLab is meant to do or whether it is a bug.
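
Recomputing F.softmax from the raw alphas twice is merely redundant, but if the second call were ever applied to the output of the first, the weights would be renormalized and change. A quick self-contained demonstration:

    import torch
    import torch.nn.functional as F

    alphas = torch.randn(3)
    once = F.softmax(alphas, dim=-1)
    twice = F.softmax(once, dim=-1)
    print(torch.allclose(once, twice))  # False: softmax of a softmax flattens the weights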

Network Architecture Weights are non zero for connections that do not exist

The network architecture weights alphas_network are initialized non-zero even for connections that do not exist. For example, in the first layer we only have two connections, but three weights are initialized. This causes problems when taking the softmax over values that are always non-zero even though the weights are never used.

Therefore, one should set the values of all unused connections to zero and only consider the valid entries when calculating the softmax. I would suggest adding some kind of masking to the alphas_network parameters.
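
A minimal sketch of such masking; the mask layout below is an assumption chosen for illustration, not the repo's actual connectivity. Setting invalid entries to -inf before the softmax gives them exactly zero weight and renormalizes the valid ones among themselves:

    import torch
    import torch.nn.functional as F

    alphas = torch.randn(12, 4, 3)                 # (layer, level, transition)
    mask = torch.ones(12, 4, 3, dtype=torch.bool)  # True = connection exists
    mask[:, 0, 0] = False  # illustrative: top level has no input from above
    mask[:, 3, 2] = False  # illustrative: bottom level cannot go further down
    # -inf entries receive probability exactly 0 and no longer distort the softmax.
    weights = F.softmax(alphas.masked_fill(~mask, float('-inf')), dim=-1)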

Is this the complete code?

It seems many files are missing, such as utils. When will you release the complete code?
