
lednet's Introduction



Introduction

This project contains the code for LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation by Yu Wang et al. (Note: the code was tested with Python 3.6, CUDA 9.0, and PyTorch 0.4.1; PyTorch 0.4.1+ is also supported.)

The extensive computational burden limits the usage of CNNs in mobile devices for dense estimation tasks, a.k.a. semantic segmentation. In this paper, we present a lightweight network to address this problem, namely LEDNet, which employs an asymmetric encoder-decoder architecture for the task of real-time semantic segmentation. More specifically, the encoder adopts a ResNet as backbone network, where two new operations, channel split and shuffle, are utilized in each residual block to greatly reduce computation cost while maintaining higher segmentation accuracy. On the other hand, an attention pyramid network (APN) is employed in the decoder to further lighten the entire network complexity. Our model has less than 1M parameters, and is able to run at over 71 FPS on a single GTX 1080Ti GPU card. The comprehensive experiments demonstrate that our approach achieves state-of-the-art results in terms of speed and accuracy trade-off on the Cityscapes dataset, and becomes an effective method for real-time semantic segmentation tasks.
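To make the channel split and shuffle idea concrete, here is a minimal, hedged PyTorch sketch (not the exact code in train/lednet.py; channel counts and shapes are illustrative):

import torch

def channel_split(x):
    # Split the feature map into two halves along the channel dimension;
    # each residual branch then only processes half the channels.
    c = x.size(1) // 2
    return x[:, :c], x[:, c:]

def channel_shuffle(x, groups=2):
    # Interleave channels across the groups so information can flow
    # between the split branches in the next block.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

# Example with an illustrative feature map.
x = torch.randn(1, 64, 128, 256)
left, right = channel_split(x)
out = channel_shuffle(torch.cat([left, right], dim=1), groups=2)
print(out.shape)  # torch.Size([1, 64, 128, 256])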

Project-Structure

├── datasets  # contains all datasets for the project
|  └── cityscapes #  cityscapes dataset
|  |  └── gtCoarse #  Coarse cityscapes annotation
|  |  └── gtFine #  Fine cityscapes annotation
|  |  └── leftImg8bit #  cityscapes training image
|  └── cityscapesscripts #  Cityscapes label conversion scripts
├── utils
|  └── dataset.py # dataloader for cityscapes dataset
|  └── iouEval.py # compute mean IoU and per-class IoU
|  └── transform.py # data preprocessing
|  └── visualize.py # Visualize with visdom 
|  └── loss.py # loss function 
├── checkpoint
|  └── xxx.pth # encoder models pretrained on ImageNet
├── save
|  └── xxx.pth # models trained from scratch
├── imagenet-pretrain
|  └── lednet_imagenet.py # model definition for ImageNet pre-training
|  └── main.py # ImageNet pre-training script
├── train
|  └── lednet.py  # model definition for semantic segmentation
|  └── main.py # train model scripts
├── test
|  |  └── dataset.py 
|  |  └── lednet.py # model definition
|  |  └── lednet_no_bn.py # model definition with BN layers removed
|  |  └── eval_cityscapes_color.py # generate color (RGB) segmentation results
|  |  └── eval_cityscapes_server.py # generate results for upload to the official evaluation server
|  |  └── eval_forward_time.py # Test model inference time
|  |  └── eval_iou.py 
|  |  └── iouEval.py 
|  |  └── transform.py 

Installation

  • Python 3.6.x; Anaconda3 is recommended
  • Set up the Python environment:
pip3 install -r requirements.txt
  • Environment: PyTorch 0.4.1; CUDA 9.0; cuDNN 7.1; Python 3.6

  • Clone this repository.

git clone https://github.com/xiaoyufenfei/LEDNet.git
cd LEDNet-master

Datasets

  • Download the Cityscapes dataset from the official website and arrange it under the dataset root as follows (a sketch of how images pair with labels follows the tree):

├── leftImg8bit
│   ├── train
│   ├──  val
│   └── test
├── gtFine
│   ├── train
│   ├──  val
│   └── test
├── gtCoarse
│   ├── train
│   ├── train_extra
│   └── val
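For orientation only (the real loading logic lives in utils/dataset.py), here is a hedged sketch of how the training images in leftImg8bit can be paired with their gtFine annotations by swapping the Cityscapes filename suffixes; the *_labelTrainIds.png files exist only after running the cityscapesscripts conversion:

import os
from glob import glob

root = "datasets/cityscapes"   # illustrative dataset root
split = "train"

# Collect image paths and derive the matching label paths from the filenames.
images = sorted(glob(os.path.join(root, "leftImg8bit", split, "*", "*_leftImg8bit.png")))
labels = [p.replace("leftImg8bit", "gtFine", 1)
           .replace("_leftImg8bit.png", "_gtFine_labelTrainIds.png")
          for p in images]

for img, lbl in zip(images[:3], labels[:3]):
    print(img, "->", lbl, os.path.exists(lbl))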

Training-LEDNet

  • For help on the optional arguments you can run: python main.py -h

  • By default, we assume you have downloaded the Cityscapes dataset into the ./data/cityscapes directory.

  • To train LEDNet, run the train/main.py script and pass the parameters listed in main.py as flags, or change them manually in the script.

python main.py --savedir logs --model lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx ...

Resuming-training-if-decoder-part-broken

  • For help on the optional arguments you can run: python main.py -h
python main.py --savedir logs --name lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx --decoder --state "../save/logs/model_best_enc.pth.tar"...
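Internally, resuming boils down to loading the saved encoder checkpoint before the decoder stage starts. A minimal sketch of that step, assuming model is the instantiated LEDNet network (the exact keys stored in the .pth.tar file depend on how train/main.py saves it):

import torch

# Load the best encoder checkpoint produced by the encoder training stage.
checkpoint = torch.load("../save/logs/model_best_enc.pth.tar", map_location="cpu")
state_dict = checkpoint.get("state_dict", checkpoint)    # tolerate both saving formats
model.load_state_dict(state_dict, strict=False)          # strict=False: decoder weights are new
start_epoch = checkpoint.get("epoch", 0)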

Testing

  • The trained models from the training process can be found here. They may not be the best ones; you can train a model from scratch yourself, or fine-tune the decoder with an encoder pre-trained on ImageNet.
For more details, refer to ./test/README.md.

Results

  • Please refer to our article for more details.
Method   Dataset      Fine   Coarse   IoU_cla   IoU_cat   FPS
LEDNet   Cityscapes   yes    yes      70.6%     87.1%     70+

qualitative segmentation result examples:

Citation

If you find this code useful for your research, please use the following BibTeX entry.

 @article{wang2019lednet,
  title={LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation},
  author={Wang, Yu and Zhou, Quan and Liu, Jia and Xiong, Jian and Gao, Guangwei and Wu, Xiaofu and Latecki, Longin Jan},
  journal={arXiv preprint arXiv:1905.02423},
  year={2019}
}

Tips

  • Limited by GPU resources, the project results still need to be further improved...
  • It is recommended to pre-train the encoder on ImageNet and then fine-tune the decoder part; the results will be better.

Reference

  1. Deep residual learning for image recognition
  2. ENet: A deep neural network architecture for real-time semantic segmentation
  3. ERFNet: Efficient residual factorized ConvNet for real-time semantic segmentation
  4. ShuffleNet: An extremely efficient convolutional neural network for mobile devices


lednet's Issues

Question about input and output sizes

I don't see the input being resized to 1024x512 anywhere; only the short side is scaled to 512 (keeping the aspect ratio), and the long side is not fixed at 1024. So why is the output directly resized to 1024x512?
class Decoder(nn.Module):
    def __init__(self, num_classes):
        super().__init__()

        self.apn = APN_Module(in_ch=128, out_ch=20)
        # self.upsample = Interpolate(size=(512, 1024), mode="bilinear")
        # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=4, stride=2, padding=1, output_padding=0, bias=True)
        # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=3, stride=2, padding=1, output_padding=1, bias=True)
        # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2, padding=0, output_padding=0, bias=True)

    def forward(self, input):
        output = self.apn(input)
        out = interpolate(output, size=(512, 1024), mode="bilinear", align_corners=True)
        # out = self.upsample(output)
        # print(out.shape)
        return out
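For illustration, one hedged way to avoid the hard-coded (512, 1024) target size is to upsample by the encoder's overall stride (8 in LEDNet) instead, so the output tracks whatever resolution is fed in:

import torch.nn.functional as F

def forward(self, input):
    # Upsample by the encoder's downsampling factor rather than to a fixed
    # (512, 1024) size, so non-standard input resolutions still match the labels.
    output = self.apn(input)
    return F.interpolate(output, scale_factor=8, mode="bilinear", align_corners=True)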

training data

Could you tell me how to convert the dataset to 19 categories? I just use the *_color.png images for training, but after some steps it reports an error:
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:266
/opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [160,0,0] Assertion t >= 0 && t < n_classes failed.
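This device-side assert usually fires when a label pixel lies outside [0, num_classes) and is not the ignore index; the *_color.png files are RGB images, not class-index maps, so they will trigger it. A small hedged check (the path is illustrative):

import numpy as np
from PIL import Image

# Inspect the values actually present in one label image; for 19-class
# Cityscapes training they should be 0..18 plus 255 (ignore).
label = np.array(Image.open("datasets/cityscapes/gtFine/train/aachen/aachen_000000_000019_gtFine_labelTrainIds.png"))
print(label.shape, np.unique(label))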

Training error

Encoder training works fine, but an illegal memory access error occurs during decoder training.

Error when training on my own dataset

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
Traceback (most recent call last):
File "main.py", line 518, in
main(parser.parse_args())
File "main.py", line 472, in main
model = train(args, model, True) #Train encoder
File "main.py", line 236, in train
loss = criterion(outputs, targets[:, 0])
File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/disk/LEDNet/utils/loss.py", line 15, in forward
return self.loss(F.log_softmax(outputs, dim=0), targets)
File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in call
result = self.forward(*input, **kwargs)
File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 210, in forward
return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/disksoftware/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

Hello, I got the error shown above when training on my own dataset. The dataset was prepared following the Cityscapes file structure, the image sizes and formats are consistent, and I changed the number of classes, but the error still occurs. Could you take a look when you have time? Thank you very much.

Question about reproduction

I want to implement your paper, but have several questions:

  1. Is there a first convolution missing before downsampling in the paper? (from 3 channels to 32 channels)
  2. Did you pretrain your encoder on ImageNet or train it from scratch? (And would you mind telling us how many training epochs you used?)
  3. What kind of weight-initialization strategy did you use?
  4. What is the training image size?

Here is my implementation: https://github.com/AceCoooool/LEDNet
But there is still some gap to the paper.

running time

Is the reported runtime calculated using eval_forward_time.py?

About the loss weights

Hi, how are the loss weights (for classes 0-19) calculated? Based on the number of images, or the number of pixels?

eval speed

I ran test/eval_forward_time.py with default params; the mean time is 28 ms. When I set the image size to (2048, 1024), the mean time is 100 ms.

When I set the batch size to 2, 4, 8, the mean time is 32 ms in every case, so the inference time increases linearly with batch size.

Hello author, I ran into the following problem during training; could you please help? Thank you.

for step, (images, labels) in enumerate(loader):
    print('images = ', images.shape)   # images = torch.Size([5, 3, 512, 682])
    print('labels = ', labels.shape)   # labels = torch.Size([5, 1, 64, 85])
    inputs = images.cuda()
    targets = labels.cuda()

    start_time = time.time()

    # print('inputs = ', inputs.shape)   # inputs  = torch.Size([5, 3, 512, 682])
    # print('targets = ', targets.shape) # targets = torch.Size([5, 1, 64, 85])

    imgs_batch = images.shape[0]
    if imgs_batch != args.batch_size:
        break

    outputs = model(inputs, only_encode=enc)
    print('outputs = ', outputs)
    print('outputs.shape = ', outputs.shape)              # torch.Size([5, 2, 64, 86])
    print('targets[:, 0].shape = ', targets[:, 0].shape)  # torch.Size([5, 64, 85])

After the input passes through model(), the output size becomes [5, 2, 64, 86], which does not match the label dimensions and raises an error. What could cause this?

Pretrained model link is broken

Hello author, thank you for your contribution and for releasing the code! However, the pretrained model link you provided is no longer valid. Could you share it again? Thank you very much.

Problems about cityscapesscripts

Can you tell me where cityscapesscripts is used? I downloaded the whole dataset, including cityscapes and cityscapesscripts, but I can't find where the scripts are used!
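As far as I can tell, the scripts are only needed once, to convert the *_labelIds.png annotations into *_labelTrainIds.png with the 19 training classes (cityscapesscripts also ships a preparation script, createTrainIdLabelImgs.py, that does this over the whole dataset). A hedged sketch of the manual id-to-trainId conversion for one file:

import numpy as np
from PIL import Image
from cityscapesscripts.helpers.labels import id2label

# Map full Cityscapes ids to the 19 training ids; everything else becomes 255 (ignore).
label_ids = np.array(Image.open("aachen_000000_000019_gtFine_labelIds.png"))  # illustrative file
train_ids = np.full(label_ids.shape, 255, dtype=np.uint8)
for lid, lab in id2label.items():
    if 0 <= lab.trainId < 255:
        train_ids[label_ids == lid] = lab.trainId
Image.fromarray(train_ids).save("aachen_000000_000019_gtFine_labelTrainIds.png")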

training data

Thank you for your contribution. Can you give more details about training on the VOC dataset? I think the number of classes in the VOC dataset is 21 (background + the other category labels).

achievement of this paper

Thanks for the kind help of @xiaoyufenfei .
I have provided an implementation in LEDNet; it is very close to the author's version.
The main difference is the training input size: I use 768x768 rather than 1024x512.

Anyone who is interested in this paper can use our code before the authors release theirs.

data

THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=134 error=710 : device-side assert triggered
Traceback (most recent call last):
File "train/main.py", line 519, in
main(parser.parse_args())
File "train/main.py", line 473, in main
model = train(args, model, True) #Train encoder
File "train/main.py", line 237, in train
loss = criterion(outputs, targets[:, 0])
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/content/drive/My Drive/LEDNet/utils/loss.py", line 15, in forward
return self.loss(F.log_softmax(outputs, dim=1), targets)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 211, in forward
return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2220, in nll_loss
ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:134

Final output conv layer channels mismatch with number of classes

LEDNet/train/lednet.py

Lines 259 to 263 in 073fcd0

self.apn = APN_Module(in_ch=128,out_ch=20)
#self.upsample = Interpolate(size=(512, 1024), mode="bilinear")
#self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=4, stride=2, padding=1, output_padding=0, bias=True)
#self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=3, stride=2, padding=1, output_padding=1, bias=True)
#self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2, padding=0, output_padding=0, bias=True)

Hi, congrats on your impressive work.
I noticed that you commented out the output conv layer, which should produce an output with the same number of channels as the number of classes. So why use a transposed convolution instead of a plain one? Since you already put an upsampling layer before that last layer, I guess you don't need to upsample with a transposed convolution anymore.

Regards.
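For comparison, a hedged sketch of the alternative the question suggests: project to the class channels with a plain 1x1 convolution and leave the upsampling to bilinear interpolation (channel counts are illustrative, not taken from the repository):

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes = 20
project = nn.Conv2d(128, num_classes, kernel_size=1, bias=True)  # plain conv, no upsampling

features = torch.randn(1, 128, 64, 128)   # decoder features at 1/8 resolution
logits = F.interpolate(project(features), scale_factor=8,
                       mode="bilinear", align_corners=True)
print(logits.shape)  # torch.Size([1, 20, 512, 1024])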

Decoder channels

I tried to implement LEDNet in Keras and found that I have ~2.5M parameters, which does not match the paper (it claims ~900K). I saw you wrote two versions of LEDNet, one with ~900K parameters and the other, like my implementation, with ~2.5M. In the 900K implementation, you reduce the number of channels to 1 right away in the first conv layer of the downsampling branch of the decoder (why is that?). In the attached figure (from the paper), the number of channels should remain the same through the first two levels of the downsampling branch of the decoder, and they still claim just ~900K parameters. What am I missing here? Or is there a problem with the paper?

[Figure from the paper, attached in the original issue.]

Histogram calculation

How can I calculate the histogram of label classes for my own dataset in order to get the class weights used in train/main.py?
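One common recipe, used in the ENet and ERFNet codebases rather than stated anywhere in this repository, is to count label pixels per class over the training set and weight each class by 1 / ln(1.02 + class_probability). A hedged sketch (paths and class count are illustrative):

import numpy as np
from glob import glob
from PIL import Image

num_classes = 20
counts = np.zeros(num_classes, dtype=np.int64)
for path in glob("datasets/cityscapes/gtFine/train/*/*_labelTrainIds.png"):
    label = np.array(Image.open(path))
    # Ignore pixels whose value is outside the class range (e.g. 255).
    counts += np.bincount(label[label < num_classes].ravel(), minlength=num_classes)

probs = counts / counts.sum()
weights = 1.0 / np.log(1.02 + probs)   # ENet-style class weighting
print(weights)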

Problem during training

Hello, when I run python main.py --savedir logs --model lednet --datadir D:/bishe/LEDNet-master/datasets/cityscapes, the following error appears:

========== ENCODER TRAINING ===========
D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/train
D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/val
<class 'utils.loss.CrossEntropyLoss2d'>
----- TRAINING - EPOCH 1 -----
LEARNING RATE: 0.0005
Traceback (most recent call last):
File "main.py", line 510, in
main(parser.parse_args())
File "main.py", line 464, in main
model = train(args, model, True) #Train encoder
File "main.py", line 211, in train
for step, (images, labels) in enumerate(loader):
File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 336, in next
return self._process_next_batch(batch)
File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 357, in _process_next_batch
raise batch.exc_type(batch.exc_msg)
IndexError: Traceback (most recent call last):
File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in
samples = collate_fn([dataset[i] for i in batch_indices])
File "D:\bishe\LEDNet-master\utils\dataset.py", line 86, in getitem
filenameGt = self.filenamesGt[index]
IndexError: list index out of range

How should I solve this problem?

about "encoder pretrained model" and "final pretrained model"

Thanks for your great work!
I read your code carefully; my questions are as follows:

  • Can you upload the pretrained model for the encoder? This would make it easier for us to reproduce your results.
  • Can you explain the hyperparameters (such as the initial learning rate) used when pretraining the encoder? There is no explanation of these in the paper.
  • Can you upload the pretrained weights for the final model? The link in README.md is invalid.

Thanks for your work again!

About inference speed

I am using eval_forward_time.py to test the inference speed.
But the inference speed is quite slow; I only get a mean of around 0.020 s per image (see the log below).

Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.036 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.031 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.019)
Forward time per img (b=1): 0.011 (Mean: 0.019)
Forward time per img (b=1): 0.037 (Mean: 0.020)
Forward time per img (b=1): 0.013 (Mean: 0.019)
Forward time per img (b=1): 0.037 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.019)
Forward time per img (b=1): 0.011 (Mean: 0.019)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.038 (Mean: 0.019)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.037 (Mean: 0.020)
Forward time per img (b=1): 0.036 (Mean: 0.020)
Forward time per img (b=1): 0.013 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.037 (Mean: 0.020)
Forward time per img (b=1): 0.012 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.019)
Forward time per img (b=1): 0.037 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.036 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.011 (Mean: 0.020)
Forward time per img (b=1): 0.038 (Mean: 0.020)
Forward time per img (b=1): 0.039 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)
Forward time per img (b=1): 0.014 (Mean: 0.020)

The GPU is an NVIDIA RTX 2080 Ti.
I have set inference to run on the GPU:
parser.add_argument('--cpu', action='store_true', default=False)
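A hedged note on timing: CUDA kernels launch asynchronously, so without torch.cuda.synchronize() the measured times may reflect kernel-launch overhead rather than actual compute. A minimal timing sketch, assuming model and images are already on the GPU:

import time
import torch

model.eval()
with torch.no_grad():
    for _ in range(10):                # warm-up so cuDNN picks its algorithms
        model(images)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(images)
    torch.cuda.synchronize()           # wait for all GPU work before reading the clock
    print("mean forward time: %.4f s" % ((time.time() - start) / 100))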

Label size of eval_cityscapes_server.py

It seems that eval_cityscapes_server.py produces label images at a resolution of 512x1024 (not 1024x2048), so I am confused: can the Cityscapes test server evaluate them correctly?
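A hedged workaround: the evaluation server expects label images at the full 2048x1024 resolution, so a half-resolution prediction can be resized with nearest-neighbor interpolation (which keeps the discrete label ids intact) before saving. Here prediction is assumed to be an HxW array of train ids and the output filename is illustrative:

import numpy as np
from PIL import Image

pred = Image.fromarray(np.asarray(prediction, dtype=np.uint8))   # prediction: HxW label map
pred_full = pred.resize((2048, 1024), Image.NEAREST)              # PIL size is (width, height)
pred_full.save("frankfurt_000000_000294_pred.png")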
