
fasterseg's Introduction

FasterSeg: Searching for Faster Real-time Semantic Segmentation [PDF]

Language grade: Python · License: MIT

Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang

In ICLR 2020.

Overview

Cityscapes
Our predictions on Cityscapes Stuttgart demo video #0

We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods.

Highlights:

  • Novel search space: support multi-resolution branches.
  • Fine-grained latency regularization: alleviate the "architecture collapse" problem.
  • Teacher-student co-searching: distill the teacher to the student for further accuracy boost.
  • SOTA: FasterSeg achieves extremely fast speed (over 30% faster than the closest manually designed competitor on Cityscapes) while maintaining competitive accuracy.
    • see our Cityscapes submission here.

Cityscapes

Methods

supernet

fasterseg

Prerequisites

  • Ubuntu 16.04
  • Python 3.6.8
  • CUDA 10.1 (lower versions may work but were not tested)
  • NVIDIA GPU (>= 11 GB graphics memory) + cuDNN v7.3

This repository has been tested on a GTX 1080Ti. Configurations (e.g., batch size, image patch size) may need to be changed on other platforms.

Installation

  • Clone this repo:
git clone https://github.com/chenwydj/FasterSeg.git
cd FasterSeg
  • Install dependencies:
pip install -r requirements.txt
  • Install PyCuda, which is a dependency of TensorRT.
  • Install TensorRT (v5.1.5.0): a library for high-performance inference on NVIDIA GPUs with a Python API. A quick import check is sketched below.
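
If you are unsure whether the last two steps succeeded, here is a minimal import check (a sketch; both imports must succeed for the TensorRT latency path used later):

    import pycuda.autoinit   # creates and initializes a CUDA context on import
    import tensorrt as trt

    print(trt.__version__)   # expect 5.1.5.0, the version pinned above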

Usage

0. Prepare the dataset

1. Search

cd search

1.1 Pretrain the supernet

We first pretrain the supernet for 20 epochs without updating the architecture parameters.

  • Set C.pretrain = True in config_search.py.
  • Start the pretrain process:
CUDA_VISIBLE_DEVICES=0 python train_search.py
  • The pretrained weights will be saved in a folder like FasterSeg/search/search-pretrain-256x512_F12.L16_batch3-20200101-012345.

1.2 Search the architecture

We then run the architecture search for 30 epochs.

  • Set the name of your pretrained folder (see above) C.pretrain = "search-pretrain-256x512_F12.L16_batch3-20200101-012345" in config_search.py.
  • Start the search process:
CUDA_VISIBLE_DEVICES=0 python train_search.py
  • The searched architecture will be saved in a folder like FasterSeg/search/search-224x448_F12.L16_batch2-20200102-123456.
  • arch_0 and arch_1 contain the architectures of the teacher and student networks, respectively. The config change between stages 1.1 and 1.2 is sketched below.
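
The only field that differs between the two stages is C.pretrain (a minimal sketch of the relevant config_search.py lines; the folder name is the example above):

    # config_search.py (sketch)
    # Stage 1.1 -- pretrain the supernet:
    C.pretrain = True
    # Stage 1.2 -- search, loading the pretrained supernet:
    # C.pretrain = "search-pretrain-256x512_F12.L16_batch3-20200101-012345"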

2. Train from scratch

  • cd FasterSeg/train
  • Copy the folder which contains the searched architecture into FasterSeg/train/ or create a symlink via ln -s ../search/search-224x448_F12.L16_batch2-20200102-123456 ./

2.1 Train the teacher network

  • Set C.mode = "teacher" in config_train.py.
  • Set the name of your searched folder (see above): C.load_path = "search-224x448_F12.L16_batch2-20200102-123456" in config_train.py. This folder contains arch_0.pt and arch_1.pt for the teacher's and student's architectures.
  • Start the teacher's training process:
CUDA_VISIBLE_DEVICES=0 python train.py
  • The trained teacher will be saved in a folder like train-512x1024_teacher_batch12-20200103-234501.

2.2 Train the student network (FasterSeg)

  • Set C.mode = "student" in config_train.py.
  • Set the name of your searched folder (see above): C.load_path = "search-224x448_F12.L16_batch2-20200102-123456" in config_train.py. This folder contains arch_0.pt and arch_1.pt for the teacher's and student's architectures.
  • Set the name of your teacher's folder (see above): C.teacher_path = "train-512x1024_teacher_batch12-20200103-234501" in config_train.py. This folder contains weights0.pt, the teacher's pretrained weights (see the combined config sketch after this list).
  • Start the student's training process:
CUDA_VISIBLE_DEVICES=0 python train.py
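
For reference, a sketch of the config_train.py fields touched in steps 2.1 and 2.2 (the values are the example folder names above):

    # config_train.py (sketch)
    C.mode = "student"  # "teacher" for step 2.1, "student" for step 2.2
    C.load_path = "search-224x448_F12.L16_batch2-20200102-123456"      # searched architectures
    C.teacher_path = "train-512x1024_teacher_batch12-20200103-234501"  # teacher weights (step 2.2 only)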

3. Evaluation

Here we use our pretrained FasterSeg as an example for the evaluation.

cd train
  • Set C.is_eval = True in config_train.py.
  • Set the name of the searched folder as C.load_path = "fasterseg" in config_train.py.
  • Download the pretrained weights of the teacher and student and put them into the folder train/fasterseg.
  • Start the evaluation process:
CUDA_VISIBLE_DEVICES=0 python train.py
  • You can switch between evaluating the teacher and the student by changing C.mode in config_train.py.

4. Test

We support generating prediction files (masks as images) during training.

  • Set C.is_test = True in config_train.py.
  • During the training process, the prediction files will be periodically saved in a folder like train-512x1024_student_batch12-20200104-012345/test_1_#epoch.
  • Simply zip the prediction folder and submit it to the Cityscapes submission page.

5. Latency

5.0 Latency measurement tools

  • If you have successfully installed TensorRT, the latency tests below will automatically use TensorRT (see function here).
  • Otherwise the scripts fall back to PyTorch for the latency tests (see function here); a minimal version of such a measurement is sketched below.
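
The PyTorch fallback boils down to timing repeated forward passes; a minimal sketch (assuming a CUDA device; this is not the repo's exact helper) looks like:

    import torch

    def measure_latency_ms(model, input_size, iterations=100, warmup=10):
        """Average forward-pass latency in milliseconds (rough sketch)."""
        model = model.cuda().eval()
        x = torch.randn(*input_size, device="cuda")
        with torch.no_grad():
            for _ in range(warmup):  # warm up kernels / cuDNN autotuning
                model(x)
            torch.cuda.synchronize()
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            for _ in range(iterations):
                model(x)
            end.record()
            torch.cuda.synchronize()  # wait for all timed work to finish
        return start.elapsed_time(end) / iterations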

5.1 Measure the latency of FasterSeg

  • Run the script:
CUDA_VISIBLE_DEVICES=0 python run_latency.py

5.2 Generate the latency lookup table

  • cd FasterSeg/latency
  • Run the script:
CUDA_VISIBLE_DEVICES=0 python latency_lookup_table.py

This will generate an .npy file. Be careful not to overwrite the provided latency_lookup_table.npy in this repo.

  • The .npy file contains a Python dictionary mapping an operator (under specific conditions: input size, stride, number of channels, etc.) to its latency in ms. A loading sketch follows.
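
Since NumPy stores the dict as a pickled object, loading it requires allow_pickle=True. A small sketch (the key format is inferred from the entries quoted in the latency issue further down, so treat it as an assumption):

    import numpy as np

    table = np.load("latency_lookup_table.npy", allow_pickle=True).item()
    # Hypothetical key following the "<op>_H<h>_W<w>_Cin<cin>_Cout<cout>_stride<s>" pattern:
    print(table.get("BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1"), "ms")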

Citation

@inproceedings{chen2020fasterseg,
  title={FasterSeg: Searching for Faster Real-time Semantic Segmentation},
  author={Chen, Wuyang and Gong, Xinyu and Liu, Xianming and Zhang, Qian and Li, Yuan and Wang, Zhangyang},
  booktitle={International Conference on Learning Representations},
  year={2020}
}

Acknowledgement


fasterseg's Issues

symbol lookup error: undefined symbol:PySlice_Unpack

Hello, please specify the exact Python 3 version in the README. I looked into this problem: PySlice_Unpack was only introduced in Python 3.6.1, so the solution is to use Python 3.6.2. I was previously using Python 3.6.0 and ran into this bug. It would help to be more rigorous and make the other environment requirements similarly precise. Thanks.

How to reach mAP 0.69 on validation set

After searching and training on the val set, I can only get 0.69 mAP on the training set, and on the val set it is only 0.64. Do I need to run the search and training many times to get the claimed result? Or does it need more than 600 epochs?

Head: Paper vs Code

Hello!

In the paper, the heads are fairly clearly defined in the figure.

In your code, I see the following and I don't exactly understand what it is for:

[screenshot]

Training with custom data with different resolution to cityscapes dataset

Hi, I'm interested in training this with a custom dataset I created from scratch.
So far I have tried following the steps here, but I can't get past the pretraining step.

The output I'm getting is:

[screenshot]

The error doesn't tell me much, just that it occurs inside model_search.py at the moment the logits are calculated, which makes me believe the problem might be with the input. I'm wondering if the resolution of the images matters at this point; my images are 576x640, but I have noticed that inside the config the crop size always keeps the aspect ratio of the Cityscapes images. Could that be the problem? I also noticed that inside train_search.py the Cityscapes resolution is hardcoded for the model's input when the model is created; should I change it to my resolution?

One extra question: for testing purposes my dataset only has one object. Should the number of classes be 1, or does the background count as another class?

Thank you for your work.

Latency issue

Thank you for your great work. I have one question about latency.

When I checked your code, I found that you use bilinear interpolation in train/operations.py,

but in the test part you use nearest interpolation in latency/operations.py.

Can you tell me why?
(I'm asking because nearest interpolation is faster than bilinear interpolation in PyTorch.)

In train/operations.py you use:
out = F.interpolate(x, size=(int(x.size(2))//2, int(x.size(3))//2), mode='bilinear', align_corners=True)
out = F.interpolate(out, size=(int(x.size(2)), int(x.size(3))), mode='bilinear', align_corners=True)

In latency/operations.py you use:
out = F.interpolate(x, size=(int(x.size(2))//2, int(x.size(3))//2), mode='nearest')
out = F.interpolate(out, size=(int(x.size(2)), int(x.size(3))), mode='nearest')

Why don't you sample Alphas and Betas using Gumbel-Softmax?

In your model_search.py code you simply use F.softmax(...) on alphas and betas to learn the super-cell and down-sampling connections (line 275):

        alphas0 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][0]), dim=-1)
        alphas1 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][1]), dim=-1)
        alphas2 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][2]), dim=-1)
        alphas = [alphas0, alphas1, alphas2]
        betas1 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["betas"][0]), dim=-1)
        betas2 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["betas"][1]), dim=-1)
        betas = [None, betas1, betas2]

I was wondering why you didn't use hard Gumbel-Softmax sampling here (as you did with the ratios)?

I think that using plain softmax is actually risky, as the optimizer learns to output optimal weighted combinations of feature maps from all available operations.

Furthermore, in a small experiment using only two operations (Conv and Skip), I observed that plain softmax didn't converge at all, while Gumbel-Softmax sampling led to good alphas.

What do you think? Did I miss something?
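
For readers following this thread, a minimal sketch (not the repo's code) of the two relaxations being compared:

    import torch
    import torch.nn.functional as F

    # Architecture logits for 5 candidate ops (2-D shape for older gumbel_softmax versions).
    alphas = torch.randn(1, 5, requires_grad=True)

    # Plain softmax: every candidate op contributes, so the supernet always
    # computes a weighted mixture of all operations.
    weights_soft = F.softmax(alphas, dim=-1)

    # Hard Gumbel-Softmax: samples a one-hot choice per forward pass; the
    # straight-through trick keeps the sample differentiable w.r.t. alphas.
    weights_hard = F.gumbel_softmax(alphas, tau=1.0, hard=True)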

ValueError: not enough values to unpack (expected 2, got 0)

Hello, thanks for your work! When I use your code I run into the following problem; how can I solve it?

05 21:04:37 params = 2.579183MB, FLOPs = 71.419396GB
architect initialized!
using downsampling: 2
Found 1487 images
using downsampling: 2
Found 1488 images
using downsampling: 2
Found 500 images
0%| | 0/20 [00:00<?, ?it/s]05 21:04:38 True
05 21:04:38 search-pretrain-256x512_F12.L16_batch3-20200605-210423
05 21:04:38 lr: 0.02
05 21:04:38 update arch: False
[Epoch 1/20][train...]: 0%| | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_search.py", line 303, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 134, in main
train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
File "train_search.py", line 223, in train
minibatch = dataloader_model.__next__()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 42, in __getitem__
img, gt = self._fetch_data(img_path, gt_path)
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 67, in _fetch_data
gt = self._open_image(gt_path, cv2.IMREAD_GRAYSCALE, dtype=dtype, down_sampling=self._down_sampling)
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 130, in _open_image
H, W = img.shape[:2]
ValueError: not enough values to unpack (expected 2, got 0)

Parse Issue of TensorRT

When I use the TensorRT computation code in your repo, there is a problem.

When the TensorRT ONNX parser parses the ONNX model, the returned value is false, and TensorRT gives an error: Network must have at least one output.

Have you met a similar problem? And how did you solve it?

How to train FasterSeg with customized labels?

Hello,
it's very interesting to train and use FasterSeg with one's own custom data.
To get information about that, I read the comments of this issue description.

Based on the description linked above, I did the following steps:

  1. prepare my own labels in labels.py
  2. create my own annotated pictures with createTrainIdLabelImgs.py
    2.1 make sure that height and width are divisible by 32 (a sanity-check sketch follows this list)
  3. prepare file-name lists as "dataset files" for data loading (see examples here)
  4. adjust the config file with the number of classes
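
A tiny sanity check for step 2.1 (a sketch; the file name is hypothetical):

    from PIL import Image

    # Hypothetical label image produced by createTrainIdLabelImgs.py.
    w, h = Image.open("my_image_gtFine_labelTrainIds.png").size
    assert w % 32 == 0 and h % 32 == 0, "%dx%d is not divisible by 32" % (w, h)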

Are there any other places in the FasterSeg repository I need to adjust in order to use customized labels?

For example:
cityscapes.py, camvid.py, and bdd.py contain methods like get_class_colors() and get_class_names() which return the colors or class names of the Cityscapes data.

Is it necessary to add the customized labels to these methods?
What are these methods used for?

It would be great if you could give me some hints on these questions, so I can run the training process with customized labels.

Best regards

NaN values in loss function during Step 2.2

Hey @chenwydj,

during step 2.2 with custom labels, the value of the loss function becomes NaN in the last few steps (see screenshot below).

[screenshot]

Everything appears to be fine until epoch 341.
[screenshot]

From epoch 351 on, something is wrong with the mIoU: all labels except "unlabeled" reach an mIoU of 0%.
[screenshot]

It seems to be a numerical-stability problem.
Do you agree with that?
Do you have an idea how to solve this problem?

It would be great if you could give me some hints on these questions.

Kind regards.

How to train for a new dataset ?

So I have been playing a little with config_search.py to adapt my new dataset to the pipeline, but I keep getting this error:

RuntimeError: merge_sort: failed on 2nd step: device-side assert triggered

After some research I know it's a problem with my dataset configuration, but I don't know what each parameter does. Can anyone elaborate on this?

Typo error in readme file

This repository has been tested on GTX 1808Ti. Configurations (e.g batch size, image patch size) may need to be changed on different platforms.

should be GTX 1080Ti?

dataset details

Why did you search directly on the train set instead of sampling a val set from the Cityscapes train set?
Won't the searched architecture overfit the Cityscapes val set?

TrainA and TrainB overlap according to your implementation

Hi,
According to your paper, the training dataset is split into two parts, one to update the model weights and the other to update the architecture parameters, and these are supposed to be disjoint. But they are not disjoint in your implementation; you can verify this by calling the _get_file_names method of the Dataset. I think this may mean that some of the training data is never used during architecture search. Am I right?
Correct me if I'm wrong.

Runtime error: Cuda error:out of memory

Hi,
Thanks for sharing your nice work. I have found an issue while using your code for training on my NVIDIA GTX 1080 Ti card.
Can you please help me with this issue?

The issue is: when I run train_search.py it raises a runtime error. A screenshot of the error is attached:
[screenshot]

How do you handle fluctuations in latency measurement?

I noticed that the measured latencies differ substantially from run to run.
For example, I get the following results from different program runs:

  1. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0549 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0511 ms
  2. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0517 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0502 ms
  3. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0505 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0513 ms

Note that in the last run the operation at quarter resolution (12x12) took longer than the operation at the base resolution (24x24).

This will substantially influence the architecture search, as the optimizer will prefer paths at higher resolutions.

How did you make sure that your measured latencies "make sense"? Did you set static GPU clocks or similar to get stable results?

RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

I am trying to train FasterSeg on a custom dataset with six classes. I have formatted the annotations and written a dataset class just like cityscapes.py. I am having an issue while pretraining the supernet (section 1.1 in the README).

CUDA version: 10.2
torchvision: 0.3.0
torch: 1.1.0

Traceback (most recent call last):
File "train_search.py", line 304, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 133, in main
train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
File "train_search.py", line 243, in train
loss = model._loss(imgs, target, pretrain)
File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in _loss
loss = loss + sum(self._criterion(logit, target) for logit in logits)
File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in <genexpr>
loss = loss + sum(self._criterion(logit, target) for logit in logits)
File "/home/soccer/anaconda3/envs/pipeline_cloned/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/soccer/Desktop/Muaz/FasterSeg/tools/seg_opr/loss_opr.py", line 81, in forward
index = mask_prob.argsort()
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

Before this error I get a number of CUDA errors, but they don't crash the code:

  • /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

  • /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

NOTE: I recreated the experiment on the Cityscapes dataset and I am still encountering the same issue.

How can I train this model with my own data?

Hello! The high performance and efficiency of this model are great. I plan to use it for road segmentation in my research project, but I have no research background in this area and for now just want to apply the algorithm. Could you later provide a tutorial on "how to train this network from your own raw image dataset"? It would give a clear guide to newcomers who just want to apply it. Thanks!

inference speed

Can you provide a comparison of TensorRT and PyTorch inference speed? Thanks.

Where does weight sharing take place in model_search.py?

In your paper you write:

It is worth noting that, the multi-resolution branches will share both cell weights and feature maps
if their cells are of the same operator type, spatial resolution, and expansion ratio. This design
contributes to a faster network. Once cells in branches diverge, the sharing between the branches
will be stopped and they become individual branches (See Figure 6).

This gives the impression that cells can actually choose their downsampling rate (s = [8, 16, 32]) freely. But this is not reflected in your code, where you initialize each cell at a fixed downsampling rate:
https://github.com/TAMU-VITA/FasterSeg/blob/bb52a004ff83f64d3dd8989104234fdb862d1cc5/search/model_search.py#L153

Furthermore, when cells are placed at fixed downsampling rates, I don't see how weight sharing between branches (as quoted above) can ever take place, since the spatial resolutions always differ between branches.

tl;dr:
Can you elaborate on how you can fuse cells from different branches even though (according to your code) they always operate at different spatial resolutions?

CamVid dataset

Hi,

In the paper you mention using CamVid as another dataset for experiments. Would you mind sharing the CamVid code as well?

There are deprecations in torch 1.1.0

Traceback (most recent call last):
File "E:/vSLAM/FasterSeg-master/train/train.py", line 28, in <module>
from eval import SegEvaluator
File "E:\vSLAM\FasterSeg-master\train\eval.py", line 9, in <module>
from engine.evaluator import Evaluator
File "E:\vSLAM\FasterSeg-master\tools\engine\evaluator.py", line 11, in <module>
from utils.pyt_utils import load_model, link_file, ensure_dir
File "E:\vSLAM\FasterSeg-master\tools\utils\pyt_utils.py", line 23, in <module>
def reduce_tensor(tensor, dst=0, op=dist.ReduceOp.SUM, world_size=1):
AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'

PyTorch 1.1.0, Windows 10

How to use Tensorboard within the FasterSeg project?

Hello, FasterSeg is very interesting as part of a research project at our university. We would like to train it with our own dataset.

How can I monitor the whole process with TensorBoard? In which directory do I have to be, and which command do I have to execute?

We have forked your repository and are in the process of providing a guide on how to train FasterSeg on custom data.

We already provide a Dockerfile for the training environment. Other required scripts are currently being implemented.

See https://github.com/Gaussianer/FasterSeg

Best regards

Error when running CUDA_VISIBLE_DEVICES=0 python train_search.py

Your work looks very good, but I have some doubts. When I try running the above command it shows an error:
Traceback (most recent call last):
File "train_search.py", line 23, in <module>
from datasets.cityscapes.cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/__init__.py", line 1, in <module>
from .cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/cityscapes/__init__.py", line 1, in <module>
from .cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/cityscapes/cityscapes.py", line 6, in <module>
class Cityscapes(BaseDataset):
TypeError: module.__init__() takes at most 2 arguments (3 given)
May I know the reason for this error? @chenwydj @TianlongChenTAMU

Error about TensorRT and Pycuda

When I run CUDA_VISIBLE_DEVICES=1 python train_search.py, I get this:
/export/data/lwangcg/FasterSeg/tools/utils/darts_utils.py:179: UserWarning: TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.
warnings.warn("TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.")

I have tried to install PyCuda and TensorRT, but I may not have done so successfully: I get a warning in the PyCuda test and a failure in the TensorRT test. Can you kindly help me with this? Note: I have no root privileges on the system (CentOS 7).

FasterSeg:
[screenshot]

TensorRT:
[screenshot]

PyCuda:
[screenshot]

flop calcuate

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/local/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "train_search.py", line 306, in <module>
    main(pretrain=config.pretrain)
  File "train_search.py", line 68, in main
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048)))
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 97, in profile
    model.apply(add_hooks)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  [Previous line repeated 4 more times]
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 248, in apply
    fn(self)
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 87, in add_hooks
    logger.info("THOP has not implemented counting method for ", m)
Message: 'THOP has not implemented counting method for '
Arguments: (USConv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),)
30 22:57:41 Register FLOP counter for module BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
30 22:57:41 Register FLOP counter for module Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
30 22:57:41 Register FLOP counter for module ReLU(inplace)
[... dozens of similar "Register FLOP counter" lines omitted ...]
Traceback (most recent call last):
  File "train_search.py", line 306, in <module>
    main(pretrain=config.pretrain) 
  File "train_search.py", line 68, in main
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048)))
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 100, in profile
    model(*inputs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs_new_wj/cv/xiaxin/NAS/nas_seg/FasterSeg/search/model_search.py", line 287, in forward
    out_prev = [[stem(input), None]] # stem: one cell
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs_new_wj/cv/xiaxin/NAS/nas_seg/FasterSeg/search/operations.py", line 125, in forward
    assert x.size()[1] == self.C_in, "{} {}".format(x.size()[1], self.C_in)
AssertionError: 1024 3

Thanks for your excellent work!

When I run CUDA_VISIBLE_DEVICES=0 python train_search.py, the code produces the error above.
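
A likely cause (my reading of the traceback, not a confirmed fix): thop's profile() calls model(*inputs), so inputs must be a tuple of tensors. (torch.randn(1, 3, 1024, 2048)) without a trailing comma is just the tensor itself, so unpacking feeds the model a (3, 1024, 2048) slice, and the stem's assertion then compares 1024 channels against C_in = 3, which is exactly the AssertionError: 1024 3 above. A sketch of the fix:

    # Note the trailing comma: inputs must be a one-element tuple.
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048),))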

Test Issue

Hello. I'm trying to run training, validation, and testing. The first two processes work well, but when I set C.is_eval = True, I get this error:
Traceback (most recent call last):
File "train.py", line 295, in <module>
main()
File "train.py", line 82, in main
train_loader = get_train_loader(config, Cityscapes, test=config.is_test)
File "/opt/data/private/FasterSeg-master/train/dataloader.py", line 52, in get_train_loader
train_dataset = dataset(data_setting, "train", train_preprocess, config.batch_size * config.niters_per_epoch)
File "../tools/datasets/BaseDataset.py", line 23, in __init__
self._file_names = self._get_file_names(split_name)
File "../tools/datasets/BaseDataset.py", line 80, in _get_file_names
with open(source) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/data/private/FasterSeg-master/tools/datasets/cityscapes/cityscapes_train_val_fine.txt'

I opened the directory and found there is no such file. Does it mean that I should merge the train set and the val set for training?
Could you help me? Thank you!

multi gpu training

Thanks for your excellent work !

Training is very slow. Does it support distributed training?

Looking for weight file for 73.1% cs val accuracy with Student model

One of the best works for semantic segmentation with NAS. Thank you for sharing the training as well as the search code.

I downloaded the student's weights from the location mentioned in the readme (https://drive.google.com/file/d/1O56HnA0ug2M3K4SR3_AUzIs0wegy9BX6/view?usp=sharing) and ran train.py to check the validation accuracy, but I got 70.5%.
Is this model supposed to give 70.5% or 73.1% val accuracy?
If there are other weights that reach 73.1%, could you please share them?

Thanks.
