
fasterseg's Introduction

FasterSeg: Searching for Faster Real-time Semantic Segmentation [PDF]

Language grade: Python · License: MIT

Wuyang Chen, Xinyu Gong, Xianming Liu, Qian Zhang, Yuan Li, Zhangyang Wang

In ICLR 2020.

Overview

Cityscapes
Our predictions on Cityscapes Stuttgart demo video #0

We present FasterSeg, an automatically designed semantic segmentation network with not only state-of-the-art performance but also faster speed than current methods.

Highlights:

  • Novel search space: support multi-resolution branches.
  • Fine-grained latency regularization: alleviate the "architecture collapse" problem.
  • Teacher-student co-searching: distill the teacher to the student for further accuracy boost.
  • SOTA: FasterSeg achieves extremely fast speed (over 30% faster than the closest manually designed competitor on Cityscapes) while maintaining competitive accuracy.
    • see our Cityscapes submission here.

Cityscapes

Methods

supernet

fasterseg

Prerequisites

  • Ubuntu 16.04
  • Python 3.6.8
  • CUDA 10.1 (lower versions may work but were not tested)
  • NVIDIA GPU (>= 11 GB graphics memory) + cuDNN v7.3

This repository has been tested on a GTX 1080Ti. Configurations (e.g., batch size, image patch size) may need to be changed on other platforms.

Installation

  • Clone this repo:
git clone https://github.com/chenwydj/FasterSeg.git
cd FasterSeg
  • Install dependencies:
pip install -r requirements.txt
  • Install PyCuda, which is a dependency of TensorRT.
  • Install TensorRT (v5.1.5.0): a library for high-performance inference on NVIDIA GPUs with a Python API. A quick import check is sketched below.
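
If you are unsure whether the last two steps succeeded, here is a minimal import check (a sketch; both imports must succeed for the TensorRT latency path used later):

    import pycuda.autoinit   # creates and initializes a CUDA context on import
    import tensorrt as trt

    print(trt.__version__)   # expect 5.1.5.0, the version pinned above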

Usage

0. Prepare the dataset

1. Search

cd search

1.1 Pretrain the supernet

We first pretrain the supernet for 20 epochs without updating the architecture parameters.

  • Set C.pretrain = True in config_search.py.
  • Start the pretrain process:
CUDA_VISIBLE_DEVICES=0 python train_search.py
  • The pretrained weights will be saved in a folder like FasterSeg/search/search-pretrain-256x512_F12.L16_batch3-20200101-012345.

1.2 Search the architecture

We then run the architecture search for 30 epochs.

  • Set the name of your pretrained folder (see above) C.pretrain = "search-pretrain-256x512_F12.L16_batch3-20200101-012345" in config_search.py.
  • Start the search process:
CUDA_VISIBLE_DEVICES=0 python train_search.py
  • The searched architecture will be saved in a folder like FasterSeg/search/search-224x448_F12.L16_batch2-20200102-123456.
  • arch_0 and arch_1 contain the architectures of the teacher and student networks, respectively. The config change between stages 1.1 and 1.2 is sketched below.
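
The only field that differs between the two stages is C.pretrain (a minimal sketch of the relevant config_search.py lines; the folder name is the example above):

    # config_search.py (sketch)
    # Stage 1.1 -- pretrain the supernet:
    C.pretrain = True
    # Stage 1.2 -- search, loading the pretrained supernet:
    # C.pretrain = "search-pretrain-256x512_F12.L16_batch3-20200101-012345"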

2. Train from scratch

  • cd FasterSeg/train
  • Copy the folder which contains the searched architecture into FasterSeg/train/ or create a symlink via ln -s ../search/search-224x448_F12.L16_batch2-20200102-123456 ./

2.1 Train the teacher network

  • Set C.mode = "teacher" in config_train.py.
  • Set the name of your searched folder (see above): C.load_path = "search-224x448_F12.L16_batch2-20200102-123456" in config_train.py. This folder contains arch_0.pt and arch_1.pt for the teacher's and student's architectures.
  • Start the teacher's training process:
CUDA_VISIBLE_DEVICES=0 python train.py
  • The trained teacher will be saved in a folder like train-512x1024_teacher_batch12-20200103-234501.

2.2 Train the student network (FasterSeg)

  • Set C.mode = "student" in config_train.py.
  • Set the name of your searched folder (see above): C.load_path = "search-224x448_F12.L16_batch2-20200102-123456" in config_train.py. This folder contains arch_0.pt and arch_1.pt for the teacher's and student's architectures.
  • Set the name of your teacher's folder (see above): C.teacher_path = "train-512x1024_teacher_batch12-20200103-234501" in config_train.py. This folder contains weights0.pt, the teacher's pretrained weights (see the combined config sketch after this list).
  • Start the student's training process:
CUDA_VISIBLE_DEVICES=0 python train.py
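
For reference, a sketch of the config_train.py fields touched in steps 2.1 and 2.2 (the values are the example folder names above):

    # config_train.py (sketch)
    C.mode = "student"  # "teacher" for step 2.1, "student" for step 2.2
    C.load_path = "search-224x448_F12.L16_batch2-20200102-123456"      # searched architectures
    C.teacher_path = "train-512x1024_teacher_batch12-20200103-234501"  # teacher weights (step 2.2 only)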

3. Evaluation

Here we use our pretrained FasterSeg as an example for the evaluation.

cd train
  • Set C.is_eval = True in config_train.py.
  • Set the name of the searched folder as C.load_path = "fasterseg" in config_train.py.
  • Download the pretrained weights of the teacher and student and put them into the folder train/fasterseg.
  • Start the evaluation process:
CUDA_VISIBLE_DEVICES=0 python train.py
  • You can switch between evaluating the teacher and the student by changing C.mode in config_train.py.

4. Test

We support generating prediction files (masks as images) during training.

  • Set C.is_test = True in config_train.py.
  • During the training process, the prediction files will be periodically saved in a folder like train-512x1024_student_batch12-20200104-012345/test_1_#epoch.
  • Simply zip the prediction folder and submit it to the Cityscapes submission page.

5. Latency

5.0 Latency measurement tools

  • If you have successfully installed TensorRT, the latency tests below will automatically use TensorRT (see function here).
  • Otherwise the scripts fall back to PyTorch for the latency tests (see function here); a minimal version of such a measurement is sketched below.
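
The PyTorch fallback boils down to timing repeated forward passes; a minimal sketch (assuming a CUDA device; this is not the repo's exact helper) looks like:

    import torch

    def measure_latency_ms(model, input_size, iterations=100, warmup=10):
        """Average forward-pass latency in milliseconds (rough sketch)."""
        model = model.cuda().eval()
        x = torch.randn(*input_size, device="cuda")
        with torch.no_grad():
            for _ in range(warmup):  # warm up kernels / cuDNN autotuning
                model(x)
            torch.cuda.synchronize()
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
            for _ in range(iterations):
                model(x)
            end.record()
            torch.cuda.synchronize()  # wait for all timed work to finish
        return start.elapsed_time(end) / iterations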

5.1 Measure the latency of FasterSeg

  • Run the script:
CUDA_VISIBLE_DEVICES=0 python run_latency.py

5.2 Generate the latency lookup table

  • cd FasterSeg/latency
  • Run the script:
CUDA_VISIBLE_DEVICES=0 python latency_lookup_table.py

This will generate an .npy file. Be careful not to overwrite the provided latency_lookup_table.npy in this repo.

  • The .npy file contains a Python dictionary mapping an operator (under specific conditions: input size, stride, number of channels, etc.) to its latency in ms. A loading sketch follows.
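
Since NumPy stores the dict as a pickled object, loading it requires allow_pickle=True. A small sketch (the key format is inferred from the entries quoted in the latency issue further down, so treat it as an assumption):

    import numpy as np

    table = np.load("latency_lookup_table.npy", allow_pickle=True).item()
    # Hypothetical key following the "<op>_H<h>_W<w>_Cin<cin>_Cout<cout>_stride<s>" pattern:
    print(table.get("BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1"), "ms")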

Citation

@inproceedings{chen2020fasterseg,
  title={FasterSeg: Searching for Faster Real-time Semantic Segmentation},
  author={Chen, Wuyang and Gong, Xinyu and Liu, Xianming and Zhang, Qian and Li, Yuan and Wang, Zhangyang},
  booktitle={International Conference on Learning Representations},
  year={2020}
}

Acknowledgement


fasterseg's Issues

symbol lookup error: undefined symbol:PySlice_Unpack

Hello, please specify the exact Python 3 version in the README. I looked into this problem: PySlice_Unpack was only introduced in Python 3.6.1, so the solution is to use Python 3.6.2. I was previously using Python 3.6.0 and ran into this bug. It would help to be more rigorous and make the other environment requirements similarly precise. Thanks.

How to reach mAP 0.69 on validation set

After searching and training on the val set, I can only get 0.69 mAP on the training set, and on the val set it is only 0.64. Do I need to run the search and training many times to get the claimed result? Or does it need more than 600 epochs?

Head: Paper vs Code

Hello!

In the paper, the heads are fairly clearly defined in the figure.

In your code, I see the following and I don't exactly understand what it is for:

[screenshot]

Training with custom data with different resolution to cityscapes dataset

Hi, I'm interested in training this with a custom dataset I created from scratch.
So far I have tried following the steps here, but I can't get past the pretraining step.

The output I'm getting is:

[screenshot]

The error doesn't tell me much, just that it occurs inside model_search.py at the moment the logits are calculated, which makes me believe the problem might be with the input. I'm wondering if the resolution of the images matters at this point; my images are 576x640, but I have noticed that inside the config the crop size always keeps the aspect ratio of the Cityscapes images. Could that be the problem? I also noticed that inside train_search.py the Cityscapes resolution is hardcoded for the model's input when the model is created; should I change it to my resolution?

One extra question: for testing purposes my dataset only has one object. Should the number of classes be 1, or does the background count as another class?

Thank you for your work.

Latency issue

Thank you for your great work. I have one question about latency.

When I checked your code, I found that you use bilinear interpolation in train/operations.py,

but in the test part you use nearest interpolation in latency/operations.py.

Can you tell me why?
(I'm asking because nearest interpolation is faster than bilinear interpolation in PyTorch.)

In train/operations.py you use:
out = F.interpolate(x, size=(int(x.size(2))//2, int(x.size(3))//2), mode='bilinear', align_corners=True)
out = F.interpolate(out, size=(int(x.size(2)), int(x.size(3))), mode='bilinear', align_corners=True)

In latency/operations.py you use:
out = F.interpolate(x, size=(int(x.size(2))//2, int(x.size(3))//2), mode='nearest')
out = F.interpolate(out, size=(int(x.size(2)), int(x.size(3))), mode='nearest')

Why don't you sample Alphas and Betas using Gumbel-Softmax?

In your model_search.py code you simply use F.softmax(...) on alphas and betas to learn the super-cell and down-sampling connections (line 275):

        alphas0 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][0]), dim=-1)
        alphas1 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][1]), dim=-1)
        alphas2 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["alphas"][2]), dim=-1)
        alphas = [alphas0, alphas1, alphas2]
        betas1 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["betas"][0]), dim=-1)
        betas2 = F.softmax(getattr(self, self._arch_names[self.arch_idx]["betas"][1]), dim=-1)
        betas = [None, betas1, betas2]

I was wondering why you didn't use hard Gumbel-Softmax sampling here (as you did with the ratios)?

I think that using plain softmax is actually risky, as the optimizer learns to output optimal weighted combinations of feature maps from all available operations.

Furthermore, in a small experiment using only two operations (Conv and Skip), I observed that plain softmax didn't converge at all, while Gumbel-Softmax sampling led to good alphas.

What do you think? Did I miss something?
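
For readers following this thread, a minimal sketch (not the repo's code) of the two relaxations being compared:

    import torch
    import torch.nn.functional as F

    # Architecture logits for 5 candidate ops (2-D shape for older gumbel_softmax versions).
    alphas = torch.randn(1, 5, requires_grad=True)

    # Plain softmax: every candidate op contributes, so the supernet always
    # computes a weighted mixture of all operations.
    weights_soft = F.softmax(alphas, dim=-1)

    # Hard Gumbel-Softmax: samples a one-hot choice per forward pass; the
    # straight-through trick keeps the sample differentiable w.r.t. alphas.
    weights_hard = F.gumbel_softmax(alphas, tau=1.0, hard=True)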

ValueError: not enough values to unpack (expected 2, got 0)

Hello, thanks for your work! When I use your code I run into the following problem; how can I solve it?

05 21:04:37 params = 2.579183MB, FLOPs = 71.419396GB
architect initialized!
using downsampling: 2
Found 1487 images
using downsampling: 2
Found 1488 images
using downsampling: 2
Found 500 images
0%| | 0/20 [00:00<?, ?it/s]05 21:04:38 True
05 21:04:38 search-pretrain-256x512_F12.L16_batch3-20200605-210423
05 21:04:38 lr: 0.02
05 21:04:38 update arch: False
[Epoch 1/20][train...]: 0%| | 0/20 [00:00<?, ?it/s]
Traceback (most recent call last):
File "train_search.py", line 303, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 134, in main
train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
File "train_search.py", line 223, in train
minibatch = dataloader_model.__next__()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 856, in _next_data
return self._process_data(data)
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 881, in _process_data
data.reraise()
File "/home/kukby/.local/lib/python3.6/site-packages/torch/_utils.py", line 394, in reraise
raise self.exc_type(msg)
ValueError: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
data = fetcher.fetch(index)
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kukby/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 42, in __getitem__
img, gt = self._fetch_data(img_path, gt_path)
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 67, in _fetch_data
gt = self._open_image(gt_path, cv2.IMREAD_GRAYSCALE, dtype=dtype, down_sampling=self._down_sampling)
File "/home/kukby/FasterSeg/tools/datasets/BaseDataset.py", line 130, in _open_image
H, W = img.shape[:2]
ValueError: not enough values to unpack (expected 2, got 0)

Parse Issue of TensorRT

When I use the TensorRT computation code in your repo, there is a problem.

When the TensorRT ONNX parser parses the ONNX model, the returned value is false, and TensorRT gives an error: Network must have at least one output.

Have you met a similar problem? And how did you solve it?

How to train FasterSeg with customized labels?

Hello,
it's very interesting to train and use FasterSeg with one's own custom data.
To get information about that, I read the comments of this issue description.

Based on the description linked above, I did the following steps:

  1. prepare my own labels in labels.py
  2. create my own annotated pictures with createTrainIdLabelImgs.py
    2.1 make sure that height and width are divisible by 32 (a sanity-check sketch follows this list)
  3. prepare file-name lists as "dataset files" for data loading (see examples here)
  4. adjust the config file with the number of classes
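
A tiny sanity check for step 2.1 (a sketch; the file name is hypothetical):

    from PIL import Image

    # Hypothetical label image produced by createTrainIdLabelImgs.py.
    w, h = Image.open("my_image_gtFine_labelTrainIds.png").size
    assert w % 32 == 0 and h % 32 == 0, "%dx%d is not divisible by 32" % (w, h)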

Are there any other places in the FasterSeg repository I need to adjust in order to use customized labels?

For example:
cityscapes.py, camvid.py, and bdd.py contain methods like get_class_colors() and get_class_names() which return the colors or class names of the Cityscapes data.

Is it necessary to add the customized labels to these methods?
What are these methods used for?

It would be great if you could give me some hints on these questions, so I can run the training process with customized labels.

Best regards

NaN values in loss function during Step 2.2

Hey @chenwydj,

during step 2.2 with custom labels, the value of the loss function becomes NaN in the last few steps (see screenshot below).

[screenshot]

Everything appears to be fine until epoch 341.
[screenshot]

From epoch 351 on, something is wrong with the mIoU: all labels except "unlabeled" reach an mIoU of 0%.
[screenshot]

It seems to be a numerical-stability problem.
Do you agree with that?
Do you have an idea how to solve this problem?

It would be great if you could give me some hints on these questions.

Kind regards.

How to train for a new dataset ?

So I have been playing a little with config_search.py to adapt my new dataset to the pipeline, but I keep getting this error:

RuntimeError: merge_sort: failed on 2nd step: device-side assert triggered

After some research I know it's a problem with my dataset configuration, but I don't know what each parameter does. Can anyone elaborate on this?

Typo error in readme file

This repository has been tested on GTX 1808Ti. Configurations (e.g batch size, image patch size) may need to be changed on different platforms.

should be GTX 1080Ti?

dataset details

Why did you search directly on the train set instead of sampling a val set from the Cityscapes train set?
Won't the searched architecture overfit the Cityscapes val set?

TrainA and TrainB overlap according to your implementation

Hi,
According to your paper, the training dataset is split into two parts, one to update the model weights and the other to update the architecture parameters, and these are supposed to be disjoint. But they are not disjoint in your implementation; you can verify this by calling the _get_file_names method of the Dataset. I think this may mean that some of the training data is never used during architecture search. Am I right?
Correct me if I'm wrong.

Runtime error: Cuda error:out of memory

Hi,
Thanks for sharing your nice work. I have found an issue while using your code for training on my NVIDIA GTX 1080 Ti card.
Can you please help me with this issue?

The issue is: when I run train_search.py it raises a runtime error. A screenshot of the error is attached:
[screenshot]

How do you handle fluctuations in latency measurement?

I noticed that the measured latencies differ substantially from run to run.
For example, I get the following results from different program runs:

  1. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0549 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0511 ms
  2. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0517 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0502 ms
  3. BasicResidual_downup_2x_H24_W24_Cin80_Cout80_stride1: 0.0505 ms
     BasicResidual_downup_2x_H12_W12_Cin80_Cout80_stride1: 0.0513 ms

Note that in the last run the operation at quarter resolution (12x12) took longer than the operation at the base resolution (24x24).

This will substantially influence the architecture search, as the optimizer will prefer paths at higher resolutions.

How did you make sure that your measured latencies "make sense"? Did you set static GPU clocks or similar to get stable results?

RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

I am trying to train FasterSeg on a custom dataset with six classes. I have formatted the annotations and written a dataset class just like cityscapes.py. I am having an issue while pretraining the supernet (section 1.1 in the README).

CUDA version: 10.2
torchvision: 0.3.0
torch: 1.1.0

Traceback (most recent call last):
File "train_search.py", line 304, in <module>
main(pretrain=config.pretrain)
File "train_search.py", line 133, in main
train(pretrain, train_loader_model, train_loader_arch, model, architect, ohem_criterion, optimizer, lr_policy, logger, epoch, update_arch=update_arch)
File "train_search.py", line 243, in train
loss = model._loss(imgs, target, pretrain)
File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in _loss
loss = loss + sum(self._criterion(logit, target) for logit in logits)
File "/home/soccer/Desktop/Muaz/FasterSeg/search/model_search.py", line 491, in <genexpr>
loss = loss + sum(self._criterion(logit, target) for logit in logits)
File "/home/soccer/anaconda3/envs/pipeline_cloned/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/soccer/Desktop/Muaz/FasterSeg/tools/seg_opr/loss_opr.py", line 81, in forward
index = mask_prob.argsort()
RuntimeError: merge_sort: failed to synchronize: device-side assert triggered

Before this error I get a number of CUDA errors, but they don't crash the code:

  • /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [127,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

  • /pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: lambda ->auto::operator()(int)->auto: block: [2,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.

NOTE: I recreated the experiment on the Cityscapes dataset and I am still encountering the same issue.

How can I train this model with my own data?

Hello! The high performance and efficiency of this model are great. I plan to use it for road segmentation in my research project, but I have no research background in this area and for now just want to apply the algorithm. Could you later provide a tutorial on "how to train this network from your own raw image dataset"? It would give a clear guide to newcomers who just want to apply it. Thanks!

inference speed

Can you provide a comparison of TensorRT and PyTorch inference speed? Thanks.

Where does weight sharing take place in model_search.py?

In your paper you write:

It is worth noting that, the multi-resolution branches will share both cell weights and feature maps
if their cells are of the same operator type, spatial resolution, and expansion ratio. This design
contributes to a faster network. Once cells in branches diverge, the sharing between the branches
will be stopped and they become individual branches (See Figure 6).

This gives the impression that cells can actually choose their downsampling rate (s = [8, 16, 32]) freely. But this is not reflected in your code, where you initialize each cell at a fixed downsampling rate:
https://github.com/TAMU-VITA/FasterSeg/blob/bb52a004ff83f64d3dd8989104234fdb862d1cc5/search/model_search.py#L153

Furthermore, when cells are placed at fixed downsampling rates, I don't see how weight sharing between branches (as quoted above) can ever take place, since the spatial resolutions always differ between branches.

tl;dr:
Can you elaborate on how you can fuse cells from different branches even though (according to your code) they always operate at different spatial resolutions?

CamVid dataset

Hi,

In the paper you mention using CamVid as another dataset for experiments. Would you mind sharing the CamVid code as well?

There are deprecations in torch 1.1.0

Traceback (most recent call last):
File "E:/vSLAM/FasterSeg-master/train/train.py", line 28, in <module>
from eval import SegEvaluator
File "E:\vSLAM\FasterSeg-master\train\eval.py", line 9, in <module>
from engine.evaluator import Evaluator
File "E:\vSLAM\FasterSeg-master\tools\engine\evaluator.py", line 11, in <module>
from utils.pyt_utils import load_model, link_file, ensure_dir
File "E:\vSLAM\FasterSeg-master\tools\utils\pyt_utils.py", line 23, in <module>
def reduce_tensor(tensor, dst=0, op=dist.ReduceOp.SUM, world_size=1):
AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'

PyTorch 1.1.0, Windows 10

How to use Tensorboard within the FasterSeg project?

Hello, FasterSeg is very interesting as part of a research project at our university. We would like to train it with our own dataset.

How can I monitor the whole process with TensorBoard? In which directory do I have to be, and which command do I have to execute?

We have forked your repository and are in the process of providing a guide on how to train FasterSeg on custom data.

We already provide a Dockerfile for the training environment. Other required scripts are currently being implemented.

See https://github.com/Gaussianer/FasterSeg

Best regards

Error when running CUDA_VISIBLE_DEVICES=0 python train_search.py

Your work looks very good, but I have some doubts. When I try running the above command it shows an error:
Traceback (most recent call last):
File "train_search.py", line 23, in <module>
from datasets.cityscapes.cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/__init__.py", line 1, in <module>
from .cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/cityscapes/__init__.py", line 1, in <module>
from .cityscapes import Cityscapes
File "/media/drive/2f08d5f1-1439-4edf-9bb4-1d19896475a5/Naveen/FasterSeg/tools/datasets/cityscapes/cityscapes.py", line 6, in <module>
class Cityscapes(BaseDataset):
TypeError: module.__init__() takes at most 2 arguments (3 given)
May I know the reason for this error? @chenwydj @TianlongChenTAMU

Error about TensorRT and Pycuda

When I run CUDA_VISIBLE_DEVICES=1 python train_search.py, I get this:
/export/data/lwangcg/FasterSeg/tools/utils/darts_utils.py:179: UserWarning: TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.
warnings.warn("TensorRT (or pycuda) is not installed. compute_latency_ms_tensorrt() cannot be used.")

I have tried to install PyCuda and TensorRT, but I may not have done so successfully: I get a warning in the PyCuda test and a failure in the TensorRT test. Can you kindly help me with this? Note: I have no root privileges on the system (CentOS 7).

FasterSeg:
[screenshot]

TensorRT:
[screenshot]

PyCuda:
[screenshot]

flop calcuate

--- Logging error ---
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/logging/__init__.py", line 994, in emit
    msg = self.format(record)
  File "/usr/local/lib/python3.6/logging/__init__.py", line 840, in format
    return fmt.format(record)
  File "/usr/local/lib/python3.6/logging/__init__.py", line 577, in format
    record.message = record.getMessage()
  File "/usr/local/lib/python3.6/logging/__init__.py", line 338, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "train_search.py", line 306, in <module>
    main(pretrain=config.pretrain)
  File "train_search.py", line 68, in main
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048)))
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 97, in profile
    model.apply(add_hooks)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 247, in apply
    module.apply(fn)
  [Previous line repeated 4 more times]
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 248, in apply
    fn(self)
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 87, in add_hooks
    logger.info("THOP has not implemented counting method for ", m)
Message: 'THOP has not implemented counting method for '
Arguments: (USConv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),)
30 22:57:41 Register FLOP counter for module BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
30 22:57:41 Register FLOP counter for module Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
30 22:57:41 Register FLOP counter for module ReLU(inplace)
[... dozens of similar "Register FLOP counter" lines omitted ...]
Traceback (most recent call last):
  File "train_search.py", line 306, in <module>
    main(pretrain=config.pretrain) 
  File "train_search.py", line 68, in main
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048)))
  File "/usr/local/lib/python3.6/site-packages/thop/profile.py", line 100, in profile
    model(*inputs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs_new_wj/cv/xiaxin/NAS/nas_seg/FasterSeg/search/model_search.py", line 287, in forward
    out_prev = [[stem(input), None]] # stem: one cell
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/cephfs_new_wj/cv/xiaxin/NAS/nas_seg/FasterSeg/search/operations.py", line 125, in forward
    assert x.size()[1] == self.C_in, "{} {}".format(x.size()[1], self.C_in)
AssertionError: 1024 3

Thanks for your excellent work!

When I run CUDA_VISIBLE_DEVICES=0 python train_search.py, the code produces the error above.
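
A likely cause (my reading of the traceback, not a confirmed fix): thop's profile() calls model(*inputs), so inputs must be a tuple of tensors. (torch.randn(1, 3, 1024, 2048)) without a trailing comma is just the tensor itself, so unpacking feeds the model a (3, 1024, 2048) slice, and the stem's assertion then compares 1024 channels against C_in = 3, which is exactly the AssertionError: 1024 3 above. A sketch of the fix:

    # Note the trailing comma: inputs must be a one-element tuple.
    flops, params = profile(model, inputs=(torch.randn(1, 3, 1024, 2048),))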

Test Issue

Hello. I'm trying to run training, validation, and testing. The first two processes work well, but when I set C.is_eval = True, I get this error:
Traceback (most recent call last):
File "train.py", line 295, in <module>
main()
File "train.py", line 82, in main
train_loader = get_train_loader(config, Cityscapes, test=config.is_test)
File "/opt/data/private/FasterSeg-master/train/dataloader.py", line 52, in get_train_loader
train_dataset = dataset(data_setting, "train", train_preprocess, config.batch_size * config.niters_per_epoch)
File "../tools/datasets/BaseDataset.py", line 23, in __init__
self._file_names = self._get_file_names(split_name)
File "../tools/datasets/BaseDataset.py", line 80, in _get_file_names
with open(source) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/opt/data/private/FasterSeg-master/tools/datasets/cityscapes/cityscapes_train_val_fine.txt'

I opened the directory and found there is no such file. Does it mean that I should merge the train set and the val set for training?
Could you help me? Thank you!

multi gpu training

Thanks for your excellent work !

Training is very slow. Does it support distributed training?

Looking for weight file for 73.1% cs val accuracy with Student model

One of the best works for semantic segmentation with NAS. Thank you for sharing the training as well as the search code.

I downloaded the student's weights from the location mentioned in the readme (https://drive.google.com/file/d/1O56HnA0ug2M3K4SR3_AUzIs0wegy9BX6/view?usp=sharing) and ran train.py to check the validation accuracy, but I got 70.5%.
Is this model supposed to give 70.5% or 73.1% val accuracy?
If there are other weights that reach 73.1%, could you please share them?

Thanks.
