
sfsegnets's Introduction

SFSegNets(ECCV-2020-oral) and SFNet-Lite (Extension, IJCV-2023)

Reproduced Implementation of Our ECCV-2020 oral paper: Semantic Flow for Fast and Accurate Scene Parsing.

News! SFNet-Lite has been accepted by IJCV! A good end to the last work of my PhD study!

Extension: SFNet-Lite achieves 78.8 mIoU while running at 120 FPS, and 80.1 mIoU while running at 50 FPS, on a TITAN RTX.


Extension: SFNet-Lite achieves new state-of-the-art results (the best speed and accuracy trade-off) on the domain-agnostic driving segmentation benchmark (Unified Driving Segmentation).

SFNet is the first real-time network to achieve 80 mIoU on the Cityscapes test set! This repo also contains another concurrent work of ours, SRNet (IEEE-TIP): link.

Our methods achieve the best speed and accuracy trade-off on multiple scene parsing datasets.

Note that the original paper's implementation is in TorchCV, where you can also train SFNet models. However, that repo is overly complex for further research and exploration.

Questions and Discussion

If you have any questions or want to discuss fast segmentation, just open an issue. I will reply as soon as I have spare time.

Dataset Setting

Please see DATASETs.md for details.

Requirements

pytorch >= 1.4.0, apex, opencv-python, mmcv-cpu
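A quick sanity check of the environment (a rough sketch; the exact mmcv variant and the apex build should follow the repo's own scripts):

import torch
import cv2
import mmcv
import apex  # NVIDIA apex, usually built from source

print(torch.__version__)           # expected >= 1.4.0
print(torch.cuda.is_available())   # training assumes CUDA GPUs
print(cv2.__version__, mmcv.__version__)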

Pretrained models and Trained CKPTs

Please download the pretrained models and put them into the pretrained_models directory at the root of this repo.

Pretrained ImageNet models

resnet101-deep-stem-pytorch:link

resnet50-deep-stem-pytorch:link

resnet18-deep-stem-pytorch:link

dfnetv1:link

dfnetv2:link

stdcv1/stdc2:link

trained ckpts:

SFNet ckpts:

Cityscapes:

sf-resnet18-Mapillary:link

Please download the trained models; the reported mIoU is on the Cityscapes validation set.

resnet18(no-balanced-sample): 78.4 mIoU

resnet18: 79.0 mIoU link +dsn link

resnet18 + map: 79.9 mIoU link

resnet50: 80.4 mIoU link

dfnetv1: 72.2 mIoU link

dfnetv2: 75.8 mIoU link

SFNet-Lite ckpts:

Cityscapes:

sfnet_lite_r18: link

sfnet_lite_r18_coarse_boost: link

sfnet_lite_stdcv2: link

sfnet_lite_stdcv1: link

Unified Driving Segmentation dataset ckpts:

sfnet_lite_r18: link

sfnet_lite_stdcv1: link

sfnet_lite_stdcv2: link

IDD dataset

sfnet_lite_r18: link

sfnet_lite_stdcv1: link

sfnet_lite_stdcv2: link

BDD dataset

sfnet_lite_r18: link

sfnet_r18: link

Mapillary dataset

To be released.

Demo

Visualization Results

python demo_folder.py --arch chosen_architecture --snapshot ckpt_path --demo_floder images_folder --save_dir save_dir_to_disk

Training

All the models are trained with 8 GPUs. The training settings require 8 GPUs with at least 11 GB of memory each. Please download the pretrained models before training.

Train ResNet18 model on Cityscapes

SFNet r18

sh ./scripts/cityscapes/train_cityscapes_sfnet_res18.sh

SFNet-Lite r18

sh ./scripts/cityscapes/train_cityscapes_sfnet_res18_v2_lite_1000e.sh

Train ResNet101 models

sh ./scripts/cityscapes/train_cityscapes_sfnet_res101.sh

Submission for test

sh ./scripts/submit_test_cityscapes/submit_cityscapes_sfnet_res101.sh

Train the domain-agnostic SFNet on the UDS dataset

Please follow DATASETs.md to prepare the UDS dataset.

sh ./scripts/uds/train_merged_sfnet_res18_v2.sh

Citation

If you find this repo useful for your research, please consider citing our papers:

@article{Li2022SFNetFA,
  title={SFNet: Faster and Accurate Domain Agnostic Semantic Segmentation via Semantic Flow},
  author={Xiangtai Li and Jiangning Zhang and Yibo Yang and Guangliang Cheng and Kuiyuan Yang and Yu Tong and Dacheng Tao},
  journal={IJCV},
  year={2023},
}
@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={IEEE-TIP},
  year={2021},
}

Acknowledgement

This repo is based on Semantic Segmentation from NVIDIA and DecoupleSegNets.

Great thanks to SenseTime Research for reproducing all these model checkpoints and pretrained models.

License

MIT

sfsegnets's People

Contributors

lxtGH


sfsegnets's Issues

export sfnet_r18_78.pth to onnx

I tried to export sfnet_r18_78.pth to ONNX, but encountered an error that linspace is not supported by ONNX.
Do you have any experience converting your model to ONNX?

Thanks,

License?

Hi,
Can you tell me what license applies to this repo?

The TensorRT implementation

Can you provide the TensorRT implementation code, or some useful help to implement the TensorRT deployment? Thanks

about align_corners in grid_sample

First, thank you for making this promising work public!
I came across some problems while running the demo.

UserWarning: Default grid_sample and affine_grid behavior has changed to align_corners=False since 1.3.0. Please specify align_corners=True if the old behavior is desired. See the documentation of grid_sample for details.

It seems like the version you recommend is pytorch>=1.2, which is not enough for me to deduce what value should be set here.

The FAM code differs slightly from the torchcv version??

norm = torch.tensor([[[[out_w, out_h]]]]).type_as(input).to(input.device)
h = torch.linspace(-1.0, 1.0, out_h).view(-1, 1).repeat(1, out_w)
w = torch.linspace(-1.0, 1.0, out_w).repeat(out_h, 1)
grid = torch.cat((w.unsqueeze(2), h.unsqueeze(2)), 2)
grid = grid.repeat(n, 1, 1, 1).type_as(input).to(input.device)
grid = grid + flow.permute(0, 2, 3, 1) / norm

In torchcv it is as follows; could you explain the difference?
norm = torch.tensor([[[[out_w, out_h]]]]).type_as(input).to(input.device)
w = torch.linspace(-1.0, 1.0, out_h).view(-1, 1).repeat(1, out_w)
h = torch.linspace(-1.0, 1.0, out_w).repeat(out_h, 1)
grid = torch.cat((h.unsqueeze(2), w.unsqueeze(2)), 2)
grid = grid.repeat(n, 1, 1, 1).type_as(input).to(input.device)
grid = grid + flow.permute(0, 2, 3, 1) / norm

problem of training train_cityscapes_sfnet_dfv1.sh

Hi, when I trained with train_cityscapes_sfnet_dfv1.sh, I got the following error from b, c, h, w = pred.size() in loss.py:
'tuple' object has no attribute 'size'
I found that pred contains 2 predicted tensors whose shape is 2x19x64x64. Then I used pred[0] as pred, but another error came out as follows:

File "/media/ssd2/yr/CV/segmentation/SFSegNets/loss.py", line 272, in forward
prob = prob.masked_fill_(~valid_mask, 1)
RuntimeError: The expanded size of the tensor (8192) must match the existing size (2097152) at non-singleton dimension 1. Ta rget sizes: [19, 8192]. Tensor sizes: [2097152]

What should I do to train with train_cityscapes_sfnet_dfv1.sh correctly?

Question about testing

Hello!
I downloaded your trained models and ran eval with the default parameters on a GTX 1080. The speed is quite slow: DFNetv1 takes 9 seconds per frame. Have I set something up incorrectly?
What should I do to reach the speed described in your paper? Thanks!

The implementation seems a little different from the paper

Hi, first of all, thanks for sharing your work for open access.
When reviewing the code, I noticed something about the decoding part of the FPN. The paper says that multiple FAMs process the feature maps output by each decoding layer, the results are concatenated at 4x-stride resolution, and the final result is then predicted.
In the code, however, the FAM module is only used in the step-by-step upsampling of the FPN architecture, not to produce the final concatenated logits.
I want to know whether this matters for performance.
For now, I think the result is good; it's a good trade-off between speed and accuracy. Besides, adding more FAMs may lower the speed, so if anyone has tried it, please share your experience.
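For context, a toy sketch (not the repo's code) of the plain fusion described above: each decoder level goes through a conv3x3-bn-relu, is bilinearly upsampled to the 1/4-resolution level, concatenated, and classified. Names and channel counts are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFusionHead(nn.Module):
    def __init__(self, in_ch=64, num_classes=19, levels=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1, bias=False),
                          nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True))
            for _ in range(levels)])
        self.classifier = nn.Conv2d(in_ch * levels, num_classes, 1)

    def forward(self, feats):
        # feats: list of (N, C, H_i, W_i); feats[0] is the 1/4-resolution level
        target = feats[0].shape[2:]
        outs = [F.interpolate(conv(f), size=target, mode='bilinear', align_corners=True)
                for conv, f in zip(self.convs, feats)]
        return self.classifier(torch.cat(outs, dim=1))

feats = [torch.randn(2, 64, 128 // 2 ** i, 256 // 2 ** i) for i in range(4)]
print(ToyFusionHead()(feats).shape)  # torch.Size([2, 19, 128, 256])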

Can you provide the test code?

I can get the eval result, but when I use the trained model to get the test result, I can't reproduce the results from the paper. So can you provide the test code?

How did you train sfnet_resnet_dsn?

Please have a close look at your code:

def forward(self, x, gts=None):
        x_size = x.size()  # 800
        x0 = self.layer0(x)  # 400
        x1 = self.layer1(x0)  # 400
        x2 = self.layer2(x1)  # 100
        x3 = self.layer3(x2)  # 100
        x4 = self.layer4(x3)  # 100
        x = self.head([x1, x2, x3, x4])
        # main_out = Upsample(x[0], x_size[2:])
        main_out = F.upsample(x[0], size=x_size[2:], mode='bilinear', align_corners=True)
        # main_out = Upsample(x[0], x_size[2:])
        print('main out: {}'.format(main_out))
        print(main_out.shape)
        if self.training:
            if not self.fpn_dsn:
                return self.criterion(main_out, gts)
            return self.criterion(x, gts)
        return main_out

In sfnet_resnet line 178, if not self.fpn_dsn (and I set dsn to True), then:

you end up using return self.criterion(x, gts).
But x is a tuple...
How are you able to train with this logic?

CrossEntropy2D cannot call .size or .dim on a tuple, do you know?

I really don't understand how you were able to train it.

Unable to train

I got NaN loss:

04-08 10:51:43.489 NaN or Inf found in input tensor.
04-08 10:51:43.559 [epoch 28], [iter 8136 / 13576], [train main loss nan], [lr 0.001678]
04-08 10:51:43.560 NaN or Inf found in input tensor.
04-08 10:51:43.632 [epoch 28], [iter 8137 / 13576], [train main loss nan], [lr 0.001678]
04-08 10:51:43.632 NaN or Inf found in input tensor.
04-08 10:51:43.965 [epoch 28], [iter 8138 / 13576], [train main loss nan], [lr 0.001678]
04-08 10:51:43.965 NaN or Inf found in input tensor.
04-08 10:51:44.036 [epoch 28], [iter 8139 / 13576], [train main loss nan], [lr 0.001678]
04-08 10:51:44.037 NaN or Inf found in input tensor.

Demo still invalid, please provide pretrained weights


Please provide pretrained weights with this model: network.sfnet_resnet.DeepR18_SF_deeply_dsn

I tried both network.sfnet_resnet.DeepR18_SF_deeply_dsn and network.sfnet_resnet.DeepR18_SF_deeply; neither of them works.

also question about fps

@lxtGH
Hi, thank you for your work,
but when I run demo_folder.py with sfnet_r18_78.pth to test the Cityscapes images (1024x2048), I get the message: 'inference takes 75.56 seconds, which takes 0.42 seconds per image, including saving results'. I have one GTX 1080Ti. The 0.42 seconds per image is much longer than the 26 FPS of SFNet (ResNet-18) in your paper.
Could you please tell me what the problem is?
Thank you.

visualized result is weird


def demo():
    df = args.data

    model = get_net()
    img_transform = transforms.Compose(
        [transforms.ToTensor(), transforms.Normalize(*mean_std)])

    if os.path.isfile(df):
        pass
    elif os.path.isdir(df):
        all_img = glob.glob(os.path.join(df, '*.png'))
        print('all images: ', len(all_img))
        for im in all_img:
            im = cv2.imread(im)
            im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
            ori_h, ori_w, _ = im.shape
            image = img_transform(im_rgb)
            with torch.no_grad():
                inp = image.unsqueeze(0).cuda()
                scale_out = model(inp)
                scale_out = F.upsample(scale_out, size=(
                    ori_h, ori_w), mode="bilinear", align_corners=True)
                out = torch.argmax(scale_out, dim=1)
                out = out.cpu().numpy()[0].astype(np.uint8)
                print(out.shape)
                res = label2color_mask(out)
                cv2.imshow('a', out)
                cv2.imshow('aa', im)
                cv2.imshow('aaa', res)
                cv2.waitKey(0)

Question about the result of sf-dfnet

The mIoU of the dfv2 trained model you released is 75.8% on the validation set, while the mIoU reported in your paper is 77.8% on the test set. Would you mind explaining why the results are so different?

question about fps

The FPS testing code in utils/misc.py uses a tensor size of 896 for the speed test. However, in the paper, (1024, 2048) is used for inference. Could you please explain this discrepancy?
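For reference, a rough latency-measurement sketch at the full 1024x2048 resolution (the network below is a stand-in; swap in the actual SFNet model, and treat the warm-up and iteration counts as illustrative):

import time
import torch

model = torch.nn.Conv2d(3, 19, kernel_size=3, padding=1).cuda().eval()  # stand-in for SFNet
x = torch.randn(1, 3, 1024, 2048).cuda()  # full Cityscapes resolution

with torch.no_grad():
    for _ in range(10):   # warm-up
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        model(x)
    torch.cuda.synchronize()
print('FPS: %.1f' % (100 / (time.time() - start)))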

About the torchcv implementation of SFNet

Hello, I looked at the torchcv implementation of SFNet, and it seems to differ somewhat from the network architecture diagram in the paper.
In the paper, FAM is applied when the feature maps of the decoder layers are finally fused. But in the sfnet code, fusion_list passes each decoder layer's feature maps through conv3x3_bn_relu and then fuses them with bilinear interpolation plus torch.cat. This seems a bit different from the paper's description, though maybe I haven't fully understood the code. I'd like to hear your view on this.

Question about the flow_make and flow_warp functions.

Hi, thank you for your nice work.
But I have some questions about the flow_make and flow_warp functions.
In your code, you use a Conv2d, self.flow_make = nn.Conv2d(outplane*2, 2, kernel_size=kernel_size, padding=1, bias=False), to make the flow. But the Conv2d operator cannot guarantee that the range of the output flow is reasonable, so even if you use a norm operation, you cannot get a reasonable grid in flow_warp.
Is this a problem?
Looking forward to your reply. Thanks.
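For context, a minimal self-contained sketch of the warp being discussed (illustrative, not the repo's exact code): the predicted flow is divided by the spatial size so it lives in the [-1, 1] coordinate system that grid_sample expects, and out-of-range positions are handled by grid_sample's padding rather than causing an error.

import torch
import torch.nn as nn
import torch.nn.functional as F

def flow_warp(feature, flow):
    # feature: (N, C, H, W); flow: (N, 2, H, W) pixel offsets predicted by a conv
    n, _, h, w = feature.size()
    norm = torch.tensor([[[[w, h]]]], dtype=feature.dtype, device=feature.device)
    ys = torch.linspace(-1.0, 1.0, h, device=feature.device).view(-1, 1).repeat(1, w)
    xs = torch.linspace(-1.0, 1.0, w, device=feature.device).repeat(h, 1)
    grid = torch.stack((xs, ys), dim=2).unsqueeze(0).repeat(n, 1, 1, 1)  # identity grid, (x, y)
    grid = grid + flow.permute(0, 2, 3, 1) / norm
    return F.grid_sample(feature, grid, mode='bilinear', align_corners=True)

# Hypothetical usage: a conv predicts the flow from two concatenated feature maps.
flow_make = nn.Conv2d(64 * 2, 2, kernel_size=3, padding=1, bias=False)
high, low = torch.randn(2, 64, 32, 64), torch.randn(2, 64, 32, 64)
flow = flow_make(torch.cat([high, low], dim=1))
print(flow_warp(high, flow).shape)  # torch.Size([2, 64, 32, 64])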

mIoU is lower than 79%

Hello.
Thanks for publishing great code.
I trained SFNet with the default settings of scripts/train_cityscapes_sfnet_res18.sh, but I got only 73 mIoU.
Differences:

  • ResNet-18 pretrained model (the model channels (64) do not match, so I did not load the pretrained model).
  • Cityscapes dir name (such as img_dir_name = 'leftImg8bit_trainvaltest').
  • PyTorch version (some default settings).

The printed model structure does not quite match the provided trained weights?

Using ResNet-18:

net = DeepR18_SF_deeply(19, None)
print(net)

The printed structure shows that layer1 uses a downsample module.
However, when I print the parameter names of the downloaded sf_r18_799_map_trained.pth, there is no downsample operation among layer1's parameters. Why is this module missing from the trained weights?

Dilated ResNet in SFNet?

Hi, I find that SFNet with ResNet-18 adopts dilated convolutions in C4 and C5:

return AlignNetResNet(num_classes, trunk='resnet-18-deep', criterion=criterion, variant='D', skip='m1', fpn_dsn=True)

However, this isn't mentioned in the paper or in the other implementation, torchcv.
I'm confused about it.

Are the FAM module's weights updated during backpropagation?

Hi, first of all thanks for open-sourcing this repository.
Reading your code, my understanding is that the FAM module computes a flow field between two feature maps, and that flow field is then used as a grid to upsample the high-level semantic feature map via bilinear interpolation. Is that correct?
If so, are gradients with respect to the flow field computed during backpropagation? Put differently, are the FAM module's parameters updated during the backward pass? With my limited understanding, it seems the FAM parameters would not be updated during backpropagation.
If I've misunderstood, please point me in the right direction. Thanks!
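As a point of reference on the question above, bilinear grid_sample in PyTorch is differentiable with respect to the sampling grid as well as the input, so gradients can reach a flow field used to build the grid. A tiny self-contained check (illustrative only, not code from this repo):

import torch
import torch.nn.functional as F

feat = torch.randn(1, 8, 16, 16)
flow = torch.zeros(1, 2, 16, 16, requires_grad=True)  # stand-in for a predicted flow field

ys = torch.linspace(-1.0, 1.0, 16).view(-1, 1).repeat(1, 16)
xs = torch.linspace(-1.0, 1.0, 16).repeat(16, 1)
grid = torch.stack((xs, ys), dim=2).unsqueeze(0)  # identity grid

warped = F.grid_sample(feat, grid + flow.permute(0, 2, 3, 1) / 16.0,
                       mode='bilinear', align_corners=True)
warped.sum().backward()
print(flow.grad.abs().sum())  # non-zero for random inputs: gradients reach the flow field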

which parameter controls the sampling?

I used train_cityscapes_sfnet_res18.sh and got 78.5. Does this correspond to "resnet18(no-balanced-sample): 78.4 mIoU"?

And how can I get "resnet18: 79.0 mIoU"?

Single GPU with 8 GB memory

Hi, your code is trained with multiple GPUs, but I only have a single GPU with 8 GB of memory.

So what should I set to run this code with my single GPU?

I saw torch.distributed.init_process_group() and torch.distributed.all_reduce(); I think these may interrupt my training on a single GPU.
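One possible workaround, sketched below under the assumption that the training code only needs a process group to exist: initialise a single-process group (world_size 1) so the torch.distributed calls effectively become no-ops. Whether the repo's launch scripts accept this without further changes is an assumption to verify.

import os
import torch

# Single-process "distributed" setup so init_process_group / all_reduce still work.
os.environ.setdefault('MASTER_ADDR', 'localhost')
os.environ.setdefault('MASTER_PORT', '29500')
torch.distributed.init_process_group(backend='nccl', rank=0, world_size=1)
torch.cuda.set_device(0)

# all_reduce over a single process just returns the local tensor unchanged.
t = torch.ones(1).cuda()
torch.distributed.all_reduce(t)
print(t)  # tensor([1.])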

Training with DFv1 as the backbone

When I train with DFv1 as the backbone, I get the error shown here:
https://github.com/ahhLS/picture/blob/main/1.png
Debugging with a breakpoint where the error occurs shows the following:
https://github.com/ahhLS/picture/blob/main/2.png
Here x is a tuple; my understanding is that x[0] is a tensor, but x[0] has shape [2, 9, 64, 64], which does not match the size of the input gts. main_out upsamples the feature map, which is what makes x[0] match the gts size so the forward pass can proceed. Could you explain this part of the code in more detail? How should I modify it so that the forward pass runs correctly?

Error: CUDA out of memory

@lxtGH
Thank you for your work.
When I train sfnet_res101 on my own data, I get a CUDA out of memory error. My training images are 2710x3384, and I have two GTX 1080Ti cards with about 11 GB each.
Could you please tell me how to solve it?
Thank you.
