vitae-transformer / rsp

The official repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining"

License: MIT License

change-detection classification deep-learning foundation-models imagenet object-detection pre-training remote-sensing semantic-segmentation transfer-learning

rsp's Introduction

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond

Current applications

Image Classification: Please see ViTAE-Transformer for image classification;

Object Detection: Please see ViTAE-Transformer for object detection;

Semantic Segmentation: Please see ViTAE-Transformer for semantic segmentation;

Animal Pose Estimation: Please see ViTAE-Transformer for animal pose estimation;

Matting: Please see ViTAE-Transformer for matting;

Remote Sensing: Please see ViTAE-Transformer for Remote Sensing;

Updates

09/04/2021

The pretrained models for ViTAE on matting and remote sensing are released! Please try and have fun!

24/03/2021

The pretrained models for both ViTAE and ViTAEv2 are released. The code for downstream tasks is also provided for reference.

07/12/2021

The code is released!

19/10/2021

The paper has been accepted by NeurIPS 2021! The code will be released soon!

06/08/2021

The paper is posted on arXiv! The code will be made publicly available once cleaned up.

Introduction

This repository contains the code, models, and test results for the paper ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias. ViTAE contains several reduction cells (RC) and normal cells (NC) to introduce scale-invariance and locality into vision transformers. In ViTAEv2, we explore the use of window attention without shift operations to obtain a better balance between memory footprint, speed, and performance. We also stack the proposed RC and NC in a multi-stage manner to facilitate learning on other vision tasks, including detection, segmentation, and pose estimation.

Fig.1 - The details of RC and NC design in ViTAE.

Fig.2 - The multi-stage design of ViTAEv2.

Statement

This project is for research purposes only. For any other questions, please contact yufei.xu at outlook.com or qmzhangzz at hotmail.com.

Citing ViTAE and ViTAEv2

@article{xu2021vitae,
  title={Vitae: Vision transformer advanced by exploring intrinsic inductive bias},
  author={Xu, Yufei and Zhang, Qiming and Zhang, Jing and Tao, Dacheng},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
@article{zhang2022vitaev2,
  title={ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond},
  author={Zhang, Qiming and Xu, Yufei and Zhang, Jing and Tao, Dacheng},
  journal={arXiv preprint arXiv:2202.10108},
  year={2022}
}

rsp's Issues

What versions of mmcv and mmseg are used?

Could the uploader say which versions of mmcv and mmseg this code uses? I cannot seem to find some of these functions.

Are the pretrained weights for the SeCo dataset still available?

I don't know whether you have kept them, but I really need them. Either way, thank you.

When the model's pretrained is None: AttributeError: 'ViTAE_Window_NoShift_basic' object has no attribute 'use_abs_pos_embed'

When pretrained is set to None in the model config, as follows:

model = dict(
    type='EncoderDecoder',
    data_preprocessor=data_preprocessor,
    pretrained=None,
)

I get AttributeError: 'ViTAE_Window_NoShift_basic' object has no attribute 'use_abs_pos_embed' at https://github.com/ViTAE-Transformer/RSP/blob/main/Semantic%20Segmentation/mmseg/models/backbones/ViTAE_Window_NoShift/base_model.py#L185
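
A minimal sketch of how this kind of error can arise and one way to guard against it. The class below is a hypothetical stand-in, not the repository's ViTAE_Window_NoShift_basic, and it assumes the attribute is only assigned on some code paths; the actual cause in base_model.py may differ.

```python
class HypotheticalBackbone:
    """Toy stand-in; NOT the repository's ViTAE_Window_NoShift_basic."""

    def __init__(self, use_abs_pos_embed=None):
        # Assumption: the attribute is only assigned when a value is passed,
        # mimicking an attribute that is missing on some code paths.
        if use_abs_pos_embed is not None:
            self.use_abs_pos_embed = use_abs_pos_embed

    def init_weights(self, pretrained=None):
        # getattr with a default avoids AttributeError when the attribute
        # was never assigned.
        uses_abs_pos = getattr(self, "use_abs_pos_embed", False)
        if pretrained is None:
            return "random-init, abs_pos_embed={}".format(uses_abs_pos)
        return "load {}, abs_pos_embed={}".format(pretrained, uses_abs_pos)


model = HypotheticalBackbone()
message = model.init_weights(pretrained=None)
print(message)
```

Assigning a default for the attribute in __init__ would be the more conventional fix; the getattr form is shown only because it leaves the constructor untouched.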

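Error when using the LEVIR dataset

Hello, following your README I converted the LEVIR dataset to the 256*256 format, but I get an error, as shown in the screenshot above. Could you take a look? Thank you.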
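
For reference, a minimal sketch of the tiling arithmetic behind a 256*256 split. It assumes 1024*1024 source images and non-overlapping tiles; the function name tile_boxes is hypothetical and this is not the repository's preprocessing script.

```python
def tile_boxes(width, height, tile=256):
    """Return (left, upper, right, lower) boxes for non-overlapping tiles."""
    if width % tile or height % tile:
        raise ValueError("image size must be a multiple of the tile size")
    return [
        (x, y, x + tile, y + tile)
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


# Assumed source size: 1024 x 1024 pixels.
boxes = tile_boxes(1024, 1024)
print(len(boxes), boxes[0], boxes[-1])
```

For change detection, the same boxes would need to be applied to both temporal images and to the label so that the three crops stay aligned.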

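Semantic segmentation test results may differ

Hello, when I test on the Potsdam dataset with the upernet-rsp-resnet-50-potsdam-latest.pth weights, the results I get differ from those in the log.
My results are as follows:

aAcc 86.43 vs 90.61, mIoU 65.69 vs 81.96, and the other metrics differ as well.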
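
As a reference point when comparing such numbers, here is a minimal sketch of how overall pixel accuracy (aAcc) and mean IoU are commonly computed from a confusion matrix. This is the generic definition, not the repository's evaluation code, and the function name seg_metrics and the toy matrix are illustrative assumptions.

```python
def seg_metrics(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    ious = []
    for i in range(n):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[j][i] for j in range(n)) - tp
        union = tp + fp + fn
        if union:  # skip classes absent from both prediction and label
            ious.append(tp / union)
    return correct / total, sum(ious) / len(ious)


# Toy two-class confusion matrix (made-up numbers).
a_acc, m_iou = seg_metrics([[50, 10], [5, 35]])
print(round(a_acc, 4), round(m_iou, 4))
```

Because mIoU averages over classes, settings such as which label indices are ignored can shift it much more than aAcc, so those are worth comparing against the training config.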

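Hello, are the provided RSP-ResNet-50-E300 weights fully trained?

I used them for object detection on another aerial dataset, and the performance is far below that of the ImageNet-pretrained model. Are the weights provided on Baidu Netdisk already trained?

About the MillionAID dataset

Hello, I saw it mentioned on Bilibili that this dataset includes label files, but after downloading it from the official dataset website I found only the image data and no label files. Could you share a download link for the label files? Thank you very much!

Could you provide the MAE code or related references?

Hello, your paper mentions using MAE for training to obtain pretrained weights. Could you provide the relevant code, or point to related references? Thanks.

Hello, is the iSAID dataset you used the original dataset?

In the provided RSP-ViTAEv2-S-E100 log, I found that validation ran for 2884 iterations, but the validation set I downloaded contains only 458 images.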

upernet_vit_base_win.py model config is in the Remote-Sensing-RVSA repo

I had trouble finding it. Might help others:

https://github.com/ViTAE-Transformer/Remote-Sensing-RVSA/blob/36c4c28c40f167c1bff8183a1c5642513765c4e2/Semantic%20Segmentation/configs/_base_/models/upernet_vit_base_win.py

Classification train/test set split.

Where can I find the train/test split for the AID, UCM and NWPU-RESISC datasets? Could you provide the 'train_labels_{}_{}.txt'.format(ratio, split) files?
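
Until the original files are available, a split can be regenerated locally. The sketch below is only an assumption about how such files might be produced: the per-class sampling, the meaning of ratio (training percentage) and split (used here as a random seed), and the function name make_split are all hypothetical, and the result will not reproduce the authors' exact splits.

```python
import random


def make_split(samples, ratio, seed):
    """Split (filename, label) pairs per class; ratio is the training percentage."""
    by_label = {}
    for name, label in samples:
        by_label.setdefault(label, []).append(name)
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_label):
        names = sorted(by_label[label])
        rng.shuffle(names)
        k = len(names) * ratio // 100  # training images for this class
        train += [(n, label) for n in names[:k]]
        test += [(n, label) for n in names[k:]]
    return train, test


# Made-up file names and class labels for illustration only.
samples = [("a_{}.jpg".format(i), "airport") for i in range(10)]
samples += [("b_{}.jpg".format(i), "beach") for i in range(10)]

ratio, split = 20, 0
train, test = make_split(samples, ratio, seed=split)
train_file = 'train_labels_{}_{}.txt'.format(ratio, split)
print(train_file, len(train), len(test))
```

Fixing the seed makes the split repeatable, which matters when results are compared across runs.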

Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.

Hello, while my program is running, the following warning always appears after the first validation pass:
[W accumulate_grad.h:185] Warning: grad and param do not obey the gradient layout contract. This is not an error, but may impair performance.
grad.sizes() = [2, 512, 1, 1], strides() = [512, 1, 1, 1]
param.sizes() = [2, 512, 1, 1], strides() = [512, 1, 512, 512] (function operator())
It only appears this once per run. Have you encountered the same warning?
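
The two stride lists in the warning can be reproduced by hand. The sketch below, in plain Python with hypothetical helper names, computes the strides that a contiguous (NCHW) layout and a channels-last layout would give for the reported shape, and checks that both address identical memory offsets here because the two spatial dimensions have size 1. Why the parameter ended up with the second set of strides is not determined by this sketch.

```python
from itertools import product


def contiguous_strides(sizes):
    """Row-major strides: the last dimension varies fastest."""
    strides = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        strides[i] = strides[i + 1] * sizes[i + 1]
    return strides


def channels_last_strides(sizes):
    """Strides for an (N, C, H, W) shape stored with channels varying fastest."""
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]


def offset_map(sizes, strides):
    """Map every index tuple to its element offset in memory."""
    return {
        idx: sum(i * s for i, s in zip(idx, strides))
        for idx in product(*(range(n) for n in sizes))
    }


sizes = [2, 512, 1, 1]
grad_strides = contiguous_strides(sizes)
param_strides = channels_last_strides(sizes)
same_layout = offset_map(sizes, grad_strides) == offset_map(sizes, param_strides)
print(grad_strides, param_strides, same_layout)
```

With H = W = 1, the strides of those two dimensions never contribute to an offset, which is consistent with the warning text describing this as a possible performance issue rather than an error.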

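Training time and whether the GPU is being utilized

Hello! While reproducing the change detection part on the LEVIR-CD dataset (BIT-ResNet50 training, with the ResNet50 pretrained model taken from the official torch release), I set the batch size to 4 (anything larger than 4 runs out of GPU memory). One epoch takes about fifteen minutes, which does not match the small size of LEVIR-CD. When I reproduced the official BIT paper, the same model settings took only about one minute per epoch and could handle a larger batch size (12) without running out of GPU memory.
What could be causing this? Since the code is modified from BIT, in theory there should not be such a large gap in training, so I am quite puzzled.
I look forward to your reply. Thank you!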