wofmanaf / rest

This is an official implementation for "ResT: An Efficient Transformer for Visual Recognition".

License: Apache License 2.0

Python 99.46% Shell 0.54%

rest's Introduction

Updates

(2022/05/10) Code of ResTv2 is released! ResTv2 simplifies the EMSA structure of ResTv1 (i.e., it eliminates the multi-head interaction part) and employs an upsample operation to reconstruct the medium- and high-frequency information lost through the downsampling operation.
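As a rough illustration of what downsampling discards, the pure-Python sketch below average-pools a 2D grid by a stride `s` and brings it back with nearest-neighbour repetition. This pool-then-repeat pair is a simplified stand-in chosen for illustration, not ResTv2's actual upsample module, and the function names are hypothetical. A constant (low-frequency) grid survives the round trip exactly, while a checkerboard, the highest-frequency pattern, collapses to its mean.

```python
def avg_pool(grid, s):
    """Average-pool a 2D list over non-overlapping s x s windows (sizes assumed divisible by s)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(grid[i + di][j + dj] for di in range(s) for dj in range(s)) / (s * s)
            for j in range(0, w, s)
        ]
        for i in range(0, h, s)
    ]


def nearest_upsample(grid, s):
    """Repeat every cell s times along both axes."""
    return [[v for v in row for _ in range(s)] for row in grid for _ in range(s)]


def round_trip_error(grid, s):
    """Largest absolute difference between a grid and its pooled-then-upsampled copy."""
    rec = nearest_upsample(avg_pool(grid, s), s)
    return max(abs(a - b) for ra, rb in zip(grid, rec) for a, b in zip(ra, rb))


constant = [[3.0] * 4 for _ in range(4)]
checker = [[float((i + j) % 2) for j in range(4)] for i in range(4)]
print(round_trip_error(constant, 2))  # 0.0: nothing is lost
print(round_trip_error(checker, 2))   # 0.5: every pixel becomes the mean 0.5
```

The gap measured for the checkerboard is the kind of detail that a learned upsample branch is meant to help recover.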

ResT: An Efficient Transformer for Visual Recognition

Official PyTorch implementation of ResTv1 and ResTv2, from the following papers:

ResT: An Efficient Transformer for Visual Recognition. NeurIPS 2021.
ResT V2: Simpler, Faster and Stronger. NeurIPS 2022.
By Qing-Long Zhang and Yu-Bin Yang
State Key Laboratory for Novel Software Technology at Nanjing University

ResTv1 was first described on arXiv. It serves as a general-purpose backbone for computer vision and can process input images of arbitrary size. In addition, ResT compresses the memory cost of standard multi-head self-attention (MSA) and models the interaction between attention heads while keeping their diversity.
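The memory saving comes from shrinking the keys and values before attention. The pure-Python sketch below counts attention-map entries for a 56x56 token grid (a 224x224 image after a stride-4 stem) with and without a spatial reduction of stride 8. The reduction layout (depth-wise conv with kernel s+1, stride s, padding s//2), the stride-4 stem, one head and a reduction ratio of 8 in the first stage reflect our reading of ResTv1 and should be treated as assumptions; check rest.py for the actual values.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def attention_entries(h, w, heads, sr_ratio):
    """Number of entries in the attention maps for an h x w token grid.

    Keys/values are assumed to be downsampled by a depth-wise conv with
    kernel sr_ratio + 1, stride sr_ratio and padding sr_ratio // 2;
    sr_ratio == 1 means no reduction (plain MSA).
    """
    n = h * w  # number of query tokens
    if sr_ratio > 1:
        k, p = sr_ratio + 1, sr_ratio // 2
        m = conv_out(h, k, sr_ratio, p) * conv_out(w, k, sr_ratio, p)
    else:
        m = n  # keys/values keep full resolution
    return heads * n * m


# Assumed first stage: 56x56 tokens, 1 head.
full = attention_entries(56, 56, heads=1, sr_ratio=1)
reduced = attention_entries(56, 56, heads=1, sr_ratio=8)
print(full, reduced, full // reduced)  # 9834496 153664 64
```

With a reduction ratio s, the attention map shrinks by roughly a factor of s², which is where the memory compression comes from.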

Catalog

ImageNet-1K Training Code
ImageNet-1K Fine-tuning Code
Downstream Transfer (Detection, Segmentation) Code

Results and Pre-trained Models

ImageNet-1K trained models

name	resolution	acc@1	#params	FLOPs	Throughput	model
ResTv1-Lite	224x224	77.2	11M	1.4G	1246	baidu
ResTv1-S	224x224	79.6	14M	1.9G	1043	baidu
ResTv1-B	224x224	81.6	30M	4.3G	673	baidu
ResTv1-L	224x224	83.6	52M	7.9G	429	baidu
ResTv2-T	224x224	82.3	30M	4.1G	826	baidu
ResTv2-T	384x384	83.7	30M	12.7G	319	baidu
ResTv2-S	224x224	83.2	41M	6.0G	687	baidu
ResTv2-S	384x384	84.5	41M	18.4G	256	baidu
ResTv2-B	224x224	83.7	56M	7.9G	582	baidu
ResTv2-B	384x384	85.1	56M	24.3G	210	baidu
ResTv2-L	224x224	84.2	87M	13.8G	415	baidu
ResTv2-L	384x384	85.4	87M	42.4G	141	baidu

Note: the access code for the Baidu links is rest. Pretrained ResTv1 models are now available on Google Drive.

Installation

Please check INSTALL.md for installation instructions.

Evaluation

Below is an example evaluation command for a ResTv2-T that was pre-trained on ImageNet-1K and then fine-tuned on ImageNet-1K at 384x384 resolution:

Single-GPU

python main.py --model restv2_tiny --eval true \
--resume restv2_tiny_384.pth \
--input_size 384 --drop_path 0.1 \
--data_path /path/to/imagenet-1k

This should give

* Acc@1 83.708 Acc@5 96.524 loss 0.777

To evaluate other model variants, change --model, --resume and --input_size accordingly. The download links for the pre-trained models are in the table above.
Setting a model-specific --drop_path is not strictly required for evaluation, because the DropPath module in timm is an identity mapping during evaluation regardless of the drop rate; it is, however, required for training. See TRAINING.md or our paper for the values used for the different models.
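The point about --drop_path can be seen in a simplified, pure-Python version of stochastic depth. It mimics the behaviour of timm's DropPath as we understand it (each sample's residual branch is dropped with probability p during training and survivors are rescaled by 1/(1-p)), but it is not timm's code, and it treats each sample as a single number rather than a tensor.

```python
import random


def drop_path(values, drop_prob, training, rng=random):
    """Per-sample stochastic depth on a batch of residual-branch outputs.

    Training: each sample is zeroed with probability drop_prob, and the
    survivors are scaled by 1 / (1 - drop_prob) to keep the expectation.
    Evaluation: the input is returned unchanged, whatever drop_prob is.
    """
    if not training or drop_prob == 0.0:
        return list(values)
    keep_prob = 1.0 - drop_prob
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


batch = [1.0, 2.0, 3.0, 4.0]
print(drop_path(batch, 0.1, training=False))  # [1.0, 2.0, 3.0, 4.0]
print(drop_path(batch, 0.1, training=True, rng=random.Random(0)))
```

Because the evaluation branch never looks at drop_prob, any value passed at evaluation time gives the same result.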

Training

See TRAINING.md for training and fine-tuning instructions.

Acknowledgement

This repository is built using the timm library.

License

This project is released under the Apache License 2.0. Please see the LICENSE file for more information.

Citation

If you find this repository helpful, please consider citing:

ResTv1

@inproceedings{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition},
  author={Qinglong Zhang and Yu-bin Yang},
  booktitle={Advances in Neural Information Processing Systems},
  year={2021},
  url={https://openreview.net/forum?id=6Ab68Ip4Mu}
}

ResTv2

@article{zhang2022rest,
  title={ResT V2: Simpler, Faster and Stronger},
  author={Zhang, Qing-Long and Yang, Yu-Bin},
  journal={arXiv preprint arXiv:2204.07366},
  year={2022}
}

Third-party Implementation

[2022/05/26] ResT and ResT v2 have been integrated into PaddleViT; check it out for a third-party implementation on the Paddle framework!

rest's Issues

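Great work! But I'd like to report a problem: the PVT citations in the paper are all wrong...

Wherever the paper cites PVT as [5], reference [5] is actually ViT. The correct reference is: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions. That paper is missing from the reference list and needs to be added.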

Is there any operation similar to ‘padding mask’?

Hi, thanks for your great work. Is there any operation similar to the 'padding mask' in DETR that indicates which pixels belong to the image and which are padding?
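For readers unfamiliar with the term: in DETR, images of different sizes are padded to a common size and a boolean mask records which pixels are padding. A minimal pure-Python sketch of building such masks follows; the function name is hypothetical, True marking padding follows DETR's convention as we understand it, and the sketch does not say whether ResT provides an equivalent.

```python
def pad_and_mask(sizes):
    """Given (height, width) per image, return the padded size and one mask per image.

    Each mask is a 2D list over the padded size: True marks padded pixels,
    False marks pixels that belong to the image.
    """
    max_h = max(h for h, _ in sizes)
    max_w = max(w for _, w in sizes)
    masks = [
        [[not (i < h and j < w) for j in range(max_w)] for i in range(max_h)]
        for h, w in sizes
    ]
    return (max_h, max_w), masks


shape, masks = pad_and_mask([(2, 3), (3, 2)])
print(shape)        # (3, 3)
print(masks[1][0])  # [False, False, True]
```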

How to train ResT, not ResTv2

Hello. I want to train ResT on my own dataset.
So I set args.model to 'rest_base', since this name exists in the rest.py file.
However, the error 'unrecognized arguments: -model rest_base' occurred.

Which model names can I pass to args.model in order to train ResT v1?
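One detail in the error message stands out: it quotes `-model` with a single dash. The standard-library sketch below assumes main.py declares the option as `--model` via argparse (the helper name and default are hypothetical). It shows that a single-dash `-model` is not matched and ends up among the unknown arguments, which `parse_args` reports as "unrecognized arguments". Fixing the dash alone does not guarantee that main.py accepts `rest_base` as a model name.

```python
import argparse


def parse_model_arg(argv):
    """Parse argv with a parser that defines --model, returning (model, unknown args)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default=None, type=str)
    args, unknown = parser.parse_known_args(argv)
    return args.model, unknown


print(parse_model_arg(["--model", "rest_base"]))  # ('rest_base', [])
print(parse_model_arg(["-model", "rest_base"]))   # (None, ['-model', 'rest_base'])
```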

Performance in MVS

Hi~
Thanks for your excellent work! I saw these lines in your code:
x = self.avg_pool(x).flatten(1) # if use in MVS, should abandon this part
x = self.head(x) # if use in MVS, should abandon this part
It seems you have tried ResT on the MVS task. I'm wondering what the performance of ResT on MVS is, and which experiments you have tried.
Looking forward to your reply! Thanks!

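About the role of class PA

Hello, and thank you very much for your work. I have a question: what is the role of class PA() in the code? It does not seem to correspond to Section 2.4 (Position Encoding) of the paper, because in the code PA is called in class PatchEmbed (which I believe corresponds to Section 2.3, Patch Embedding) and in class BasicStem (the first patch embedding module). So class PA() appears to be related to Section 2.3 Patch Embedding rather than Section 2.4 Position Encoding, yet Eq. (8) in Section 2.4 Position Encoding describes PA's Conv2d() and Sigmoid().
Thanks again.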
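The question describes PA as a Conv2d followed by a Sigmoid (Eq. (8)), i.e. a gate of the form x · sigmoid(conv(x)). A minimal single-channel, pure-Python sketch of that gating is below; the 3x3 kernel with zero padding (as in a depth-wise Conv2d(kernel_size=3, padding=1) applied per channel) is our assumption, the function name is hypothetical, and the sketch does not settle which section of the paper PA belongs to.

```python
import math


def pixel_attention(x, weight, bias=0.0):
    """Gate each pixel of one channel by sigmoid(3x3 conv(x)).

    x: H x W list for a single channel; weight: 3x3 list of kernel weights.
    Zero padding keeps the output the same size as the input. Like PyTorch's
    Conv2d, the kernel is applied as a cross-correlation.
    """
    h, w = len(x), len(x[0])

    def at(i, j):
        return x[i][j] if 0 <= i < h and 0 <= j < w else 0.0  # zero padding

    out = []
    for i in range(h):
        row = []
        for j in range(w):
            conv = bias + sum(
                weight[di + 1][dj + 1] * at(i + di, j + dj)
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
            )
            gate = 1.0 / (1.0 + math.exp(-conv))  # sigmoid
            row.append(x[i][j] * gate)
        out.append(row)
    return out


zero_kernel = [[0.0] * 3 for _ in range(3)]
print(pixel_attention([[2.0, 4.0], [6.0, 8.0]], zero_kernel))  # [[1.0, 2.0], [3.0, 4.0]]
```

With an all-zero kernel every gate is sigmoid(0) = 0.5, so the output is simply half the input; learned weights make the gate vary from pixel to pixel.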

Dataset

Hello, I'd like to ask which version of ImageNet-1K you used for training.

Is there any quantitative analysis of the experimental results of MSA and EMSA?

I think a quantitative comparison of the performance and efficiency of the two modules is a necessary ablation study, but it is not in the paper.

rest.py

Hello, if I want to try your backbone on other visual tasks, can I call rest.py directly?

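How to resume training of the ResT classification model from a checkpoint

Hello, my training of ResT-large stopped after a little over 100 epochs. The output includes a file checkpoint.pth.tar; extracting this tar file gave me data.pkl and data. If I want to get the .pth weights file, should I extract data.pkl?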
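Since PyTorch 1.6, `torch.save` writes a zip-based archive by default, whose entries include a `data.pkl` file and a `data/` directory of tensor storages; that explains what extracting `checkpoint.pth.tar` produced. The standard-library sketch below only inspects such a file (the helper name is hypothetical). The weights themselves should be read by passing the file to `torch.load`, as noted in the comments; which key holds the model weights inside the checkpoint depends on the training script and is an assumption here.

```python
import io
import zipfile


def describe_checkpoint(source):
    """Return (is_zip, data.pkl entries) for a checkpoint path or file-like object."""
    if not zipfile.is_zipfile(source):
        return False, []
    with zipfile.ZipFile(source) as zf:
        pkl = [name for name in zf.namelist() if name.endswith("data.pkl")]
    return True, pkl


# To get the weights, load the file directly instead of extracting it, e.g.:
#   import torch
#   ckpt = torch.load("checkpoint.pth.tar", map_location="cpu")
#   print(ckpt.keys())  # the state dict key (e.g. "state_dict") is an assumption

# Self-check on an in-memory archive laid out like a torch.save file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("checkpoint/data.pkl", b"")
    zf.writestr("checkpoint/data/0", b"")
print(describe_checkpoint(buf))  # (True, ['checkpoint/data.pkl'])
```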

How to calculate FLOPS on ResT

Hi @wofmanaf

Thank you for your source code.
Could you share the code you used to calculate the FLOPs of the ResT models?

Best,

Chakkrit
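FLOP-counting tools generally work by summing per-layer operation counts. As a back-of-the-envelope sketch (not the authors' script), the helpers below count multiply-accumulate operations (MACs) for a Conv2d and a Linear layer. Many papers report MACs under the name FLOPs; whether the FLOPs column in the README follows that convention is not stated, so treat the convention as an assumption.

```python
def conv2d_macs(h_in, w_in, c_in, c_out, kernel, stride=1, padding=0, groups=1):
    """Multiply-accumulate operations of one Conv2d layer (bias ignored)."""
    h_out = (h_in + 2 * padding - kernel) // stride + 1
    w_out = (w_in + 2 * padding - kernel) // stride + 1
    macs_per_output = (c_in // groups) * kernel * kernel
    return h_out * w_out * c_out * macs_per_output


def linear_macs(tokens, in_features, out_features):
    """Multiply-accumulate operations of a Linear layer applied to every token."""
    return tokens * in_features * out_features


print(conv2d_macs(224, 224, 3, 64, kernel=3, padding=1))  # 86704128
```

Summing such per-layer counts over a whole network (including the attention matrix products) gives the model-level figure; automated tools do this by tracing the model.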

About finetune

Hello, when I apply your ResT_base weights to my fine-grained image classification task with "--finetune", I get an error at line 264 of main.py:
**UnboundLocalError: local variable 'model_without_ddp' referenced before assignment.** How can I fix it?

By the way, at line 431 of main.py, n_parameters is not defined. Could you give its definition?

Could you put a copy of the V2 pretrained models on Google Drive?

Or could you tell us the extraction code for the V2 Baidu Netdisk link?

How to use ResT as my own backbone network

Hi, thanks for your great work. I want to use ResT as the backbone network for visual object tracking. How should I do this? Can I directly download and load your pre-trained model?

Is each head in MSA only responsible for a subset of the input tokens?

Maybe it should read "Each head in MSA is only responsible for a subset of the channels of the input tokens".
In the Transformer, each head is not responsible for a subset of the input tokens, but for a subset of the channels of every input token.
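The correction matches how multi-head attention is usually implemented: the channel dimension C is split into h groups, and every head attends over all tokens using its own C/h channels. A pure-Python sketch of that split (the function name is illustrative):

```python
def split_heads(tokens, num_heads):
    """Split n tokens of dimension C into num_heads groups of C // num_heads channels.

    Returns one entry per head; each entry holds all n tokens, restricted to
    that head's slice of channels.
    """
    dim = len(tokens[0])
    if dim % num_heads != 0:
        raise ValueError("channel dimension must be divisible by num_heads")
    d = dim // num_heads
    return [[tok[h * d:(h + 1) * d] for tok in tokens] for h in range(num_heads)]


tokens = [[float(10 * t + c) for c in range(6)] for t in range(4)]  # 4 tokens, 6 channels
heads = split_heads(tokens, 3)
print(len(heads), len(heads[0]), len(heads[0][0]))  # 3 4 2
print(heads[1][0])  # [2.0, 3.0]: token 0, channels 2-3
```

Every head sees all 4 tokens; only the channels differ between heads.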

Password for the pretrained model files on Baidu Yun? (Baidu Netdisk extraction code)

Thanks for open-sourcing this code!
What is the extraction code for the pretrained weights on Baidu Netdisk?
Thanks a lot!

What is the extraction code for the ResTv2 pretrained models? Thanks

ResTv2-L-384 model link is broken

Could you re-upload it?

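Does convert_to_d2.py convert to .pth or .pkl format?

Hello, none of the weight files I converted with convert_to_d2.py can be loaded.

Unknown Baidu Netdisk password for the ResTv2 models

Hello, I don't know the Baidu Netdisk password for v2. Could you share it? Thanks.

Tensor error when using ResTv1 as the backbone for object detection

When I run object detection with a single class, the following error occurs. What is the cause?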

KeyError: 'Non-existent config key: MODEL.REST'

Hello, I installed detectron2 and, directly inside the d2 folder, ran ./train_net.py --num-gpus 1
--config-file ./configs/COCO-InstanceSegmentation/mask_rcnn_rest_base_FPN_1x.yaml
SOLVER.IMS_PER_BATCH 2 SOLVER.BASE_LR 0.0025.
It then failed with: KeyError: 'Non-existent config key: MODEL.REST'.
Could you explain how to get ResT running in detectron2?

Mistake in your paper: "MCSA" should be replaced with "EMSA".

Layout problem in Figure 4 of the v2 version on arXiv

Hello,

I downloaded the latest version from https://arxiv.org/pdf/2105.13677v2.pdf, but Figure 4 on page 5 has a layout problem.