Website: http://dk-liang.github.io/
Google Scholar: https://scholar.google.com/dk-liang
A collection of papers about transformers in vision. Awesome Transformer with Computer Vision (CV)
Hi. I came across these papers, so it's a good idea to add them so we can refer to them when we decide to read them later (I hope I can finally start reading my never-ending list :)
Three things everyone should know about Vision Transformers: https://arxiv.org/pdf/2203.09795.pdf
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection: https://arxiv.org/pdf/2203.03605.pdf
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection: https://arxiv.org/pdf/2112.01526.pdf
A-ViT: Adaptive Tokens for Efficient Vision Transformer: https://arxiv.org/pdf/2112.07658.pdf
Shunted Self-Attention via Multi-Scale Token Aggregation: https://arxiv.org/pdf/2111.15193.pdf
Dear Dingkang,
Thanks a lot for your project. Our paper P2T has been accepted by IEEE TPAMI 2022 recently.
Could you please update the status of P2T?
BTW, full code of P2T has also been released here: https://github.com/yuhuan-wu/P2T
IEEE online address is here: https://ieeexplore.ieee.org/document/9870559
Best,
Yu-Huan
Styleformer: Transformer based Generative Adversarial Networks with Style Vector
PDF: https://arxiv.org/abs/2106.07023
Code: https://github.com/Jeeseung-Park/Styleformer
Uniformer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Accepted by ICLR 2022
arxiv: https://arxiv.org/abs/2201.04676
code: https://github.com/Sense-X/UniFormer
Hi, here are some recent papers I read that are missing from the list:
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
https://arxiv.org/pdf/2106.11297v2.pdf
Sliced Recursive Transformer
https://arxiv.org/pdf/2111.05297.pdf
Transformer in Transformer https://arxiv.org/abs/2103.00112
Hi, thanks for your awesome repo!
please consider adding the new arxiv paper:
Uformer: A General U-Shaped Transformer for Image Restoration
arxiv: https://arxiv.org/abs/2106.03106
Thanks for your awesome paper list! Our paper 'Augmented Shortcuts for Vision Transformers' has been accepted by NeurIPS 2021. Could you add it to the paper list? Thanks.
paper link: https://arxiv.org/abs/2106.15941
TransRPPG: Remote Photoplethysmography Transformer for 3D Mask Face Presentation Attack Detection https://arxiv.org/abs/2104.07419
Contextual Transformer Networks for Visual Recognition
PDF: https://arxiv.org/pdf/2107.12292.pdf
Code: https://github.com/JDAI-CV/CoTNet
Thank you for your great project, and we are glad that our paper CrossFormer is also listed.
While our paper is listed as an arXiv pre-print, it has been accepted by ICLR 2022: CrossFormer: A Versatile Vision Transformer Hinging on Cross-scale Attention. You may wish to move it to the ICLR section.
Hi, thanks for such a great collection of awesome vision transformer works! Could you please add our Focal Transformers:
Paper: https://arxiv.org/pdf/2107.00641.pdf
Code: https://github.com/microsoft/Focal-Transformer
thanks!
Thank you for sharing this collection of papers
I also made a paper collection list about vision attention and transformer:
https://github.com/cmhungsteve/Awesome-Transformer-Attention
Feel free to check and share it!
I would also appreciate it if you could add a link to my repo.
Thank you
CoAtNet: Marrying Convolution and Attention for All Data Sizes
https://arxiv.org/pdf/2106.04803.pdf
Thank you for the great repo.
Please consider adding:
Unsupervised MRI Reconstruction via Zero-Shot Learned Adversarial Transformers (SLATER)
DearKD: Data-Efficient Early Knowledge Distillation for Vision Transformers
paper: https://arxiv.org/abs/2204.12997
Thanks~
Thanks for your awesome repo.
Please consider adding: MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens (https://arxiv.org/abs/2105.15168).
XCiT: Cross-Covariance Image Transformers
PDF: https://arxiv.org/pdf/2106.09681.pdf
Code: https://github.com/facebookresearch/xcit
Hi, @dk-liang, please help add the below papers:
[ICT] High-Fidelity Pluralistic Image Completion with Transformers [paper], [code], ICCV 2021
[BEVT] BEVT: BERT Pretraining of Video Transformers [paper], [code], CVPR 2022
[PeCo] PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers [paper]
[MobileFormer] Mobile-Former: Bridging MobileNet and Transformer [paper], CVPR 2022
Please update the status of the following paper :
[Container] Container: Context Aggregation Network
to
[Container] Container: Context Aggregation Network [paper][code] [NeurIPS 2021]
code : https://github.com/gaopengcuhk/Container
Hello! Great work!
I would like to introduce a CVPR 2023 paper called QD-DETR (Query-Dependent Detection Transformer).
Paper : Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
arxiv link : https://arxiv.org/abs/2303.13874
Github link : https://github.com/wjun0830/QD-DETR
Thank you.
Hi, @dk-liang, thanks for this great repository. Could you please consider adding HGOnet, which has been accepted in WACV 2022? Thanks in advance!
Image-Adaptive Hint Generation via Vision Transformer for Outpainting
paper: https://openaccess.thecvf.com/content/WACV2022/papers/Kong_Image-Adaptive_Hint_Generation_via_Vision_Transformer_for_Outpainting_WACV_2022_paper.pdf
code: https://github.com/kdh4672/hgonet
Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation https://arxiv.org/abs/2103.16024
paper : https://arxiv.org/abs/2106.02520
github & code : https://github.com/SunghwanHong/CATs
Hi, @dk-liang. Thanks for this great repository. Please add CoFormer.
Collaborative Transformers for Grounded Situation Recognition
Paper: https://arxiv.org/abs/2203.16518
Code: https://github.com/jhcho99/CoFormer
This paper is accepted to CVPR 2022.
Please add SignBERT: https://arxiv.org/abs/2110.05382
Thanks for your support~
Hi @dk-liang, thanks for your awesome repository.
Could you add BatchFormer, which has been accepted to CVPR 2022?
arxiv: https://arxiv.org/abs/2203.01522
code: https://github.com/zhihou7/BatchFormer
In addition, a more general version, BatchFormerV2, has also been released at https://arxiv.org/abs/2204.01254, in which we design a new module and show consistent effectiveness on object detection, panoptic segmentation, and image classification.
Regards,
TransGAN: Two Transformers Can Make One Strong GAN
arxiv: https://arxiv.org/abs/2102.07074
Chinese media: https://zhuanlan.zhihu.com/p/351062165
Youtube: https://www.youtube.com/watch?v=R5DiLFOMZrc&t=941s
Hi, @dk-liang, thanks for this great repository. Could you please consider adding TransFusion, which has been accepted in BMVC 2021? Thanks in advance!
TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation
paper: https://arxiv.org/abs/2110.09554
code: https://github.com/HowieMa/TransFusion-Pose
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer appears twice.
Please update the status of the following paper :
Dual-stream Network for Visual Recognition [paper]
to
Dual-stream Network for Visual Recognition [paper][code] [NeurIPS 2021]
https://github.com/gaopengcuhk/DSNet
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
paper: https://arxiv.org/abs/2203.12602
code: https://github.com/MCG-NJU/VideoMAE
VisTR: End-to-End Video Instance Segmentation with Transformers https://arxiv.org/abs/2011.14503
Thanks for your great repo!
Please consider adding "What Makes for Hierarchical Vision Transformer?" (https://arxiv.org/abs/2107.02174)