haochen-wang409 / droppos
[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
License: Apache License 2.0
Hi, I used the official code and the hyperparameters suggested in the paper to pre-train ViT-L for 200 epochs. After fine-tuning, I can only reach 82.8 Top-1 accuracy on ImageNet-1K. Are there any missing details for training DropPos?
Hi, would it be possible to release the pre-trained weights?
Hi,
I would like to download the models you uploaded recently. Would you consider also uploading them to something like Google Drive or Dropbox? Or is there a way to download them from the link you provided without registering and installing Baidu?
Thanks
Hello, roughly what loss value should this model reach by the end of pre-training? I am not sure whether the loss from my current run is too high.
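One rough reference point for the loss question above, offered as a sketch and not as the authors' answer: if position prediction is treated as plain N-way classification with cross-entropy, a model that guesses uniformly at random has a loss of ln(N). The function name `chance_level_ce` and the choice of 196 positions (a 14 x 14 patch grid) are assumptions for illustration. Any relaxation or re-weighting of the loss in the actual repository would change the scale, so this only indicates whether a plain cross-entropy value is better than chance.

```python
import math


def chance_level_ce(num_positions):
    """Cross-entropy of a uniform guess over num_positions classes.

    Hypothetical helper: assumes plain N-way cross-entropy with no
    label smoothing or per-patch re-weighting.
    """
    return math.log(num_positions)


# Assumed example: a 224x224 image with 16x16 patches gives 14 * 14 = 196 positions.
baseline = chance_level_ce(14 * 14)
print(f"chance-level cross-entropy: {baseline:.3f}")
```

A pre-training loss well below this value would suggest the model is learning something about position, under the assumptions stated above.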
Hi author, thank you for contributing such interesting and solid work.
I have a question (perhaps a trivial one): the reconstruction targets of DropPos are the actual positions of the patches whose PEs are masked, right? But why do you first mask a subset of patches? (I understand this is necessary for MAE, since its target is RGB pixels.) Is it because reconstructing the masked PEs alone would be too simple a pretext task for pre-training a ViT (what the paper calls a trivial solution)?
If so, feeding all patches directly into the encoder would produce suboptimal results: with every patch visible, the encoder can infer the masked PEs by considering all possible positions. In contrast, if it only "sees" part of the patches, it has to infer the masked PEs from the visible patches alone.
Is my understanding correct? I hope you can share some insight. Thanks a lot!
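The two-stage setup described in this question can be sketched as follows. This is a minimal illustration, not the repository's implementation: the helper name `droppos_inputs`, its parameters, and the ratio values are all assumptions. It first keeps only a random subset of patches, then marks a subset of those visible patches as having their positional embedding dropped; the prediction target for each such patch is its true index in the patch grid.

```python
import random


def droppos_inputs(num_patches, patch_mask_ratio, pos_mask_ratio, seed=0):
    """Hypothetical helper sketching the two masking stages.

    Returns (visible_idx, drop_pe, targets), where
      visible_idx[i] is the grid index of the i-th visible patch,
      drop_pe[i] is True if that patch gets no positional embedding,
      targets[i] is the position the model would have to predict.
    """
    rng = random.Random(seed)

    # Stage 1: keep only a random subset of patches (the encoder never sees the rest).
    num_visible = round(num_patches * (1.0 - patch_mask_ratio))
    visible_idx = rng.sample(range(num_patches), num_visible)

    # Stage 2: among the visible patches, drop the positional embedding of a subset.
    num_dropped = round(num_visible * pos_mask_ratio)
    dropped_slots = set(rng.sample(range(num_visible), num_dropped))
    drop_pe = [i in dropped_slots for i in range(num_visible)]

    # Target: the true grid index of each visible patch. A loss would only be
    # computed where drop_pe is True.
    targets = list(visible_idx)
    return visible_idx, drop_pe, targets


# Illustrative values only (196 patches, both ratios set to 0.75 as an assumption).
visible_idx, drop_pe, targets = droppos_inputs(196, 0.75, 0.75)
print(len(visible_idx), sum(drop_pe))
```

With `patch_mask_ratio=0.0` every patch is visible, which is the case the question argues is too easy: each unassigned position then has exactly one matching patch somewhere in the input.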
Hi,
Thank you for the impressive work. I want to double-check a few points about the paper and code.
[1] Zhai et al., Position Prediction as an Effective Pretraining Strategy