Comments (4)
Hi, thanks for your attention to our work! Here are point-by-point responses:
- Positional embeddings will be added in downstream tasks when setting `pos_mask_ratio=1` in pre-training. DropPos is not equivalent to MP3 [1] with `pos_mask_ratio=1`, because the visible patches of DropPos are encoded with positional embeddings, while no positional information is added to the context tokens in [1]. Moreover, DropPos employs a patch masking stage. Therefore, DropPos is more efficient than [1].
- The `multi_task` setting is expected to improve top-1 accuracy on ImageNet-1K by ~0.5% with a ViT-B backbone pre-trained for 200 epochs.
- DropPos tries to reconstruct dropped positions based on patch appearances. The visible patches without positional embeddings provide sufficient information for position reconstruction. As in most self-supervised methods, the encoder is responsible for learning scalable feature representations, while the decoder serves the particular pretext task, i.e., reconstructing dropped positions in DropPos.
from droppos.
Thank you for the explanation. I still have a few questions.
- When `pos_mask_ratio=1`, DropPos didn't see any position info either, did it?
- Regarding my 3rd question, if no new tokens are added in the decoder, what's the difference between a "12-layer encoder + 2-layer decoder" setting and a "14-layer encoder" setting?
There seems to be no difference. The only thing that may matter is which layer's features are chosen for downstream classification.
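To make the point concrete, here is a toy sketch. It is not DropPos code: `make_layer` and `run` are hypothetical helpers, and each "layer" is a trivial function standing in for a Transformer block. Assuming no new tokens are inserted between the encoder and the decoder, running 12 layers followed by 2 layers gives the same result as running the same 14 layers in one stack; what changes is only the depth at which features are read out.

```python
def make_layer(i):
    """Return a toy 'layer': a deterministic map over a list of floats.

    Stand-in for a Transformer block; purely illustrative.
    """
    def layer(tokens):
        return [0.5 * t + i for t in tokens]
    return layer


def run(layers, tokens, return_all=False):
    """Apply layers in order; optionally keep every intermediate output."""
    outputs = []
    for layer in layers:
        tokens = layer(tokens)
        outputs.append(tokens)
    return outputs if return_all else tokens


layers = [make_layer(i) for i in range(14)]
encoder, decoder = layers[:12], layers[12:]  # "12 + 2" split of the same stack

x = [1.0, 2.0, 3.0]

# "12-layer encoder + 2-layer decoder", with no tokens added in between
split_out = run(decoder, run(encoder, x))

# "14-layer encoder"
stacked_out = run(layers, x)

# The remaining choice: which depth to take features from
all_outputs = run(layers, x, return_all=True)
features_after_12 = all_outputs[11]  # what the "encoder" would hand downstream
features_after_14 = all_outputs[13]  # final output of the full stack
```

In this sketch `split_out` and `stacked_out` are identical, and `features_after_12` is exactly the encoder output, so the split is a labeling of where downstream features are taken rather than a different computation.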
Thank you. That answers my questions.