Hi,
I would like to download the models you uploaded recently. Would you consider uploading them to something like Google Drive or Dropbox? Or is there a way to download them via the link you provided without registering for and installing Baidu?
Thanks
Hi, I tried the official code and the hyperparameters suggested in the paper to train ViT-L for 200 epochs. After fine-tuning, I can only achieve 82.8% top-1 accuracy on ImageNet-1K. Are there any missing details for training DropPos?
Hi,
Thank you for the impressive work. I want to double-check a few points about the paper and code.
When setting pos_mask_ratio=1 in pre-training, do we apply any position encoding in downstream tasks, e.g., linear probing? Also, could we say DropPos is almost equivalent to Zhai et al. [1] under this setting?
I found "--multi_task" in the pre-training code. However, there seem to be no related results reported for it. I am curious about how much it boosts performance.
The visible patches with masked positions are processed by the encoder. This is somewhat different from MAE; shouldn't they join later, at the decoder stage (to further speed up training)? Under this setting, what is the difference between the encoder and the decoder?
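For context on the comparison in the question: a minimal sketch of the token-count difference between MAE's pipeline (where the encoder sees only the visible subset and mask tokens are appended at the decoder) and the setting described here, where all kept patches, including those whose position embeddings were dropped, still pass through the encoder. The function names and the split below are illustrative assumptions, not the DropPos implementation.

```python
def mae_token_counts(num_patches, mask_ratio):
    """MAE-style: the encoder processes only the visible subset;
    mask tokens for the dropped patches are appended at the decoder."""
    num_visible = int(num_patches * (1 - mask_ratio))
    encoder_tokens = num_visible
    decoder_tokens = num_patches  # visible latents + mask tokens
    return encoder_tokens, decoder_tokens


def position_dropped_token_counts(num_patches, patch_mask_ratio):
    """Setting described in the question: every kept patch enters the
    encoder, even when its position embedding has been masked out."""
    num_visible = int(num_patches * (1 - patch_mask_ratio))
    return num_visible  # positions dropped, patch tokens kept


# Example: ViT-B/16 on 224x224 images -> 14 * 14 = 196 patches.
print(mae_token_counts(196, 0.75))            # (49, 196)
print(position_dropped_token_counts(196, 0.75))  # 49
```

The efficiency question above follows directly: in the MAE-style split the expensive encoder runs on far fewer tokens, which is why deferring mask-related tokens to the decoder speeds up training.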
[1] Zhai et al., Position Prediction as an Effective Pretraining Strategy
It is interesting that DropPos achieves exactly the same performance as your CVPR paper (HPM). Is this a coincidence, or is there some internal connection?