Comments (12)

Haochen-Wang409 commented on August 19, 2024

Hi, have you reproduced the performance of ViT-B? Also, could you provide more details of the configuration of your experiments using ViT-L?

from droppos.

vateye commented on August 19, 2024

Yes, I can reproduce the result for ViT-B but not for ViT-L. For both, I follow the configuration in Appendix A.1 of the paper.
For pre-training:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes 4 --node_rank 0 \
    main_pretrain.py \
    --batch_size 128 \
    --accum_iter 1 \
    --model DropPos_mae_vit_large_patch16_dec512d2b \
    \
    --drop_pos_type mae_pos_target \
    --use_mask_token \
    --pos_mask_ratio 0.75 \
    --pos_weight 0.1 \
    --label_smoothing_sigma 1 \
    --sigma_decay \
    --attn_guide \
    \
    --input_size 224 \
    --token_size 14 \
    --mask_ratio 0.75 \
    --epochs 200 \
    --warmup_epochs 40 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --data_path /path/to/imagenet \
    --output_dir  ./output_dir \
    --log_dir   ./log_dir \
    --experiment droppos_pos_mask0.75_posmask0.75_smooth1to0_sim_in1k_ep200
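
As a sanity check on this configuration: MAE-style codebases, which DropPos builds on, typically derive the actual learning rate from `--blr` by scaling with the effective batch size (lr = blr × eff_batch_size / 256). A minimal sketch of the arithmetic implied by the command above — the scaling rule is an assumption carried over from the MAE recipe, not something stated in this thread:

```python
# Effective batch size and learning rate implied by the pre-training command,
# assuming the MAE-style rule lr = blr * eff_batch_size / 256.
batch_size = 128      # per-GPU batch size (--batch_size)
nproc_per_node = 8    # GPUs per node
nnodes = 4            # number of nodes
accum_iter = 1        # gradient accumulation steps (--accum_iter)
blr = 1.5e-4          # base learning rate (--blr)

eff_batch_size = batch_size * accum_iter * nproc_per_node * nnodes
lr = blr * eff_batch_size / 256

print(eff_batch_size)  # 4096
print(lr)              # 0.0024
```

If any of the node or GPU counts differ from the paper's setup, the effective batch size (and hence the actual learning rate) shifts accordingly, which is a common source of silent reproduction gaps.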

For fine-tuning:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes 4 --node_rank 0 \
    main_finetune.py \
    --batch_size 32 \
    --accum_iter 1 \
    --model vit_large_patch16 \
    --finetune /path/to/checkpoint \
    \
    --epochs 50 \
    --warmup_epochs 5 \
    --blr 1e-3 --layer_decay 0.75 --weight_decay 0.05 \
    --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval \
    --data_path /path/to/imagenet \
    --nb_classes 1000 \
    --output_dir  ./output_dir \
    --log_dir   ./log_dir \
    --experiment droppos_vit_large_patch16_in1k_ep200
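
For the fine-tuning command, the same MAE-style conventions presumably apply: `--blr` is scaled by the effective batch size, and `--layer_decay 0.75` shrinks the learning rate geometrically toward the input layers. A rough sketch of what these flags imply (the exact per-layer parameter grouping in the codebase may differ; this follows the MAE recipe as an assumption):

```python
# Effective fine-tuning hyperparameters implied by the command above, assuming
# the MAE-style rule lr = blr * eff_batch_size / 256 plus layer-wise lr decay.
batch_size = 32
nproc_per_node, nnodes, accum_iter = 8, 4, 1
blr = 1e-3
layer_decay = 0.75
num_layers = 24  # transformer depth of ViT-L/16

eff_batch_size = batch_size * accum_iter * nproc_per_node * nnodes  # 1024
lr = blr * eff_batch_size / 256                                     # 0.004

# The top block trains at the full lr; each earlier block is scaled by 0.75.
per_layer_lr = [lr * layer_decay ** (num_layers - i) for i in range(num_layers + 1)]
print(eff_batch_size)          # 1024
print(lr)                      # 0.004
print(per_layer_lr[-1] == lr)  # True (topmost block gets the full rate)
```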

Haochen-Wang409 commented on August 19, 2024

How about enlarging the decoder by setting its depth to 4 or 8?

vateye commented on August 19, 2024

Is this the default setting for ViT-L? If possible, could you provide the command you ran to train ViT-L to its best performance?

Haochen-Wang409 commented on August 19, 2024

The default setting is exactly what we have provided. However, training ViT-L is much more unstable than training ViT-B.

vateye commented on August 19, 2024

So under your default setting, the model used for pre-training is DropPos_mae_vit_large_patch16_dec512d2b rather than DropPos_mae_vit_large_patch16_dec512d8b. I will launch multiple jobs to see whether stability is the core issue.

vateye commented on August 19, 2024

Hi Haochen,

I have rerun the job several times, and I have also tried different decoder depths and different numbers of pre-training epochs. I still cannot reproduce the results reported in the paper (ViT-L with 200 epochs reportedly reaches 84.5 top-1 accuracy).

Regarding the "unstable training": I ran the same job with the hyperparameters suggested in the paper in three different trials, varying the seed for each pre-training run. After 200 epochs of pre-training ViT-L, I get only 82.83, 83.04, and 82.86.
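
For what it's worth, the spread across those three seeds is tiny compared with the gap to the paper's reported accuracy, which suggests seed-level instability alone does not explain the difference. A quick computation over the figures above:

```python
# Mean and sample standard deviation of the three seeded ViT-L runs above.
import statistics

accs = [82.83, 83.04, 82.86]
mean = statistics.mean(accs)
spread = statistics.stdev(accs)  # sample standard deviation

print(round(mean, 2))    # 82.91
print(round(spread, 2))  # 0.11
```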

Regarding decoder depth: I ran experiments with depth=2 and depth=8. The results after 200 epochs of pre-training are 82.83 and 82.73, respectively.

Regarding the epoch budget: I pre-trained ViT-L for 200, 400, and 800 epochs. The fine-tuning performance is similar across all three (82.83 vs. 82.51 vs. 83.20) and much lower than what the paper reports.

vateye commented on August 19, 2024

Could you please share the scripts and accompanying training logs for ViT-L? Having access to these would help us understand and replicate your work more effectively. :)

Haochen-Wang409 commented on August 19, 2024

I would like to first clarify that the top-1 accuracy of ViT-L with 200 epochs of pre-training is 83.7, not 84.5 (see Table 5 in our paper for details).
Using the exact configuration reported on page 15 of our paper is expected to reproduce that result, although setting the base learning rate of fine-tuning to 5e-4 sometimes gives better results. Reaching only ~83% top-1 with ViT-L is quite unexpected.
Finally, I will release the pre-trained and fine-tuned checkpoints for both ViT-B and ViT-L with 800 epochs of pre-training within a few hours.

vateye commented on August 19, 2024

Sorry for the wrong reference number. Are there any training logs and scripts that can reproduce the results? Releasing them would help the research community fully reproduce your work.

Haochen-Wang409 commented on August 19, 2024

I am willing to provide the training log, but I can only find the fine-tuning log for ViT-L with 800 epochs of pre-training, which is attached below. Unfortunately, I currently have no resources to re-run DropPos, since it requires a large number of GPUs :(

DropPos_vit_large_patch16_ft_log.txt

I have checked the training scripts you provided. The global batch size is correct.

Haochen-Wang409 commented on August 19, 2024

The pre-trained and fine-tuned models are available at: https://pan.baidu.com/s/1xj9XiHgagKGJrJt88IfhLw?pwd=4gik.
The fetch code is 4gik.
