About training parameters for WRN

mboudiaf commented on May 27, 2024

About training parameters for WRN

from tim.

Comments (3)

mboudiaf commented on May 27, 2024

Hi!
Thank you very much for your interest and for raising this issue. I ran 10,000 16-way 1-shot tasks using the models I had trained (the ones available for download), and I get the following convergence plot for TIM-GD:

with a final result just below 49%, and a baseline (i.e., the SimpleShot method) around 39%, which makes more sense. I will look further into the training script and rerun everything from scratch to make sure no mistake was introduced in my latest commits. I will get back to you soon.

UPDATE: It seems that in my latest commit I introduced a duplicate scheduler.step() call, so that it was called in both main.py and trainer.py. This caused the learning rate to decay twice as fast as it should, degrading the results. Using the "bugged" code, I was still able to get 44% in your setting, and after fixing the bug I could obtain 48%+ after training. I have pushed my changes. Could you please pull and see if that solves your problem? Thank you :)
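To make the effect of the duplicate call concrete, here is a minimal sketch in plain Python. It assumes a multi-step decay schedule with made-up values (base learning rate 0.1, milestones at 30 and 60, decay factor 0.1, 90 epochs); these are illustrative assumptions, not the settings used in this repo, and the helper names `lr_at` and `simulate` are hypothetical.

```python
def lr_at(step_count, base_lr=0.1, milestones=(30, 60), gamma=0.1):
    """Learning rate of a multi-step decay schedule after `step_count` scheduler steps.

    All default values are illustrative assumptions, not the repo's settings.
    """
    passed = sum(1 for m in milestones if step_count >= m)
    return base_lr * gamma ** passed


def simulate(epochs, steps_per_epoch):
    """Return the learning rate used in each epoch when the scheduler
    is stepped `steps_per_epoch` times at the end of every epoch."""
    lrs = []
    count = 0
    for _ in range(epochs):
        lrs.append(lr_at(count))
        count += steps_per_epoch  # 1 = intended behaviour, 2 = duplicate step() call
    return lrs


correct = simulate(90, 1)  # scheduler stepped once per epoch
bugged = simulate(90, 2)   # scheduler stepped in two places per epoch

# First epoch at which the learning rate drops below its initial value.
first_decay_correct = next(e for e, lr in enumerate(correct) if lr < 0.1)
first_decay_bugged = next(e for e, lr in enumerate(bugged) if lr < 0.1)
print(first_decay_correct, first_decay_bugged)
```

Under these assumed values, the first decay happens at epoch 30 with a single step per epoch and at epoch 15 with two steps per epoch, so the model spends half as many epochs at each learning rate.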

Malik

SnowyJune973 commented on May 27, 2024

Thank you for your efforts!
It was my fault that I didn't give further details on how I obtained the 39.44% result. I didn't use TIM-GD or any other TIM-* method. Instead, I trained a classifier with WRN as the backbone and an FC layer attached to it, using the training parameters adopted in this project, so I think the "bugged" code will not affect the result I got (my code is based on pretrain.py and model/models/classifier.py in FEAT).
It should be fine as long as the configuration in scripts/train/wideres.sh and src/trainer.py is correct. Besides, I noticed that you introduced MixUp augmentation in the training process in trainer.py, but it seems to be deprecated in recent versions. Could you tell me whether you activated it when obtaining the results reported in the paper? That may be a crucial factor. Thanks :)

mboudiaf commented on May 27, 2024

Hi,

I recommend using the code from this repo to pre-train your model if you want to obtain results for TIM. From what I see, pretrain.py in the FEAT code does not use standard cross-entropy training as we do, but episodic training. In that case, using the same hyperparameters as mine does not really make sense. Am I missing something?
I have left the option to use mixup for future attempts, but the results presented in our paper do not include mixup augmentation during pre-training.
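For readers unfamiliar with the distinction, here is a generic sketch of how the two regimes select their training data. It is not the actual TIM or FEAT code: the function names, the toy label list, and the 5-way 1-shot setting with 15 queries per class are all illustrative assumptions.

```python
import random


def sample_standard_batch(labels, batch_size, rng):
    """One standard mini-batch: indices drawn uniformly from the whole
    training set, so the classifier sees all base classes jointly."""
    return rng.sample(range(len(labels)), batch_size)


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """One episode: pick n_way classes, then k_shot support samples and
    n_query query samples per class."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for c in classes:
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += picked[:k_shot]
        query += picked[k_shot:]
    return classes, support, query


rng = random.Random(0)
labels = [c for c in range(64) for _ in range(20)]  # toy data: 64 classes x 20 samples

batch = sample_standard_batch(labels, 128, rng)
classes, support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=15, rng=rng)
print(len(batch), len(classes), len(support), len(query))
```

Because a standard batch mixes samples from all classes while an episode contains only a handful of classes, the loss being optimized differs between the two regimes, which is one reason hyperparameters tuned for one may not transfer to the other.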

Best,
Malik
