Comments (9)

fmassa commented on July 2, 2024

As an additional source of information: in the COCO training logs that we link at https://github.com/facebookresearch/detr#training for the shorter schedule, if you look at the test_coco_eval_bbox field, you'll see the following for the first few epochs on COCO:

| epoch | mAP  |
|------:|-----:|
| 1     | 0.2  |
| 2     | 1.2  |
| 3     | 2.1  |
| 4     | 3.3  |
| 5     | 5.2  |
| 10    | 12.1 |
| 20    | 21.6 |
| 30    | 25.3 |

Note that one epoch of COCO corresponds to roughly 118k images, so for a dataset of 1000 images I would expect you to start seeing non-zero mAP only after 200-300 epochs.
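
To sanity-check the scale of that estimate, here is a quick back-of-the-envelope sketch (my own arithmetic from the table above, not from the repo): it assumes non-zero mAP appears once the model has seen roughly as many images as 1-2 COCO epochs.

```python
# Assumption: non-zero mAP appears after the model has seen roughly as many
# images as 1-2 COCO epochs (see the mAP table above).
coco_images_per_epoch = 118_000  # approximate size of COCO train2017
custom_dataset_size = 1_000

for coco_epochs in (1, 2):
    images_seen = coco_images_per_epoch * coco_epochs
    print(f"{coco_epochs} COCO epoch(s) ~ {images_seen / custom_dataset_size:.0f} "
          f"epochs on a 1k-image dataset")
# -> roughly 118-236 epochs, the same order as the 200-300 epoch estimate above
```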

fmassa commented on July 2, 2024

Hi,

Transformers are notoriously slow to train and require a lot of training iterations to converge with the Adam optimizer.

As a rule of thumb, I would encourage training for at least 300k training steps, which, for your current dataset size and batch size, would correspond to roughly 1300 epochs, while you have only trained for 10 epochs.

Note that if you increase the number of epochs, you should also change the lr_step, as it is currently set to 200 (I would recommend setting it to 1000 if you train for 1300 epochs).

Also, you have set a learning rate of 1e-3, which in our experiments is too large and doesn't work well; we would recommend using the default lr of 1e-4.
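
Putting those recommendations together for this setup (a sketch; double-check the exact flag names against main.py in the repo), the config would look roughly like:
--lr 1e-4 --batch_size 4 --epochs 1300, with the lr_step raised to 1000.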

As an additional note, your training dataset is fairly small (~1000 images), which might not contain enough data for the transformer to learn to generalize very well, but that still needs to be tested.

I believe I have answered your questions, and as such I'm closing the issue, but let us know if you have further questions.

fmassa commented on July 2, 2024

That being said, you could start from a pre-trained DETR model and replace the last fully-connected layers. We haven't tried transfer learning ourselves, but I see no reason why it wouldn't work. You can find more information in #9
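
A minimal sketch of that idea (assuming the torch.hub entry point from the README; in DETR the classification head is the class_embed linear layer, with one extra output for the "no object" class):

```python
import torch
import torch.nn as nn

# Load COCO-pretrained DETR via the torch.hub entry point from the DETR README.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

# Swap the classification head for a custom dataset. In DETR this is a single
# linear layer (class_embed) with num_classes + 1 outputs; the extra slot is
# the "no object" class.
num_classes = 2  # e.g. the 2-class dataset discussed in this thread
model.class_embed = nn.Linear(model.class_embed.in_features, num_classes + 1)
```

The box head (bbox_embed) should be reusable as-is, since box regression in DETR is class-agnostic.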

wenjun90 commented on July 2, 2024

Thank you @fmassa for your answer.
I trained with 1000 images and validated on 300 images, with 2 classes. I set the config like this:
--lr 1e-3 --batch_size 4 --epochs 10.

I was surprised that for every epoch from 0 to 9 the eval results did not change and were always zero.

fmassa commented on July 2, 2024

Yes, that is expected: the learning rate is too high and the number of epochs is too small relative to the size of your dataset, as I explained in the messages above.

wenjun90 commented on July 2, 2024

Thank you for the clear explanation. With DETR, do we always need a large number of epochs to get good metrics?
Can we express the required number of epochs as a function of the number of images, in the case where lr = 1e-4?
For example, 1000 images need 1300 epochs; if we have 10,000-20,000 images, how many epochs do we need?

Thank you for your advice.

fmassa commented on July 2, 2024

I don't have exact rules that always work, but as a rule of thumb I would target at least 300k training iterations.
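
As a worked example of that rule of thumb (a sketch, not something from the repo): one epoch is num_images / batch_size optimizer steps, so the required epoch count scales inversely with dataset size.

```python
import math

def epochs_for_steps(num_images: int, batch_size: int,
                     target_steps: int = 300_000) -> int:
    """Epochs needed to reach ~target_steps optimizer steps:
    one epoch = num_images / batch_size steps."""
    return math.ceil(target_steps * batch_size / num_images)

for n in (1_000, 10_000, 20_000):
    print(f"{n} images -> ~{epochs_for_steps(n, batch_size=4)} epochs")
# -> 1000 images: ~1200 epochs (close to the ~1300 quoted above);
#    10000: ~120 epochs; 20000: ~60 epochs
```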

chamathabeysinghe commented on July 2, 2024

@wenjun90 Did you see any change after 300 epochs?

mei123456789 commented on July 2, 2024

Hello, I would like to ask why the loss fluctuates when I adapt the code to train on my own dataset. Thank you.
