Comments (9)

fmassa commented on July 2, 2024

As an additional source of information: in the COCO training logs that we link at https://github.com/facebookresearch/detr#training for the shorter schedule, if you look at the test_coco_eval_bbox field, you'll see the following for the first few epochs on COCO:

| epoch | mAP  |
|------:|-----:|
| 1     | 0.2  |
| 2     | 1.2  |
| 3     | 2.1  |
| 4     | 3.3  |
| 5     | 5.2  |
| 10    | 12.1 |
| 20    | 21.6 |
| 30    | 25.3 |

Note that one epoch of COCO corresponds to roughly 118k images, so for a dataset of 1000 images I would expect you to start seeing non-zero mAP only after 200-300 epochs.
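
To sanity-check the scale of that estimate, here is a quick back-of-the-envelope sketch (my own arithmetic from the table above, not from the repo): it assumes non-zero mAP appears once the model has seen roughly as many images as 1-2 COCO epochs.

```python
# Assumption: non-zero mAP appears after the model has seen roughly as many
# images as 1-2 COCO epochs (see the mAP table above).
coco_images_per_epoch = 118_000  # approximate size of COCO train2017
custom_dataset_size = 1_000

for coco_epochs in (1, 2):
    images_seen = coco_images_per_epoch * coco_epochs
    print(f"{coco_epochs} COCO epoch(s) ~ {images_seen / custom_dataset_size:.0f} "
          f"epochs on a 1k-image dataset")
# -> roughly 118-236 epochs, the same order as the 200-300 epoch estimate above
```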

fmassa commented on July 2, 2024

Hi,

Transformers are notoriously slow to train and require a lot of training iterations to converge with the Adam optimizer.

As a rule of thumb, I would encourage training for at least 300k training steps, which, for your current dataset size and batch size, would correspond to roughly 1300 epochs, while you have only trained for 10 epochs.

Note that if you increase the number of epochs, you should also change the lr_step, as it is currently set to 200 (I would recommend setting it to 1000 if you train for 1300 epochs).

Also, you have set a learning rate of 1e-3, which in our experiments is too large and doesn't work well; we would recommend using the default lr of 1e-4.
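
Putting those recommendations together for this setup (a sketch; double-check the exact flag names against main.py in the repo), the config would look roughly like:
--lr 1e-4 --batch_size 4 --epochs 1300, with the lr_step raised to 1000.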

As an additional note, your training dataset is fairly small (~1000 images), which might not contain enough data for the transformer to learn to generalize very well, but that still needs to be tested.

I believe I have answered your questions, and as such I'm closing the issue, but let us know if you have further questions.

fmassa commented on July 2, 2024

That being said, you could start from a pre-trained DETR model and replace the last fully-connected layers. We haven't tried transfer learning ourselves, but I see no reason why it wouldn't work. You can find more information in #9
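
A minimal sketch of that idea (assuming the torch.hub entry point from the README; in DETR the classification head is the class_embed linear layer, with one extra output for the "no object" class):

```python
import torch
import torch.nn as nn

# Load COCO-pretrained DETR via the torch.hub entry point from the DETR README.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

# Swap the classification head for a custom dataset. In DETR this is a single
# linear layer (class_embed) with num_classes + 1 outputs; the extra slot is
# the "no object" class.
num_classes = 2  # e.g. the 2-class dataset discussed in this thread
model.class_embed = nn.Linear(model.class_embed.in_features, num_classes + 1)
```

The box head (bbox_embed) should be reusable as-is, since box regression in DETR is class-agnostic.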

wenjun90 commented on July 2, 2024

Thank you @fmassa for your answer.
I trained with 1000 images and validated on 300 images, with 2 classes. I set the config like this:
--lr 1e-3 --batch_size 4 --epochs 10.

I was surprised that for every epoch from 0 to 9 the eval results did not change and were always zero.

fmassa commented on July 2, 2024

Yes, that is expected: the learning rate is too high and the number of epochs is too small relative to the size of your dataset, as I explained in the messages above.

wenjun90 commented on July 2, 2024

Thank you for the clear explanation. With DETR, do we always need a large number of epochs to get good metrics?
Can we express the required number of epochs as a function of the number of images, in the case where lr = 1e-4?
For example, 1000 images need 1300 epochs; if we have 10,000-20,000 images, how many epochs do we need?

Thank you for your advice.

fmassa commented on July 2, 2024

I don't have exact rules that always work, but as a rule of thumb I would target at least 300k training iterations.
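
As a worked example of that rule of thumb (a sketch, not something from the repo): one epoch is num_images / batch_size optimizer steps, so the required epoch count scales inversely with dataset size.

```python
import math

def epochs_for_steps(num_images: int, batch_size: int,
                     target_steps: int = 300_000) -> int:
    """Epochs needed to reach ~target_steps optimizer steps:
    one epoch = num_images / batch_size steps."""
    return math.ceil(target_steps * batch_size / num_images)

for n in (1_000, 10_000, 20_000):
    print(f"{n} images -> ~{epochs_for_steps(n, batch_size=4)} epochs")
# -> 1000 images: ~1200 epochs (close to the ~1300 quoted above);
#    10000: ~120 epochs; 20000: ~60 epochs
```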

chamathabeysinghe commented on July 2, 2024

@wenjun90 Did you see any change after 300 epochs?

mei123456789 commented on July 2, 2024

Hello, I would like to ask why the loss fluctuates when I adapt the code to train on my own dataset. Thank you.
