Comments (9)

fmassa avatar fmassa commented on August 23, 2024 3

@gaopengcuhk we didn't scale the learning rate for our experiments, we found out that by using Adam it was ok to use the same default values for all configurations (even if using 64 GPUs).

The linear scaling rule is definitely too aggressive, and the model will probably not train at all with it. If you want to try a scaling rule for the learning rate, square-root scaling could potentially work (i.e. if you increase the batch size by 2x, multiply the learning rate by sqrt(2)).
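The square-root rule mentioned above can be sketched as a small helper. A minimal sketch; the helper name `scale_lr` and the base batch size of 16 (2 im/gpu on 8 GPUs, the DETR default configuration) are assumptions for illustration:

```python
import math

def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Square-root learning-rate scaling: lr' = lr * sqrt(new_batch / base_batch)."""
    return base_lr * math.sqrt(new_batch / base_batch)

# e.g. doubling the total batch from 16 to 32 multiplies the lr by sqrt(2)
print(scale_lr(1e-4, 16, 32))
```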

I believe I've answered your question, and as such I'm closing the issue, but let us know if you have further questions.

from detr.

DeppMeng avatar DeppMeng commented on August 23, 2024 3

@szagoruyko I tested training DETR for 150 epochs with an 8 V100 GPUs setting and an 8 V100 GPUs * 4 nodes setting, with the learning rate unchanged. However, there is still a performance gap.

GPU config   AP
8            39.9
8 * 4        38.4
8 * 8        running

Did you observe something similar? Or will the gap diminish in the 300-epoch setting?

from detr.

gaopengcuhk avatar gaopengcuhk commented on August 23, 2024 1

If you keep the learning rate unchanged, the performance with 16 GPUs is worse than with 8 GPUs at the same epoch, right?

from detr.

szagoruyko avatar szagoruyko commented on August 23, 2024 1

@gaopengcuhk depends on the total batch size, for example if we keep total batch size of 32 images with 2 im/gpu on 16 cards we get the same results as with 4 im/gpu on 8 cards. If we increase the total batch size, e.g. by training with 4 im/gpu on 16 cards, we observe that the model converges slower but with longer training it catches up.
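The equivalence described above is about the effective (total) batch size, not the per-GPU batch size. A minimal sketch; the helper name `total_batch_size` is an assumption for illustration:

```python
def total_batch_size(im_per_gpu: int, n_gpus: int) -> int:
    """Effective batch size under synchronous data-parallel training."""
    return im_per_gpu * n_gpus

# the two equivalent configurations from the comment above:
assert total_batch_size(2, 16) == 32  # 2 im/gpu on 16 cards
assert total_batch_size(4, 8) == 32   # 4 im/gpu on 8 cards
```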

from detr.

gaopengcuhk avatar gaopengcuhk commented on August 23, 2024

I tried scaling up the learning rate and backbone learning rate from 1e-4/1e-5 => 3e-4/3e-5 when training with 24 GPUs. The mAP is always zero. Can you give any suggestions about a learning-rate scaling rule?
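For comparison, here is what the linear rule tried above yields next to the square-root rule suggested earlier in the thread. A sketch under the assumption that the total batch size grows proportionally with the GPU count (8 -> 24 GPUs at a fixed im/gpu, i.e. a 3x larger batch):

```python
import math

base_lr, factor = 1e-4, 24 / 8  # 3x larger total batch (assumed fixed im/gpu)

linear_lr = base_lr * factor            # 3e-4: the linear rule, which gave mAP 0 here
sqrt_lr = base_lr * math.sqrt(factor)   # ~1.7e-4: the gentler square-root rule
print(linear_lr, sqrt_lr)
```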

from detr.

gaopengcuhk avatar gaopengcuhk commented on August 23, 2024

Similar answer here: #46

Keep the learning rate unchanged for all GPU configurations.

from detr.

gaopengcuhk avatar gaopengcuhk commented on August 23, 2024

Hi, I observe the same thing.
2 im/GPU on 8 cards gets better results than 2 im/GPU on 16 cards at the same epoch. I guess 16 cards will finally catch up. I will update the results when I finish the full training.

from detr.

lld533 avatar lld533 commented on August 23, 2024

@gaopengcuhk depends on the total batch size, for example if we keep total batch size of 32 images with 2 im/gpu on 16 cards we get the same results as with 4 im/gpu on 8 cards. If we increase the total batch size, e.g. by training with 4 im/gpu on 16 cards, we observe that the model converges slower but with longer training it catches up.

Hi, could you please share your GPU model and how much GPU memory (in MB) is actually used on each GPU card when training with 2 im/GPU? Many thanks!

from detr.

advdfacd avatar advdfacd commented on August 23, 2024

@gaopengcuhk could you please share whether your larger-batch-size model finally caught up?

from detr.
