Comments (9)
@gaopengcuhk we didn't scale the learning rate for our experiments; we found that with Adam it was fine to use the same default values for all configurations (even with 64 GPUs).
The linear scaling rule is definitely too aggressive, and the model will probably not train at all with it. If you want to try a scaling rule for the learning rate, square-root scaling could potentially work (so if you double the batch size, multiply the learning rate by sqrt(2)).
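A minimal sketch of the square-root rule (the helper name and reference batch size are illustrative, not from the DETR code):

```python
import math

# Square-root scaling: if the total batch size grows by a factor k
# relative to a reference run, multiply the learning rate by sqrt(k).
def sqrt_scaled_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    return base_lr * math.sqrt(new_batch / base_batch)

# Doubling the batch size multiplies the learning rate by sqrt(2):
print(sqrt_scaled_lr(1e-4, 32, 64))  # ~1.414e-4
```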
I believe I've answered your question, and as such I'm closing the issue, but let us know if you have further questions.
@szagoruyko I trained DETR for 150 epochs with 8 V100 GPUs and with 8 V100 GPUs × 4 nodes, keeping the learning rate unchanged. However, there is still a performance gap:
| GPU config | AP |
|---|---|
| 8 | 39.9 |
| 8 × 4 | 38.4 |
| 8 × 8 | running |
Did you make a similar observation? Or will the gap diminish in the 300-epoch setting?
If you keep the learning rate unchanged, the performance with 16 GPUs is worse than with 8 GPUs at the same epoch, right?
@gaopengcuhk it depends on the total batch size. For example, if we keep a total batch size of 32 images, 2 im/gpu on 16 cards gives the same results as 4 im/gpu on 8 cards. If we increase the total batch size, e.g. by training with 4 im/gpu on 16 cards, we observe that the model converges more slowly, but with longer training it catches up.
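Put as plain arithmetic (an illustration only; the helper name is made up):

```python
def total_batch_size(images_per_gpu: int, num_gpus: int) -> int:
    # In distributed data-parallel training, the effective batch size per
    # optimizer step is the local batch size times the number of GPUs.
    return images_per_gpu * num_gpus

assert total_batch_size(2, 16) == total_batch_size(4, 8) == 32  # equivalent runs
assert total_batch_size(4, 16) == 64  # doubled batch: slower convergence per epoch
```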
I tried scaling up the learning rate and the backbone learning rate from 1e-4/1e-5 to 3e-4/3e-5 when training with 24 GPUs, but the mAP is always zero. Can you suggest a learning-rate scaling rule?
Similar answer here: #46
Keep the learning rate unchanged for all GPU configurations.
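For reference, a simplified sketch of the DETR-style optimizer setup with its default learning rates. The toy model is a stand-in so the snippet runs on its own; the param-group pattern (a separate, 10× smaller lr for parameters whose names contain "backbone") follows what DETR's main.py does:

```python
import torch
from torch import nn

# Toy stand-in for DETR so the snippet is self-contained; only the
# "backbone" prefix in parameter names matters for the grouping below.
model = nn.ModuleDict({
    "backbone": nn.Linear(4, 4),
    "transformer": nn.Linear(4, 4),
})

# Defaults, kept unchanged regardless of the number of GPUs.
lr, lr_backbone, weight_decay = 1e-4, 1e-5, 1e-4

param_dicts = [
    # Transformer and prediction heads train at the base learning rate.
    {"params": [p for n, p in model.named_parameters()
                if "backbone" not in n and p.requires_grad]},
    # The pretrained backbone trains at a 10x smaller learning rate.
    {"params": [p for n, p in model.named_parameters()
                if "backbone" in n and p.requires_grad],
     "lr": lr_backbone},
]
optimizer = torch.optim.AdamW(param_dicts, lr=lr, weight_decay=weight_decay)
```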
Hi, I observe the same thing:
2 im/gpu on 8 cards gets better results than 2 im/gpu on 16 cards at the same epoch. I guess the 16-card run will eventually catch up; I will update the results when I finish the full training.
> @gaopengcuhk it depends on the total batch size. For example, if we keep a total batch size of 32 images, 2 im/gpu on 16 cards gives the same results as 4 im/gpu on 8 cards. If we increase the total batch size, e.g. by training with 4 im/gpu on 16 cards, we observe that the model converges more slowly, but with longer training it catches up.

Hi, could you please share your GPU model and how much GPU memory (in MB) is actually used on each card when training with 2 im/gpu? Many thanks!
@gaopengcuhk could you please share whether your larger-batch-size model finally caught up?