Loss increases during pretraining about mdetr HOT 4 CLOSED

ashkamath commented on August 30, 2024

Loss increases during pretraining

from mdetr.

Comments (4)

alcinos commented on August 30, 2024

Hi @mmaaz60
Thank you for your interest in MDETR.
It looks like your training diverged. Can I ask how many GPUs you used?


mmaaz60 commented on August 30, 2024

> Hi @mmaaz60
> Thank you for your interest in MDETR.
> It looks like your training diverged. Can I ask how many GPUs you used?

Thank you @alcinos,

I used 32 GPUs with a batch_size of 2 per GPU.


alcinos commented on August 30, 2024

Hmm, that's quite surprising then. Nothing fishy happened, like the job getting preempted and then restarted?
Are you sure you have the correct transformers version?
Otherwise, maybe try a slightly smaller learning rate?
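
One quick way to check which transformers version the training environment actually has installed is sketched below. `installed_version` is a hypothetical helper for illustration, not part of MDETR, and the thread does not say which version is expected, so the output should be compared against whatever the repository's setup instructions specify.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the version so it can be compared with the one the project expects.
    print("transformers:", installed_version("transformers"))
```

Running this inside the same environment (and on the same nodes) as the training job matters, since a different interpreter may resolve a different installation.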


mmaaz60 commented on August 30, 2024

Thank you.

> Hmm, that's quite surprising then. Nothing fishy happened, like the job getting preempted and then restarted?

Nothing like that happened during training.

> Are you sure you have the correct transformers version?

I am using transformers version 4.5.1.

> Otherwise, maybe try a slightly smaller learning rate?

I actually stopped the training and then resumed it from the 19th epoch. It has now reached the 25th epoch and seems to be converging. I am not sure what went wrong previously, as I didn't change anything when resuming.
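
For anyone hitting the same symptom, a small check on the logged per-epoch losses can flag a run like this early, so it can be stopped and resumed from the last good checkpoint. The following is only a heuristic sketch: `loss_diverged`, its `window` and `tolerance` parameters, and the sample loss values are all hypothetical and do not come from MDETR or from this run.

```python
import math


def loss_diverged(losses, window=3, tolerance=0.0):
    """Heuristic divergence check on a list of per-epoch losses.

    Returns True if the latest loss is NaN/inf, or if the mean of the last
    `window` epochs exceeds the mean of the preceding `window` epochs by more
    than `tolerance`.
    """
    if not losses:
        return False
    if not math.isfinite(losses[-1]):
        return True
    if len(losses) < 2 * window:
        return False  # not enough history to compare two windows
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return recent > previous + tolerance


if __name__ == "__main__":
    # Made-up loss curves, for illustration only.
    print(loss_diverged([5.0, 4.0, 3.0, 2.5, 2.2, 2.0]))  # steadily decreasing
    print(loss_diverged([3.0, 2.5, 2.0, 2.6, 3.4, 4.1]))  # rising again
```

Comparing two window means rather than consecutive epochs keeps a single noisy epoch from triggering the check.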
