A training problem about "Global alloc not supported yet" (mask2former)

facebookresearch commented on July 19, 2024

A training problem about Global alloc not supported yet

from mask2former.

Comments (10)

Jianghanxiao commented on July 19, 2024 5

Based on the comments above, I also found that this happens when some images have no ground-truth (gt) annotations. Below is my modification, which reduces the impact: it still uses the JIT version most of the time and doesn't require modifying the dataset.
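
The snippet itself is not preserved in this copy of the thread. What follows is only a minimal sketch of the idea as described (use the eager function when an image has no gt masks, the JIT function otherwise), not the commenter's actual code. The helper names `pick_cost_fn` and `pairwise_cost` are hypothetical, and the stand-in lambdas replace the real `batch_dice_loss` / `batch_dice_loss_jit` so the sketch runs without PyTorch.

```python
def pick_cost_fn(num_targets, jit_fn, eager_fn):
    """Return eager_fn when there are no gt masks, otherwise jit_fn."""
    return eager_fn if num_targets == 0 else jit_fn


def pairwise_cost(out_mask, tgt_mask, jit_fn, eager_fn):
    """Compute a matching cost, avoiding the JIT path for empty targets."""
    # len() of a torch.Tensor is the size of its first dimension,
    # so this check works for tensors as well as plain lists.
    fn = pick_cost_fn(len(tgt_mask), jit_fn, eager_fn)
    return fn(out_mask, tgt_mask)


if __name__ == "__main__":
    # Stand-ins for batch_dice_loss_jit / batch_dice_loss (assumption).
    jit_stub = lambda out, tgt: "jit"
    eager_stub = lambda out, tgt: "eager"

    print(pairwise_cost([[0.1]], [], jit_stub, eager_stub))       # eager
    print(pairwise_cost([[0.1]], [[1.0]], jit_stub, eager_stub))  # jit
```

In matcher.py the same check would presumably sit where `batch_dice_loss_jit` and `batch_sigmoid_ce_loss_jit` are called, with the real functions passed in place of the stubs.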

ShijieVVu commented on July 19, 2024 3

I created a new running environment for mask2former according to the steps. When I train the COCO dataset, I can train normally, but when I train my dataset, I encounter the following problems.

I've been looking for a solution on Google for a long time, so I'd like to ask if you have any similar problems. Thank you very much for your reply.

If your custom training set includes images with zero annotations, this error will show up.

bowenc0221 commented on July 19, 2024 1

Does it work if you use batch_dice_loss instead of batch_dice_loss_jit?

YellowPig-zp commented on July 19, 2024 1

Just to add my own experience with this issue: it seems that even if your data contains no empty annotations, the code can still raise the same error. (I manually removed all the ADE20K images/annotations that have no labels and still hit it.)

It also seems to depend on the PyTorch version. With v1.10 the error appears, but after downgrading to 1.9.1 it runs like a charm, and the training time is reduced by a few hours.

Hope this could help!

xiehousen commented on July 19, 2024

If I use batch_dice_loss, it works.

haotian-liu commented on July 19, 2024

Hi @bowenc0221, I ran into the same issue when training the model on the YouTube-VIS dataset (with the official code). Do we need to turn JIT off, or is there some way to fix this RuntimeError? Thanks!

bowenc0221 commented on July 19, 2024

I have never encountered this error. I don't think it is necessary to use the JIT version of the loss function, so turning it off is the simplest solution. If you really want to fix the error itself, I would suggest reporting it to the PyTorch team for help.
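
For readers who would rather keep the JIT path where it works, one possible middle ground is a wrapper that calls the JIT function and retries with the eager one only when this specific error is raised. This is a minimal sketch under assumptions, not something taken from the Mask2Former code: `with_eager_fallback` is a hypothetical helper, the error is assumed to surface as a `RuntimeError` whose message contains the text from the issue title, and the stub functions stand in for the real loss functions so the sketch runs without PyTorch.

```python
def with_eager_fallback(jit_fn, eager_fn,
                        marker="Global alloc not supported yet"):
    """Wrap jit_fn so that the known JIT failure retries with eager_fn."""
    def wrapped(*args, **kwargs):
        try:
            return jit_fn(*args, **kwargs)
        except RuntimeError as err:
            # Only swallow the error discussed in this thread;
            # anything else is re-raised unchanged.
            if marker not in str(err):
                raise
            return eager_fn(*args, **kwargs)
    return wrapped


if __name__ == "__main__":
    def failing_jit(x):
        # Stand-in for a JIT-compiled loss that hits the error (assumption).
        raise RuntimeError("Global alloc not supported yet")

    def eager(x):
        # Stand-in for the eager loss function (assumption).
        return x * 2

    safe = with_eager_fallback(failing_jit, eager)
    print(safe(21))  # 42
```

Matching on the message keeps unrelated runtime errors visible instead of silently hiding them behind the fallback.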

haotian-liu commented on July 19, 2024

Thank you!

xiehousen commented on July 19, 2024

I did not use the JIT version, and the training speed became very slow. Have you run into this problem?

deeptig84 commented on July 19, 2024

Hi @bowenc0221,

I would like to know how to turn off batch_dice_loss_jit: is it a config change in Mask2Former, or do I need to change the code itself? For now, I have edited matcher.py to call batch_dice_loss in place of batch_dice_loss_jit, and I made the same replacement for batch_sigmoid_ce_loss_jit. I wanted to validate these changes with you. My training ran fine after making them.
