Comments (10)
Based on the comments above, I also found that this happens when some images have no ground truth (gt). Below is my modification, which somewhat reduces the impact. It can still use JIT most of the time and does not require modifying the dataset.
from mask2former.
I created a new environment for Mask2Former following the installation steps. Training on the COCO dataset works normally, but when I train on my own dataset, I encounter the following problems.
I have been searching Google for a solution for a long time, so I would like to ask whether anyone has run into a similar problem. Thank you very much for your reply.
If your custom training set includes images with zero annotations, this error will show up.
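A quick way to check whether a dataset contains such images is sketched below. It assumes COCO-style annotations already loaded into a dict with "images" and "annotations" lists; the function name find_empty_images and the sample data are hypothetical and not part of Mask2Former.

```python
from collections import Counter


def find_empty_images(coco):
    """Return the ids of images that have no annotation entries.

    `coco` is assumed to be a COCO-style dict with "images" and
    "annotations" lists (an assumption, not something this thread specifies).
    """
    counts = Counter(ann["image_id"] for ann in coco.get("annotations", []))
    return [img["id"] for img in coco.get("images", []) if counts[img["id"]] == 0]


# Made-up sample: image 2 has no annotations.
sample = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 3},
        {"id": 12, "image_id": 3},
    ],
}
empty_ids = find_empty_images(sample)
print(empty_ids)
```

Note that a later comment in this thread reports the same error even after removing all unlabeled images, so an empty result from a check like this does not rule the problem out.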
Does it work if you use batch_dice_loss instead of batch_dice_loss_jit?
Just to add my own experience with this issue: it seems that even if your data contains no empty annotations, the code still throws the same error. (I manually removed all ADE20K images and annotations that have no labels and still hit the same bug.)
It also seems to be related to the PyTorch version. With v1.10 the bug appears, but after I downgraded to 1.9.1 it ran like a charm, and training time dropped by a few hours.
Hope this helps!
If I use batch_dice_loss, it works.
Hi @bowenc0221, I met the same issue when training the model on YouTube-VIS dataset (with official code). Do we need to turn JIT off or is there some way to fix this RuntimeError issue? Thanks!
I have never met this error. I think it is not necessary to use the JIT version of the loss function, so turning it off is the simplest solution. If you really want to fix this error, I would suggest posting it to the PyTorch team for help.
Thank you!
I did not use the JIT version, and training became very slow. Has anyone else run into this?
I would like to know how to turn off batch_dice_loss_jit: is it a config change in Mask2Former, or do I need to change the code itself? For now, I have edited matcher.py to call batch_dice_loss in place of batch_dice_loss_jit, and I made the same replacement for batch_sigmoid_ce_loss_jit. I wanted to validate these changes with you. My training ran fine after them.
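The swap described above can be sketched roughly as follows. This is only an illustration, assuming a pairwise dice cost of the usual form; the function body is not copied from matcher.py, and the USE_JIT switch and dice_cost_fn name are hypothetical, not existing Mask2Former config options.

```python
import torch


def batch_dice_loss(inputs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Pairwise dice cost between N predicted masks and M target masks.

    inputs:  (N, P) mask logits
    targets: (M, P) binary masks as floats
    returns: (N, M) cost matrix
    """
    probs = inputs.sigmoid().flatten(1)
    numerator = 2 * torch.einsum("nc,mc->nm", probs, targets)
    denominator = probs.sum(-1)[:, None] + targets.sum(-1)[None, :]
    return 1 - (numerator + 1) / (denominator + 1)


# Hypothetical switch: False selects the plain eager function, which is the
# workaround reported in this thread; True compiles it with TorchScript.
USE_JIT = False
dice_cost_fn = torch.jit.script(batch_dice_loss) if USE_JIT else batch_dice_loss

logits = torch.randn(3, 16)                   # 3 predictions, 16 sampled points
targets = (torch.rand(2, 16) > 0.5).float()   # 2 ground-truth masks
cost = dice_cost_fn(logits, targets)
print(cost.shape)
```

Keeping both variants behind a single name means the call sites in the matcher stay unchanged whichever one is selected.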