Comments (7)
It seems that the training is still running, but the error pops up constantly.
from mask2former.
I built the Docker image on my own computer and pushed it to the GPU cluster. It works well on my computer, but raises an error on the GPU cluster.
This is my training shell script:
#!/bin/bash
source /root/conda/etc/profile.d/conda.sh
conda activate base
which python
nvcc -V
nvidia-smi
echo $CUDA_HOME
echo $TORCH_CUDA_ARCH_LIST
echo $FORCE_CUDA
python -m detectron2.utils.collect_env
cd mask2former/modeling/pixel_decoder/ops
rm -rf build
rm -rf dist
rm -rf MultiScaleDeformableAttention.egg-info
TORCH_CUDA_ARCH_LIST='6.1;6.2;7.0;7.5;8.0;8.6' FORCE_CUDA=1 python setup.py build install
cd /workspace
#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml
#mv /summary/model_final.pth /summary/model_final_2019.pth
#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth
#mv /summary/model_final.pth /summary/model_final_2021.pth
python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2019.pth
python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2021.pth
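The error text itself is not shown in this thread, so the following is only one possible check, not a confirmed fix. Since the image works locally but fails on the cluster, it may be worth confirming that the cluster GPU's compute capability is covered by the `TORCH_CUDA_ARCH_LIST` used when compiling the ops. A minimal sketch, assuming a hypothetical helper named `arch_in_list` (it is not part of Mask2Former or detectron2):

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit status 0) if compute capability $1
# (e.g. "8.6") appears in the semicolon-separated arch list $2.
arch_in_list() {
    local cap="$1" list="$2" entry
    local IFS=';'   # split the list on semicolons, only inside this function
    for entry in $list; do
        # Strip an optional "+PTX" suffix before comparing.
        if [ "${entry%+PTX}" = "$cap" ]; then
            return 0
        fi
    done
    return 1
}

# Same arch list as in the training script above.
ARCH_LIST='6.1;6.2;7.0;7.5;8.0;8.6'

if arch_in_list "8.6" "$ARCH_LIST"; then
    echo "8.6 is covered by the arch list"
else
    echo "8.6 is NOT covered by the arch list"
fi
```

On the cluster, the capability of the first visible GPU can be read with `python -c "import torch; print('%d.%d' % torch.cuda.get_device_capability(0))"` and passed as the first argument in place of the hard-coded `8.6`.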
#54 seems to have found the problem.
I have the same problem!
GPU: NVIDIA RTX 3090 Ti
CUDA: 11.1
PyTorch versions tried: 1.8, 1.9, 1.10, 1.11.
Has the problem been solved? I have the same problem.
What is the status of this?
Does anyone have the docker file/image for Mask2Former? @9p15p
@9p15p would it be possible for you to share the docker file?