
Comments (7)

9p15p commented on July 19, 2024

It seems that the training is still running, but the error pops up constantly.

from mask2former.

9p15p commented on July 19, 2024

I built the Docker image on my own computer and pushed it to the GPU cluster. It works well on my computer but raises an error on the GPU cluster.

This is my training shell script:

#!/bin/bash

source /root/conda/etc/profile.d/conda.sh
conda activate base
which python

nvcc -V
nvidia-smi
echo $CUDA_HOME
echo $TORCH_CUDA_ARCH_LIST
echo $FORCE_CUDA
python -m detectron2.utils.collect_env

cd mask2former/modeling/pixel_decoder/ops
rm -rf build
rm -rf dist
rm -rf MultiScaleDeformableAttention.egg-info
TORCH_CUDA_ARCH_LIST='6.1;6.2;7.0;7.5;8.0;8.6' FORCE_CUDA=1 python setup.py build install
cd /workspace

#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml
#mv /summary/model_final.pth /summary/model_final_2019.pth
#python scripts/train_net_video.py --num-gpus 8 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth
#mv /summary/model_final.pth /summary/model_final_2021.pth

python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2019.pth
python scripts/train_net_video.py --num-gpus 2 --config-file configs/youtubevis_2021/video_maskformer2_R50_bs16_8ep.yaml MODEL.WEIGHTS /summary/model_final_2019.pth SOLVER.MAX_ITER 1
mv /summary/model_final.pth /summary/model_final_2021.pth
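
The script above compiles the ops for a fixed architecture list but does not check whether the GPU on the machine is covered by that list. A minimal sketch of such a check follows, assuming PyTorch is importable and a GPU is visible. `arch_in_list` is a hypothetical helper, not part of Mask2Former or of the original script, and this thread does not establish whether an architecture mismatch is related to the reported error.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script): returns 0 if the
# compute capability in $1 (e.g. "8.6") appears in the semicolon-separated
# list in $2 (same format as TORCH_CUDA_ARCH_LIST).
arch_in_list() {
    local cap="$1" list="$2" entry
    local IFS=';'
    for entry in $list; do
        # Strip an optional "+PTX" suffix before comparing.
        if [ "${entry%+PTX}" = "$cap" ]; then
            return 0
        fi
    done
    return 1
}

# Same list the build step in the script above uses.
ARCH_LIST='6.1;6.2;7.0;7.5;8.0;8.6'

# Ask PyTorch for the capability of GPU 0; skip the check if that fails
# (no Python, no PyTorch, or no visible GPU).
if cap=$(python -c 'import torch; print("%d.%d" % torch.cuda.get_device_capability(0))' 2>/dev/null); then
    if arch_in_list "$cap" "$ARCH_LIST"; then
        echo "compute capability $cap is covered by the build list"
    else
        echo "compute capability $cap is NOT in the build list: $ARCH_LIST"
    fi
fi
```

Running this on both the local machine and the cluster node, just before the `python setup.py build install` step, would show whether the two environments differ in this respect.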


bowenc0221 commented on July 19, 2024

Issue #54 seems to have found the problem.


liuzhihui2046 commented on July 19, 2024

I have the same problem.
GPU: NVIDIA RTX 3090 Ti
CUDA: 11.1
PyTorch versions tried: 1.8, 1.9, 1.10, 1.11


kunjing96 commented on July 19, 2024

Has the problem been solved? I have the same problem.


Robotatron commented on July 19, 2024

What is the status of this?
Does anyone have a Dockerfile or Docker image for Mask2Former? @9p15p


Robotatron commented on July 19, 2024

@9p15p would it be possible for you to share the Dockerfile?
