Light

henghuiding / mevis Goto Github PK

[ICCV 2023] MeViS: A Large-scale Benchmark for Video Segmentation with Motion Expressions

Home Page: https://henghuiding.github.io/MeViS/

License: MIT License

Shell 0.11% Python 89.67% C++ 1.02% Cuda 9.20%

multimodal-learning referring-expression-comprehension referring-expression-segmentation referring-video-object-segmentation video-understanding mevis-dataset mose-dataset

mevis's Introduction

Hi there 👋

🔭 Researcher woking on Computer Vision and Artificial Intelligence
🌎 Shanghai, China

Links:

Website

Google Scholar

mevis's People

Contributors

Stargazers

Watchers

Forkers

cvrose eltociear ai-machine-vision-lab ntucver ethcver saulocatharino xjtu-zdh songyang86 findrealworld kang-jaehyun whuhxb longmalongma sailfish009 xemcerk ali2500 bbbiiinnn kumuji cv-seg

mevis's Issues

CodaLab submission

Please kindly follow the submission guidelines. If you need any help or encounter any issues, do not hesitate to raise here.

MultiScaleDeformableAttention import error

I'm using Python 3.8 with PyTorch 1.9 and Cuda 11.1. I have already set cuda_home as such:

export CUDA_HOME=/mnt/slurm_home/remelias/anaconda3/envs/vita30/
cd /mnt/slurm_home/remelias/MeViS-main/mask2former/modeling/pixel_decoder/ops/
sh make.sh

I'm still getting MSDA import error.

Traceback (most recent call last):
line 22, in
import MultiScaleDeformableAttention as MSDA
ImportError: /mnt/slurm_home/remelias/anaconda3/envs/vita30/lib/python3.8/site-packages/MultiScaleDeformableAttention-1.0-py3.8-linux-x86_64.egg/MultiScaleDeformableAttention.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8data_ptrIdEEPT_v

How can I resolve this?

results on referformer

Thanks for the excellent work. In table 5 in the paper, the ReferFormer reaches 31.0 J&F on your dataset and how are the results obtained? Is it directly evaluated on your validation set without training (i.e. directly using pretrained referformer) or evaluated after training with the training set?

Is lmpm semantic segmentation model or

instance segmentation model?

Hardware Information

Hello, could you share information about the hardware you are using？I couldn't find it in the paper.

there are no 'expressions' in meta_valid.json

what you are talking about, the meta_expressions.json, is rename from meta_valid.json? i rename meta_valid.json to meta_expressions.json, however, the code throw this exception:

File "/ai/home/project/MeViS-main/lmpm/data/datasets/mevis.py", line 66, in load_mevis_json
for exp_id, exp_dict in vid_data['expressions'].items():
KeyError: 'expressions'

what should i do next?

Shape cannot match the size during training

During the training, in the part of backbone, I got this error:

File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 694, in forward
value = value.reshape(B, self.num_heads, self.value_channels//self.num_heads, n_l)
RuntimeError: shape '[24, 1, 96, 40]' is invalid for input of size 368640

this happened in the part of SpatialImageLanguageAttention, I found num_heads is 1, so this is not a MultiheadAttention right?
but I don't know whether the shape or the size is wrong, so what is the expected shape or size?

and the full error message is below:
Traceback (most recent call last):
File "train_net_lmpm.py", line 318, in
launch(
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/launch.py", line 69, in launch
mp.start_processes(
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/launch.py", line 123, in _distributed_worker
main_func(*args)
File "/root/MeViS/train_net_lmpm.py", line 312, in main
return trainer.train()
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 155, in train
self.run_step()
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/detectron2/engine/train_loop.py", line 494, in run_step
loss_dict = self.model(data)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 886, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/MeViS/lmpm/lmpm_model.py", line 281, in forward
return self.train_model(batched_inputs)
File "/root/MeViS/lmpm/lmpm_model.py", line 312, in train_model
features = self.backbone(images.tensor, lang_feat_sentence, lang_mask)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 785, in forward
y = super().forward(x, l, l_mask)
File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 470, in forward
x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww, l, l_mask)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 590, in forward
x_residual = self.fusion(x, l, l_mask)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(input, **kwargs)
File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 627, in forward
lang = self.image_lang_att(x, l, l_mask) # (B, HW, dim)
File "/root/anaconda3/envs/torch1/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/root/MeViS/mask2former/modeling/backbone/swin.py", line 694, in forward
value = value.reshape(B, self.num_heads, self.value_channels//self.num_heads, n_l)
RuntimeError: shape '[24, 1, 96, 40]' is invalid for input of size 368640

Issue with Detectron2 config file

I have installed detectron2 and all other libraries in a python virtualenv and am facing AssertionError: Config file '' does not exist! when running train_net_lmpm.py. I've attached the error below.

Mask GroundTruth

I download the mask_dict.json from the google cloud, however, there seems to be some Encoding error， could you please provide a Annotation folder such as Youtube-vos and Davis?

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.