facebookresearch / Mask2Former
Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"
License: MIT License
Is that because of the use of multiple feature scales and masked attention? The speed does not seem very satisfying for some practical scenarios.
I am running inference using the COCO config. How can I get the object label for each class, the way it is displayed on a visualized output image? Basically I am looking for a mapping from category_id to label.
During inference I got category_ids greater than 91, so I thought the standard COCO mapping wouldn't work.
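In detectron2 the mapping the visualizer uses comes from MetadataCatalog, e.g. MetadataCatalog.get("coco_2017_val").thing_classes and thing_dataset_id_to_contiguous_id. A minimal self-contained sketch of how those two structures relate (the five-category list is an illustrative subset, not the real 80-class table):

```python
# COCO dataset ids are non-contiguous (1..90 with gaps); detectron2 remaps
# them to contiguous indices 0..79, and the model predicts those indices.
COCO_CATEGORIES = [  # (dataset id, name) -- illustrative subset only
    (1, "person"), (2, "bicycle"), (3, "car"), (5, "airplane"), (13, "stop sign"),
]

# contiguous index -> readable name (what the visualizer draws)
thing_classes = [name for _, name in COCO_CATEGORIES]
# dataset id -> contiguous index, and its inverse
id_to_contiguous = {cid: i for i, (cid, _) in enumerate(COCO_CATEGORIES)}
contiguous_to_id = {i: cid for cid, i in id_to_contiguous.items()}

def label_of(pred_class):
    """Map a predicted contiguous class index to its label."""
    return thing_classes[pred_class]
```

Ids above 91 suggest the predictions are contiguous indices into a different dataset's metadata (e.g. a larger panoptic vocabulary), so look up the metadata of the dataset named in the config rather than the raw COCO id table.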
Hi,
do you guys have information on GPU usage during inference ?
Thank you
Hi, thank you for your excellent work. I met a problem when re-running your experiments.
When I re-train the Instance segmentation model with R-50 on COCO dataset, the results are:
43.5, 23.0, 47.0, 65.1
43.2, 22.7, 46.4, 64.8
which are a bit lower than your reported numbers:
43.7, 23.4, 47.2, 64.8
I used the standard configuration file without any modification and ran on 4/8 V-100 cards. Is this variance a common scenario, or did you meet the same problem during training?
Hi, first, thank you for your work; it works really well on my custom dataset, and the robustness to occlusion is impressive!
I have questions regarding object detection precision. Mask R-CNN reports both AP bb and AP segm. For the same dataset, the AP segm obtained with Mask2Former is better than the one from Mask R-CNN; however, the AP bb of Mask R-CNN is higher than the AP segm of Mask2Former.
My questions are:
Hi,
Could you please share the training logs for the models as well? It is common now to share them (DeiT does, for example), and they would help with debugging and reproduction. Even just the COCO R50 model would be useful.
Best,
Kartik
Hi, I really enjoyed reading Mask2Former paper.
Could 3D images be used for training if appropriate modifications are made to the code?
Regards,
Tae
Hi, thank you for sharing your work; it works great on my dataset.
I don't understand whether the model uses the ground-truth bounding boxes. For example, with the COCO dataset, would anything change in the training phase if we switched every annotations[n].segments_info[m].bbox to [0, 0, 0, 0]?
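For what it's worth, the training losses in this repo (classification plus the mask/dice losses in SetCriterion) appear to read only the category ids and the masks, so zeroing the boxes should not change training. A sketch of the edit described above, with the annotation layout assumed from the question:

```python
def zero_bboxes(annotations):
    """Overwrite every segments_info bbox with [0, 0, 0, 0] while leaving
    category ids and masks untouched (the fields the losses actually use)."""
    for ann in annotations:
        for seg in ann.get("segments_info", []):
            seg["bbox"] = [0, 0, 0, 0]
    return annotations

# Toy annotation in the layout the question describes
anns = [{"segments_info": [{"bbox": [10, 20, 30, 40], "category_id": 3}]}]
anns = zero_bboxes(anns)
```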
Thanks for your code. I found that I can save GPU memory by modifying MODEL.MASK_FORMER.NUM_OBJECT_QUERIES when running demo.py. But after modifying it, I can't load the pretrained model for training. Could you give me any suggestion about this?
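One workaround sketch (not from the repo): drop checkpoint tensors whose shapes no longer match the model after changing NUM_OBJECT_QUERIES (typically the query embeddings), and let those re-initialize. Key names below are illustrative; adapt them to the actual state-dict keys.

```python
import numpy as np

def filter_mismatched(checkpoint_state, model_state):
    """Keep only checkpoint entries whose shape matches the model's, so tensors
    sized for a different NUM_OBJECT_QUERIES (e.g. query embeddings) are simply
    re-initialized instead of raising a size-mismatch error on load."""
    kept, dropped = {}, []
    for key, value in checkpoint_state.items():
        if key in model_state and tuple(value.shape) == tuple(model_state[key].shape):
            kept[key] = value
        else:
            dropped.append(key)
    return kept, dropped

# Toy example: the checkpoint was trained with 100 queries, the model uses 50.
ckpt = {"query_embed.weight": np.zeros((100, 256)), "backbone.w": np.ones((64, 3))}
model = {"query_embed.weight": np.zeros((50, 256)), "backbone.w": np.zeros((64, 3))}
kept, dropped = filter_mismatched(ckpt, model)
```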
Thanks for your great work!
I added a bounding box head to the Mask2Former model, as in DETR (in parallel with the mask head).
But the performance is not good.
Do you think the Mask2Former architecture is not well suited to bounding box detection?
If you have any ideas or intuition, please share them.
Thanks a lot!
Hello,
Thank you for this project and code. I'm running a custom semantic segmentation training job (based on this config) with one class (custom class 'AT') and for some reason my validation accuracy after an epoch is always 100:
[12/21 21:20:26 d2.evaluation.evaluator]: Total inference time: 0:00:44.703675 (0.065935 s / iter per device, on 1 devices)
[12/21 21:20:26 d2.evaluation.evaluator]: Total inference pure compute time: 0:00:37 (0.055430 s / iter per device, on 1 devices)
[12/21 21:20:26 d2.evaluation.sem_seg_evaluation]: OrderedDict([('sem_seg', {'mIoU': 100.0, 'fwIoU': 100.0, 'IoU-AT': 100.0, 'mACC': 100.0, 'pACC': 100.0, 'ACC-AT': 100.0})])
[12/21 21:20:26 d2.engine.defaults]: Evaluation results for ade20k_full_sem_seg_val in csv format:
[12/21 21:20:26 d2.evaluation.testing]: copypaste: Task: sem_seg
[12/21 21:20:26 d2.evaluation.testing]: copypaste: mIoU,fwIoU,mACC,pACC
[12/21 21:20:26 d2.evaluation.testing]: copypaste: 100.0000,100.0000,100.0000,100.0000
When I view the loss curves in tensorboard it seems like the model is learning so I'm not sure what's going on:
Here's the full config:
Any ideas?
Thank you
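A likely explanation for the issue above: with a single class and no ignore/background label, every pixel is class 'AT' in both the prediction and the ground truth, so the IoU is 100 by construction regardless of what the model learned. A minimal sketch of the computation (simplified; not copied from d2's SemSegEvaluator):

```python
import numpy as np

def miou(pred, gt, num_classes):
    """Mean IoU over the classes that appear in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

# With one class, any argmax output can only be class 0, so IoU is trivially perfect.
gt = np.zeros((4, 4), dtype=int)
pred = np.zeros((4, 4), dtype=int)
```

Adding an explicit background class (or evaluating with an ignore region) makes the metric informative again.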
Hi,
Thank you for sharing such good work! I have a question regarding the loss. I found that if I use more frames during training, the loss becomes very high. Do I need to scale down the learning rate linearly according to the number of sampled frames? Thank you.
Hi,
I ran Mask2Former on ADE20K (maskformer2_swin_small_bs16_160k.yaml) with 4 16GB V-100 GPUs. However, I can only reach 49.6%, which is much worse than the reported result (51.3%). Could you provide the log so I can analyze the difference?
Thanks
I created a new running environment for Mask2Former following the installation steps. I can train on the COCO dataset normally, but when I train on my own dataset I run into the following problems.
I've been searching Google for a solution for a long time, so I'd like to ask whether you have seen similar problems. Thank you very much for your reply.
Hi,
Thank you for sharing such good work! I have a simple question about the Mask2Former implementation for VIS. You mention in the report that you use T=2 during training. Did you keep the same setting at inference time? Is there an IoU tracker to keep the instance ids consistent, as in ViP-DeepLab?
Got the answer; I hadn't read the report carefully.
Hi,
I have a problem trying to use the demo with Ade20k Panoptic Segmentation. The command used is:
python demo.py --config-file ../configs/ade20k/panoptic-segmentation/maskformer2_R50_bs16_160k.yaml \
--video-input ... \
--output ... \
--opts MODEL.WEIGHTS ../models/model_final_5c90d4.pkl
And the stack trace is:
File "demo.py", line 182, in <module>
for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):
File "/home/master/Develop/Mask2Former/lib/python3.8/site-packages/tqdm/std.py", line 1180, in __iter__
for obj in iterable:
File "/home/master/Develop/Mask2Former/demo/predictor.py", line 130, in run_on_video
yield process_predictions(frame, self.predictor(frame))
File "/home/master/Develop/Mask2Former/demo/predictor.py", line 94, in process_predictions
vis_frame = video_visualizer.draw_panoptic_seg_predictions(
File "/home/master/Develop/Mask2Former/lib/python3.8/site-packages/detectron2/utils/video_visualizer.py", line 172, in draw_panoptic_seg_predictions
labels = [self.metadata.thing_classes[k] for k in category_ids]
File "/home/master/Develop/Mask2Former/lib/python3.8/site-packages/detectron2/utils/video_visualizer.py", line 172, in <listcomp>
labels = [self.metadata.thing_classes[k] for k in category_ids]
IndexError: list index out of range
I think I have found the source of the problem in the lines
thing_classes = [k["name"] for k in ADE20K_150_CATEGORIES if k["isthing"] == 1]
thing_colors = [k["color"] for k in ADE20K_150_CATEGORIES if k["isthing"] == 1]
of mask2former/data/datasets/register_ade20k_panoptic.py
And from my understanding it happens because Detectron2 seems to use the id as the index, but these lines remove some items and the index to id mapping is lost.
Changing the lines to
thing_classes = [k["name"] for k in ADE20K_150_CATEGORIES]
thing_colors = [k["color"] for k in ADE20K_150_CATEGORIES]
seems to work, but I don't know if there are any undesired consequences.
I have installed Detectron2 with pip, but the line where the error happens appears to be also in the git version.
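A self-contained sketch of the index/id mismatch and both fixes (the category entries are an illustrative subset of ADE20K_150_CATEGORIES, not the real values):

```python
ADE20K_SUBSET = [  # illustrative subset: id, name, thing/stuff flag
    {"id": 0, "name": "wall", "isthing": 0},
    {"id": 7, "name": "bed", "isthing": 1},
    {"id": 10, "name": "cabinet", "isthing": 0},
    {"id": 15, "name": "table", "isthing": 1},
]

# Filtering loses the id -> index alignment: "table" (id 15) lands at index 1,
# so indexing thing_classes with a raw category id overruns the list.
thing_classes_filtered = [k["name"] for k in ADE20K_SUBSET if k["isthing"] == 1]

# Workaround from this issue: keep the full list so any category id indexes safely.
thing_classes_full = [k["name"] for k in ADE20K_SUBSET]

# Alternative: remap dataset ids of things to contiguous indices before lookup.
thing_dataset_id_to_contiguous_id = {
    k["id"]: i for i, k in enumerate(c for c in ADE20K_SUBSET if c["isthing"] == 1)
}
```

The second fix keeps the filtered lists but converts the predicted id through the mapping before using it as an index, which is what detectron2's metadata normally provides.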
Hello all, I am quite confused by the entry "panoptic_{train,val}2017/ # png annotations" in the COCO folder structure. When I downloaded the COCO dataset, I couldn't find this folder. Could you please tell me how to get/generate it? I know there are panoptic annotations, but how exactly do I generate the folder? Thank you.
The link for Mask2Former_R101 for COCO panoptic in the model zoo is wrong.
Hi Bowen,
I am working on an 8*V100(32G) cluster.
When I use this config for training, it still runs out of memory.
python scripts/train_net_video.py \
--num-gpus 8 \
--config-file configs/youtubevis_2021/swin/video_maskformer2_swin_large_IN21k_384_bs16_8ep.yaml
RuntimeError: CUDA out of memory. Tried to allocate 68.00 MiB (GPU 4; 31.75 GiB total capacity; 28.95 GiB already allocated; 11.75 MiB free; 30.10 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I would appreciate it if you could provide more information about the GPUs used for training.
The original data downloaded from the link is organized as:
ytvis_2021/
  {train/valid/test}/
    JPEGImages/
    instance.json
There is no Annotations directory as in the structure you list:
ytvis_2021/
  {train,valid,test}.json
  {train,valid,test}/
    Annotations/
    JPEGImages/
How do you evaluate Mask2Former on YouTubeVIS-2021?
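For the layout conversion, a sketch under the assumption that only the json files and JPEGImages/ are actually read (directory names follow the listings above):

```python
import os
import shutil

def restructure_ytvis(root):
    """Copy each split's instance.json up to the {split}.json the configs
    expect, leaving JPEGImages/ where it is. The downloaded archives have no
    Annotations/ directory; as far as I can tell only the json and the frames
    are read during training and evaluation."""
    for split in ("train", "valid", "test"):
        src = os.path.join(root, split, "instance.json")
        dst = os.path.join(root, f"{split}.json")
        if os.path.exists(src) and not os.path.exists(dst):
            shutil.copyfile(src, dst)
```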
Hi Bowen,
Thanks for your excellent work and code! I am retraining the video instance segmentation model on the YouTube VIS 2019 dataset. I managed to train the model, but the quantitative result on CodaLab turned out to be very low (only about 40).
The command I used for training is:
python3 train_net_video.py \
--config-file configs/youtubevis_2019/swin/video_maskformer2_swin_base_IN21k_384_bs16_8ep.yaml \
--num-gpus 8 \
MODEL.WEIGHTS swin_base_patch4_window12_384_22k.pkl
The backbone weights were obtained from:
wget https://github.com/SwinTransformer/storage/releases/download/v1.0.0/swin_base_patch4_window12_384_22k.pth
python tools/convert-pretrained-swin-model-to-d2.py swin_base_patch4_window12_384_22k.pth swin_base_patch4_window12_384_22k.pkl
Then I submitted output/inference/results.json after training to CodaLab, but only got 40 accuracy. I also tried rerunning the evaluation using output/model_final.pth, and the results are almost the same.
The config files are untouched. I am able to reproduce the correct result using the pretrained model, so I assume my environment and dataset setup are okay. I also checked the tensorboard output and the loss curve looks good. Could you help check whether there is anything wrong with my training process? Thanks!
Has anyone visualized the attention? Where do I get the features to visualize?
I fine-tuned on a custom dataset; now it's time to inspect the model.
How do I fine-tune on a custom dataset? Is there any config file supporting fine-tuning?
I've been trying to use it for a nuclei panoptic segmentation task.
The dataset is prepared the way the ADE20K panoptic dataset is.
However, during evaluation it doesn't propose any instances after a period of training:
File "/home/---/anaconda3/envs/mask2former/lib/python3.8/site-packages/panopticapi/evaluation.py", line 224, in pq_compute
results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing)
File "/home/---/anaconda3/envs/mask2former/lib/python3.8/site-packages/panopticapi/evaluation.py", line 73, in pq_average
return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results
ZeroDivisionError: division by zero
There are several possible reasons I can think of:
1. I followed prepare_ade20k_sem_seg, prepare_ade20k_ins_seg and prepare_ade20k_pan_seg, converted the labeled data to panoptic images (in a folder) plus a label json file, and commented out the line "sem_seg_file_name": sem_label_file, in dataset_dict.
2. Is there anything like an anchor size or ratio in panoptic segmentation? Nuclei in whole-slide images (patches cropped at 256x256, with one nucleus around (8~16)x(8~16) pixels) are rather small compared to common things in a natural photo.

I followed the instructions at https://github.com/facebookresearch/Mask2Former/blob/main/datasets/README.md and prepared the COCO datasets.
I have already run the demo successfully, but an error occurs when I run the training script:
python train_net.py --num-gpus 8 --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml
The output follows:
[02/01 14:43:34 mask2former.data.dataset_mappers.coco_panoptic_new_baseline_dataset_mapper]: [COCOPanopticNewBaselineDatasetMapper] Full TransformGens used in training: [RandomFlip(), ResizeScale(min_scale=0.1, max_scale=2.0, target_height=1024, target_width=1024), FixedSizeCrop(crop_size=(1024, 1024))]
[02/01 14:43:41 d2.data.build]: Using training sampler TrainingSampler
[02/01 14:43:41 d2.data.common]: Serializing 118287 elements to byte tensors and concatenating them all ...
[02/01 14:43:42 d2.data.common]: Serialized dataset takes 78.29 MiB
[02/01 14:43:51 fvcore.common.checkpoint]: [Checkpointer] Loading from model_final_94dc52.pkl ...
[02/01 14:43:51 fvcore.common.checkpoint]: Reading a file from 'MaskFormer Model Zoo'
WARNING [02/01 14:43:51 mask2former.modeling.transformer_decoder.mask2former_transformer_decoder]: Weight format of MultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ...
[02/01 14:43:51 d2.engine.train_loop]: Starting training from iteration 0
Traceback (most recent call last):
File "", line 1, in
File "/opt/conda/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/opt/conda/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
_pickle.UnpicklingError: pickle data was truncated
Traceback (most recent call last):
File "/home/xt.xie/workspace/code/Mask2Former-main/train_net.py", line 321, in
launch(
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/launch.py", line 67, in launch
mp.spawn(
File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 4 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/launch.py", line 126, in _distributed_worker
main_func(*args)
File "/home/xt.xie/workspace/code/Mask2Former-main/train_net.py", line 315, in main
return trainer.train()
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/defaults.py", line 484, in train
super().train(self.start_iter, self.max_iter)
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 149, in train
self.run_step()
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/defaults.py", line 494, in run_step
self._trainer.run_step()
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/engine/train_loop.py", line 395, in run_step
loss_dict = self.model(data)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xt.xie/workspace/code/Mask2Former-main/mask2former/maskformer_model.py", line 209, in forward
losses = self.criterion(outputs, targets)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/xt.xie/workspace/code/Mask2Former-main/mask2former/modeling/criterion.py", line 222, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/xt.xie/workspace/code/Mask2Former-main/mask2former/modeling/matcher.py", line 179, in forward
return self.memory_efficient_forward(outputs, targets)
File "/opt/conda/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/xt.xie/workspace/code/Mask2Former-main/mask2former/modeling/matcher.py", line 122, in memory_efficient_forward
tgt_mask = point_sample(
File "/home/xt.xie/.local/lib/python3.9/site-packages/detectron2/projects/point_rend/point_features.py", line 39, in point_sample
output = F.grid_sample(input, 2.0 * point_coords - 1.0, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/torch/nn/functional.py", line 3836, in grid_sample
return torch.grid_sampler(input, grid, mode_enum, padding_mode_enum, align_corners)
RuntimeError: grid_sampler(): expected input and grid to have same dtype, but input has c10::Half and grid has float
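This error usually means mixed precision (AMP) is on: the mask features reach point_sample as float16 while the sampling grid stays float32, and grid_sampler requires matching dtypes. Casting the grid (or both tensors) to a common dtype before the call avoids it; in torch terms roughly point_coords = point_coords.to(dtype=input.dtype). A numpy stand-in for the pattern:

```python
import numpy as np

def grid_sample_safe(features, grid):
    """Stand-in for the failing call in point_features.py: under AMP the
    features arrive as half precision while the grid is float32. Casting the
    grid to the feature dtype before sampling avoids the RuntimeError.
    Returns the two dtypes to show they now match; the real code would call
    F.grid_sample here."""
    if grid.dtype != features.dtype:
        grid = grid.astype(features.dtype)
    return features.dtype, grid.dtype
```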
I appreciate your excellent work! I wonder whether you have tried test-time techniques like multi-scale testing, which may boost the final performance. Does the implementation of Mask2Former in this repo support multi-scale testing on YTVIS 2019/2021?
Hi, thanks for your code.
I trained instance segmentation on custom data, but I don't get bounding boxes or box scores.
When I run the demo on my test pictures, there are no boxes either.
Why do I only get masks and mask scores?
Thank you!
Hi, I am following up on your work on video instance segmentation and trying to run experiments on the ytvis_2021 dataset. The original data downloaded from the link is organized as:
{train/valid/test}/
  JPEGImages/
  instance.json
How should I convert it to the structure you use here? I just copied instance.json as train/valid/test.json; evaluation ran correctly, but there were some file-not-found errors during training. It looks like some videos listed in train/instance.json are not included in train/JPEGImages/. What should I do?
Thanks a lot!
I notice that you use the standard CE loss instead of focal loss. Does this influence the results?
Where exactly is formula (2) (page 4 of the paper) implemented in the code?
X_l = softmax(M_{l-1} + Q_l K_l^T) V_l + X_{l-1}
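If I read the code correctly, Eq. (2) lives in the cross-attention layers of mask2former/modeling/transformer_decoder/mask2former_transformer_decoder.py, where the previous layer's predicted mask is passed as attn_mask. A minimal numpy sketch of the formula itself, X_l = softmax(M_{l-1} + Q_l K_l^T) V_l + X_{l-1}:

```python
import numpy as np

def masked_attention(X_prev, Q, K, V, M_prev):
    """X_l = softmax(M_{l-1} + Q_l K_l^T) V_l + X_{l-1}.
    M_prev is 0 where a query may attend and -inf where the previous layer's
    predicted mask forbids it, so masked locations get exactly zero weight."""
    logits = M_prev + Q @ K.T
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    return attn @ V + X_prev
```

In the actual decoder the softmax is computed inside nn.MultiheadAttention, with M implemented as an additive attention mask rather than written out explicitly.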
I would love to try out this model but I am struggling with installation. The Colab demo does not work either and gives the error:
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/content/Mask2Former/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py in <module>()
21 try:
---> 22 import MultiScaleDeformableAttention as MSDA
23 except ModuleNotFoundError as e:
ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
During handling of the above exception, another exception occurred:
ModuleNotFoundError Traceback (most recent call last)
7 frames
/content/Mask2Former/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py in <module>()
27 "\t`sh make.sh`\n"
28 )
---> 29 raise ModuleNotFoundError(info_string)
30
31
ModuleNotFoundError:
Please compile MultiScaleDeformableAttention CUDA op with the following commands:
`cd mask2former/modeling/pixel_decoder/ops`
`sh make.sh`
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
When I run demo_video/demo.py on my video, it reports "CUDA out of memory". I tried reducing the input size, but it doesn't help. Can you tell me how to solve this problem? Thanks!
Can you please explain why we need to compile the CUDA kernel for MSDeformAttn?
I see there is a Python file for it, so I don't understand why compilation is needed.
Sorry, I am not very familiar with why it would not work with the pure-Python functions, or in which scenarios one should generally compile a CUDA kernel; I never paid attention to it before.
I would really appreciate an explanation. Thanks a ton!
Hi,
Is there a way to get masks in the output of only 1 or 2 specified classes from ADE20k or COCO?
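There is no config flag for this as far as I know, but the output is easy to filter. A sketch for the semantic "sem_seg" output (an H x W map of contiguous class indices; for instance or panoptic outputs you would filter pred_classes / segments_info instead). The class ids below are made up for illustration:

```python
import numpy as np

def masks_for_classes(sem_seg, class_ids):
    """Return one binary mask per requested class id from a per-pixel
    class-index map, e.g. the argmax of the "sem_seg" logits."""
    return {c: sem_seg == c for c in class_ids}

# Toy 2x3 prediction: 0 = background, 1 = person, 2 = car (hypothetical ids)
pred = np.array([[0, 1, 1],
                 [2, 2, 0]])
masks = masks_for_classes(pred, class_ids=[1, 2])
```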
/root/conda/bin/python
running build
running build_py
running build_ext
building 'MultiScaleDeformableAttention' extension
Emitting ninja build file /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
g++ -pthread -shared -B /root/conda/compiler_compat -L/root/conda/lib -Wl,-rpath=/root/conda/lib -Wl,--no-as-needed -Wl,--sysroot=/ /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/vision.o /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/cpu/ms_deform_attn_cpu.o /workspace/mask2former/modeling/pixel_decoder/ops/build/temp.linux-x86_64-3.7/workspace/mask2former/modeling/pixel_decoder/ops/src/cuda/ms_deform_attn_cuda.o -L/root/conda/lib/python3.7/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.7/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so
running install
running bdist_egg
running egg_info
writing MultiScaleDeformableAttention.egg-info/PKG-INFO
writing dependency_links to MultiScaleDeformableAttention.egg-info/dependency_links.txt
writing top-level names to MultiScaleDeformableAttention.egg-info/top_level.txt
reading manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt'
writing manifest file 'MultiScaleDeformableAttention.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
creating build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/functions
copying build/lib.linux-x86_64-3.7/functions/init.py -> build/bdist.linux-x86_64/egg/functions
copying build/lib.linux-x86_64-3.7/functions/ms_deform_attn_func.py -> build/bdist.linux-x86_64/egg/functions
copying build/lib.linux-x86_64-3.7/MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/egg
creating build/bdist.linux-x86_64/egg/modules
copying build/lib.linux-x86_64-3.7/modules/ms_deform_attn.py -> build/bdist.linux-x86_64/egg/modules
copying build/lib.linux-x86_64-3.7/modules/init.py -> build/bdist.linux-x86_64/egg/modules
byte-compiling build/bdist.linux-x86_64/egg/functions/init.py to init.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/functions/ms_deform_attn_func.py to ms_deform_attn_func.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/modules/ms_deform_attn.py to ms_deform_attn.cpython-37.pyc
byte-compiling build/bdist.linux-x86_64/egg/modules/init.py to init.cpython-37.pyc
creating stub loader for MultiScaleDeformableAttention.cpython-37m-x86_64-linux-gnu.so
byte-compiling build/bdist.linux-x86_64/egg/MultiScaleDeformableAttention.py to MultiScaleDeformableAttention.cpython-37.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying MultiScaleDeformableAttention.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MultiScaleDeformableAttention.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MultiScaleDeformableAttention.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying MultiScaleDeformableAttention.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
writing build/bdist.linux-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
pycache.MultiScaleDeformableAttention.cpython-37: module references file
creating 'dist/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg' and adding 'build/bdist.linux-x86_64/egg' to it
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
Processing MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg
removing '/root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg' (and everything under it)
creating /root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg
Extracting MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg to /root/conda/lib/python3.7/site-packages
MultiScaleDeformableAttention 1.0 is already the active version in easy-install.pth
Installed /root/conda/lib/python3.7/site-packages/MultiScaleDeformableAttention-1.0-py3.7-linux-x86_64.egg
Processing dependencies for MultiScaleDeformableAttention==1.0
Finished processing dependencies for MultiScaleDeformableAttention==1.0
run on: autodrive
DETECTRON2_DATASETS: /data/bolu.ldz/DATASET
Command Line Args: Namespace(config_file='configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
[02/22 03:41:38 detectron2]: Rank of current process: 0. World size: 8
[02/22 03:41:40 detectron2]: Environment info:
sys.platform linux
Python 3.7.7 (default, May 7 2020, 21:25:33) [GCC 7.3.0]
numpy 1.19.2
detectron2 0.6 @/root/conda/lib/python3.7/site-packages/detectron2
Compiler GCC 7.3
CUDA compiler CUDA 11.1
detectron2 arch flags 3.7, 5.0, 5.2, 6.0, 6.1, 7.0, 7.5, 8.0, 8.6
DETECTRON2_ENV_MODULE
PyTorch 1.9.0 @/root/conda/lib/python3.7/site-packages/torch
PyTorch debug build False
GPU available Yes
GPU 0,1,2,3,4,5,6,7 GeForce RTX 3090 (arch=8.6)
Driver version 460.73.01
CUDA_HOME /usr/local/cuda
TORCH_CUDA_ARCH_LIST 6.0;6.1;6.2;7.0;7.5
Pillow 8.0.1
torchvision 0.10.0 @/root/conda/lib/python3.7/site-packages/torchvision
torchvision arch flags 3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore 0.1.5.post20220212
iopath 0.1.9
cv2 4.1.2
PyTorch built with:
[02/22 03:41:40 detectron2]: Command line arguments: Namespace(config_file='configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml', dist_url='tcp://127.0.0.1:49152', eval_only=False, machine_rank=0, num_gpus=8, num_machines=1, opts=[], resume=False)
[02/22 03:41:40 detectron2]: Contents of args.config_file=configs/youtubevis_2019/video_maskformer2_R50_bs16_8ep.yaml:
_BASE_: Base-YouTubeVIS-VideoInstanceSegmentation.yaml
MODEL:
  WEIGHTS: "model_final_3c8ec9.pkl"
  META_ARCHITECTURE: "VideoMaskFormer"
  SEM_SEG_HEAD:
    NAME: "MaskFormerHead"
    IGNORE_VALUE: 255
    NUM_CLASSES: 40
    LOSS_WEIGHT: 1.0
    CONVS_DIM: 256
    MASK_DIM: 256
    NORM: "GN"
    # pixel decoder
    PIXEL_DECODER_NAME: "MSDeformAttnPixelDecoder"
    IN_FEATURES: ["res2", "res3", "res4", "res5"]
    DEFORMABLE_TRANSFORMER_ENCODER_IN_FEATURES: ["res3", "res4", "res5"]
    COMMON_STRIDE: 4
    TRANSFORMER_ENC_LAYERS: 6
  MASK_FORMER:
    TRANSFORMER_DECODER_NAME: "VideoMultiScaleMaskedTransformerDecoder"
    TRANSFORMER_IN_FEATURE: "multi_scale_pixel_decoder"
    DEEP_SUPERVISION: True
    NO_OBJECT_WEIGHT: 0.1
    CLASS_WEIGHT: 2.0
    MASK_WEIGHT: 5.0
    DICE_WEIGHT: 5.0
    HIDDEN_DIM: 256
    NUM_OBJECT_QUERIES: 100
    NHEADS: 8
    DROPOUT: 0.0
    DIM_FEEDFORWARD: 2048
    ENC_LAYERS: 0
    PRE_NORM: False
    ENFORCE_INPUT_PROJ: False
    SIZE_DIVISIBILITY: 32
    DEC_LAYERS: 10  # 9 decoder layers, add one for the loss on learnable query
    TRAIN_NUM_POINTS: 12544
    OVERSAMPLE_RATIO: 3.0
    IMPORTANCE_SAMPLE_RATIO: 0.75
    TEST:
      SEMANTIC_ON: False
      INSTANCE_ON: True
      PANOPTIC_ON: False
      OVERLAP_THRESHOLD: 0.8
      OBJECT_MASK_THRESHOLD: 0.8
[02/22 03:41:40 detectron2]: Running with full config:
CUDNN_BENCHMARK: false
DATALOADER:
ASPECT_RATIO_GROUPING: true
FILTER_EMPTY_ANNOTATIONS: false
NUM_WORKERS: 4
REPEAT_THRESHOLD: 0.0
SAMPLER_TRAIN: TrainingSampler
DATASETS:
PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
PROPOSAL_FILES_TEST: []
PROPOSAL_FILES_TRAIN: []
TEST:
[02/22 03:41:40 detectron2]: Full config saved to /summary/config.yaml
[02/22 03:41:40 d2.utils.env]: Using a generated random seed 40230477
[02/22 03:41:45 d2.engine.defaults]: Model:
VideoMaskFormer(
(backbone): ResNet(
(stem): BasicStem(
(conv1): Conv2d(
3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
)
(res2): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv1): Conv2d(
64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv2): Conv2d(
64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=64, eps=1e-05)
)
(conv3): Conv2d(
64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
)
)
(res3): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv1): Conv2d(
256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv2): Conv2d(
128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=128, eps=1e-05)
)
(conv3): Conv2d(
128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
)
)
(res4): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
(conv1): Conv2d(
512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(3): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(4): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
(5): BottleneckBlock(
(conv1): Conv2d(
1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv2): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=256, eps=1e-05)
)
(conv3): Conv2d(
256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=1024, eps=1e-05)
)
)
)
(res5): Sequential(
(0): BottleneckBlock(
(shortcut): Conv2d(
1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
(conv1): Conv2d(
1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(1): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
(2): BottleneckBlock(
(conv1): Conv2d(
2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv2): Conv2d(
512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=512, eps=1e-05)
)
(conv3): Conv2d(
512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): FrozenBatchNorm2d(num_features=2048, eps=1e-05)
)
)
)
)
(sem_seg_head): MaskFormerHead(
(pixel_decoder): MSDeformAttnPixelDecoder(
(input_proj): ModuleList(
(0): Sequential(
(0): Conv2d(2048, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(2): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1))
(1): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
(transformer): MSDeformAttnTransformerEncoderOnly(
(encoder): MSDeformAttnTransformerEncoder(
(layers): ModuleList(
(0): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(1): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(2): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(3): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(4): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(5): MSDeformAttnTransformerEncoderLayer(
(self_attn): MSDeformAttn(
(sampling_offsets): Linear(in_features=256, out_features=192, bias=True)
(attention_weights): Linear(in_features=256, out_features=96, bias=True)
(value_proj): Linear(in_features=256, out_features=256, bias=True)
(output_proj): Linear(in_features=256, out_features=256, bias=True)
)
(dropout1): Dropout(p=0.0, inplace=False)
(norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(linear1): Linear(in_features=256, out_features=1024, bias=True)
(dropout2): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=1024, out_features=256, bias=True)
(dropout3): Dropout(p=0.0, inplace=False)
(norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
)
)
(pe_layer): Positional encoding PositionEmbeddingSine
num_pos_feats: 128
temperature: 10000
normalize: True
scale: 6.283185307179586
(mask_features): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1))
(adapter_1): Conv2d(
256, 256, kernel_size=(1, 1), stride=(1, 1), bias=False
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
(layer_1): Conv2d(
256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False
(norm): GroupNorm(32, 256, eps=1e-05, affine=True)
)
)
(predictor): VideoMultiScaleMaskedTransformerDecoder(
(pe_layer): PositionEmbeddingSine3D()
(transformer_self_attention_layers): ModuleList(
(0): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(1): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(2): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(3): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(4): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(5): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(6): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(7): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(8): SelfAttentionLayer(
(self_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
)
(transformer_cross_attention_layers): ModuleList(
(0): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(1): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(2): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(3): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(4): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(5): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(6): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(7): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
(8): CrossAttentionLayer(
(multihead_attn): MultiheadAttention(
(out_proj): NonDynamicallyQuantizableLinear(in_features=256, out_features=256, bias=True)
)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.0, inplace=False)
)
)
(transformer_ffn_layers): ModuleList(
(0): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(1): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(2): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(3): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(4): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(5): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(6): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(7): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
(8): FFNLayer(
(linear1): Linear(in_features=256, out_features=2048, bias=True)
(dropout): Dropout(p=0.0, inplace=False)
(linear2): Linear(in_features=2048, out_features=256, bias=True)
(norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
)
)
(decoder_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
(query_feat): Embedding(100, 256)
(query_embed): Embedding(100, 256)
(level_embed): Embedding(3, 256)
(input_proj): ModuleList(
(0): Sequential()
(1): Sequential()
(2): Sequential()
)
(class_embed): Linear(in_features=256, out_features=41, bias=True)
(mask_embed): MLP(
(layers): ModuleList(
(0): Linear(in_features=256, out_features=256, bias=True)
(1): Linear(in_features=256, out_features=256, bias=True)
(2): Linear(in_features=256, out_features=256, bias=True)
)
)
)
)
(criterion): Criterion VideoSetCriterion
matcher: Matcher VideoHungarianMatcher
cost_class: 2.0
cost_mask: 5.0
cost_dice: 5.0
losses: ['labels', 'masks']
weight_dict: {'loss_ce': 2.0, 'loss_mask': 5.0, 'loss_dice': 5.0, 'loss_ce_0': 2.0, 'loss_mask_0': 5.0, 'loss_dice_0': 5.0, 'loss_ce_1': 2.0, 'loss_mask_1': 5.0, 'loss_dice_1': 5.0, 'loss_ce_2': 2.0, 'loss_mask_2': 5.0, 'loss_dice_2': 5.0, 'loss_ce_3': 2.0, 'loss_mask_3': 5.0, 'loss_dice_3': 5.0, 'loss_ce_4': 2.0, 'loss_mask_4': 5.0, 'loss_dice_4': 5.0, 'loss_ce_5': 2.0, 'loss_mask_5': 5.0, 'loss_dice_5': 5.0, 'loss_ce_6': 2.0, 'loss_mask_6': 5.0, 'loss_dice_6': 5.0, 'loss_ce_7': 2.0, 'loss_mask_7': 5.0, 'loss_dice_7': 5.0, 'loss_ce_8': 2.0, 'loss_mask_8': 5.0, 'loss_dice_8': 5.0}
num_classes: 40
eos_coef: 0.1
num_points: 12544
oversample_ratio: 3.0
importance_sample_ratio: 0.75
)
[02/22 03:41:45 mask2former_video.data_video.dataset_mapper]: [DatasetMapper] Augmentations used in training: [ResizeShortestEdge(short_edge_length=(360, 480), max_size=1333, sample_style='choice_by_clip', clip_frame_cnt=2), RandomFlip(clip_frame_cnt=2)]
[02/22 03:41:57 mask2former_video.data_video.datasets.ytvis]: Loading /data/bolu.ldz/DATASET/YoutubeVOS2019/train.json takes 12.59 seconds.
[02/22 03:41:57 mask2former_video.data_video.datasets.ytvis]: Loaded 2238 videos in YTVIS format from /data/bolu.ldz/DATASET/YoutubeVOS2019/train.json
[02/22 03:42:05 mask2former_video.data_video.build]: Using training sampler TrainingSampler
[02/22 03:42:19 d2.data.common]: Serializing 2238 elements to byte tensors and concatenating them all ...
[02/22 03:42:19 d2.data.common]: Serialized dataset takes 151.32 MiB
[02/22 03:42:20 fvcore.common.checkpoint]: [Checkpointer] Loading from /data/bolu.ldz/PRETRAINED_WEIGHTS/mask2former/model_final_3c8ec9.pkl ...
[02/22 03:42:22 fvcore.common.checkpoint]: Reading a file from 'MaskFormer Model Zoo'
WARNING [02/22 03:42:22 mask2former_video.modeling.transformer_decoder.video_mask2former_transformer_decoder]: Weight format of VideoMultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ...
WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'sem_seg_head.predictor.class_embed.weight' to the model due to incompatible shapes: (81, 256) in the checkpoint but (41, 256) in the model! You might want to double check if this is expected.
WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'sem_seg_head.predictor.class_embed.bias' to the model due to incompatible shapes: (81,) in the checkpoint but (41,) in the model! You might want to double check if this is expected.
WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Skip loading parameter 'criterion.empty_weight' to the model due to incompatible shapes: (81,) in the checkpoint but (41,) in the model! You might want to double check if this is expected.
WARNING [02/22 03:42:22 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
criterion.empty_weight
sem_seg_head.predictor.class_embed.{bias, weight}
[02/22 03:42:22 d2.engine.train_loop]: Starting training from iteration 0
run on: autodrive
DETECTRON2_DATASETS: /data/bolu.ldz/DATASET
error in ms_deformable_im2col_cuda: no kernel image is available for execution on the device
Hi,
I successfully followed the installation instructions in INSTALL.md, namely:
conda create --name mask2former python=3.8 -y
conda activate mask2former
conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia
pip install -U opencv-python
# under your working directory
git clone [email protected]:facebookresearch/detectron2.git
cd detectron2
pip install -e .
pip install git+https://github.com/cocodataset/panopticapi.git
pip install git+https://github.com/mcordts/cityscapesScripts.git
cd ..
git clone [email protected]:facebookresearch/Mask2Former.git
cd Mask2Former
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops
sh make.sh
However, when running the demo I get the following:
[02/23 09:54:12 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='../configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml', input=['/home/weber/Pictures/man.png'], opts=['MODEL.WEIGHTS', '/media/weber/Ubuntu2/ubuntu2/Human_Pose/code-from-source/Mask2Former/model_final_94dc52.pkl'], output=None, video_input=None, webcam=False)
[02/23 09:54:14 fvcore.common.checkpoint]: [Checkpointer] Loading from /media/weber/Ubuntu2/ubuntu2/Human_Pose/code-from-source/Mask2Former/model_final_94dc52.pkl ...
[02/23 09:54:16 fvcore.common.checkpoint]: Reading a file from 'MaskFormer Model Zoo'
Weight format of MultiScaleMaskedTransformerDecoder have changed! Please upgrade your models. Applying automatic conversion now ...
/mnt/c7dd8318-a1d3-4622-a5fb-3fc2d8819579/CORSMAL/envs/detectron2/lib/python3.8/site-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /opt/conda/conda-bld/pytorch_1623448278899/work/c10/core/TensorImpl.h:1156.)
return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
/mnt/c7dd8318-a1d3-4622-a5fb-3fc2d8819579/CORSMAL/envs/detectron2/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at /opt/conda/conda-bld/pytorch_1623448278899/work/aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
error in ms_deformable_im2col_cuda: invalid device function
[02/23 09:54:17 detectron2]: /home/weber/Pictures/man.png: detected 56 instances in 1.09s
From a web search, it seems this error occurs when the wrong CUDA version is installed. However, I installed cudatoolkit 11.1 exactly as in the procedure above. What else could be the issue?
FYI: the demo runs fine if I run it on my CPU (using MODEL.DEVICE cpu).
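A common cause of "invalid device function" (and the "no kernel image is available" variant above) is that the ms_deform_attn CUDA extension was compiled for compute capabilities that do not include the running GPU. One possible fix is to rebuild the ops with TORCH_CUDA_ARCH_LIST pinned to your card's architecture. This is a sketch, not an official recipe; the "8.6" value is an assumption for an Ampere card and must be replaced with your GPU's actual capability:

```shell
# Sketch: rebuild the deformable-attention ops for this GPU's architecture.
# "8.6" below is an assumption (RTX 30xx / A-series); check yours with:
#   python -c "import torch; print(torch.cuda.get_device_capability())"
cd mask2former/modeling/pixel_decoder/ops
rm -rf build              # discard objects compiled for the wrong arch
TORCH_CUDA_ARCH_LIST="8.6" sh make.sh
```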
Hello! Thank you for sharing.
I found an error in the inference code (demo.py): it reports a non-existent key, --output.
I modified the code to save the output; you may need to do the same.
(detectron) hello96min@rvi-node001:~/minseok/Mask2Former/demo$ python3 demo.py --config-file /home/hello96min/minseok/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml --input ~/minseok/image-inpainting/datasets_dgm/sample_data/images/images_0.png --opts MODEL.WEIGHTS /home/hello96min/minseok/Mask2Former/configs/coco/panoptic-segmentation/model_final_f07440.pkl --output ./1.png
[12/14 22:37:14 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='/home/hello96min/minseok/Mask2Former/configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml', input=['/home/hello96min/minseok/image-inpainting/datasets_dgm/sample_data/images/images_0.png'], opts=['MODEL.WEIGHTS', '/home/hello96min/minseok/Mask2Former/configs/coco/panoptic-segmentation/model_final_f07440.pkl', '--output', './1.png'], output=None, video_input=None, webcam=False)
Traceback (most recent call last):
File "demo.py", line 106, in <module>
cfg = setup_cfg(args)
File "demo.py", line 40, in setup_cfg
cfg.merge_from_list(args.opts)
File "/home/hello96min/yes/envs/detectron/lib/python3.8/site-packages/fvcore/common/config.py", line 143, in merge_from_list
return super().merge_from_list(cfg_list)
File "/home/hello96min/yes/envs/detectron/lib/python3.8/site-packages/yacs/config.py", line 243, in merge_from_list
_assert_with_logging(subkey in d, "Non-existent key: {}".format(full_key))
File "/home/hello96min/yes/envs/detectron/lib/python3.8/site-packages/yacs/config.py", line 545, in _assert_with_logging
assert cond, msg
AssertionError: Non-existent key: --output
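The traceback suggests an argparse ordering problem rather than a bug in the config system: if --opts is declared with nargs=argparse.REMAINDER, every token after --opts (including --output ./1.png) is swallowed into the config-override list, which yacs then rejects as "Non-existent key: --output". A minimal reproduction, using a hypothetical parser that mirrors that declaration:

```python
import argparse

# Hypothetical parser mirroring demo.py-style argument declarations,
# where --opts captures everything after it via argparse.REMAINDER.
parser = argparse.ArgumentParser()
parser.add_argument("--input", nargs="+")
parser.add_argument("--output")
parser.add_argument("--opts", nargs=argparse.REMAINDER, default=[])

# --output placed AFTER --opts is swallowed into the opts list:
bad = parser.parse_args(
    ["--input", "img.png", "--opts", "MODEL.WEIGHTS", "w.pkl", "--output", "out.png"]
)
print(bad.output)  # None
print(bad.opts)    # ['MODEL.WEIGHTS', 'w.pkl', '--output', 'out.png']

# --output placed BEFORE --opts parses as intended:
good = parser.parse_args(
    ["--output", "out.png", "--input", "img.png", "--opts", "MODEL.WEIGHTS", "w.pkl"]
)
print(good.output)  # out.png
print(good.opts)    # ['MODEL.WEIGHTS', 'w.pkl']
```

So the command line itself can likely be fixed by moving --output before --opts, without modifying demo.py.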
The custom dataset only has one class, so I set MODEL.ROI_HEADS.NUM_CLASSES and MODEL.RETINANET.NUM_CLASSES both to 1. However, when I evaluated the trained model, this error occurred:
File "train_net.py", line 411, in main
res = Trainer.test(cfg, model)
File "/home/chengzhi/PROGRAM/COW_GAME/detectron2/detectron2/engine/defaults.py", line 617, in test
results_i = inference_on_dataset(model, data_loader, evaluator)
File "/home/chengzhi/PROGRAM/COW_GAME/detectron2/detectron2/evaluation/evaluator.py", line 205, in inference_on_dataset
results = evaluator.evaluate()
File "/home/chengzhi/PROGRAM/COW_GAME/detectron2/detectron2/evaluation/coco_evaluation.py", line 206, in evaluate
self._eval_predictions(predictions, img_ids=img_ids)
File "/home/chengzhi/PROGRAM/COW_GAME/detectron2/detectron2/evaluation/coco_evaluation.py", line 241, in _eval_predictions
f"A prediction has class={category_id}, "
AssertionError: A prediction has class=24, but the dataset only has 1 classes and predicted class id should be in [0, 0].
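Note that MODEL.ROI_HEADS.NUM_CLASSES and MODEL.RETINANET.NUM_CLASSES configure Mask R-CNN and RetinaNet heads; Mask2Former reads its class count from MODEL.SEM_SEG_HEAD.NUM_CLASSES, so with the keys above the head likely still predicts the old label space. A small sanity check for this kind of mismatch (a sketch; the function name and plain-list interface are my own, not the repo's API):

```python
def check_pred_classes(pred_classes, num_classes):
    """Return the predicted class ids that fall outside [0, num_classes - 1]."""
    return sorted({c for c in pred_classes if not 0 <= c < num_classes})

# A prediction with class=24 against a 1-class dataset is flagged:
print(check_pred_classes([0, 24, 0, 3], num_classes=1))  # [3, 24]
```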
Hi, can the model be converted to TorchScript?
I tried, but I got this error:
RuntimeError:
Could not export Python function call 'MSDeformAttnFunction'. Remove calls to Python functions before export. Did you forget to add @script or @script_method annotation? If this is a nn.ModuleList, add it to __constants__:
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/pixel_decoder/ops/modules/ms_deform_attn.py(117): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/pixel_decoder/msdeformattn.py(124): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/pixel_decoder/msdeformattn.py(159): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/pixel_decoder/msdeformattn.py(87): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/pixel_decoder/msdeformattn.py(324): forward_features
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/autocast_mode.py(198): decorate_autocast
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/meta_arch/mask_former_head.py(119): layers
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/modeling/meta_arch/mask_former_head.py(116): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/mask2former/maskformer_model.py(198): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/detectron2/detectron2/export/flatten.py(259): <lambda>
/home/ubuntu/PycharmProjects/mask2former/venv/detectron2/detectron2/export/flatten.py(294): forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1090): _slow_forward
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/nn/modules/module.py(1102): _call_impl
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/jit/_trace.py(965): trace_module
/home/ubuntu/PycharmProjects/mask2former/venv/lib/python3.6/site-packages/torch/jit/_trace.py(750): trace
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/toTorchScript.py(44): export_tracing
/home/ubuntu/PycharmProjects/mask2former/venv/Mask2Former/toTorchScript.py(115): <module>
I tried to run pip install git+https://github.com/facebookresearch/Mask2Former, but the terminal throws a bunch of errors (probably because the repo lacks a setup.py).
I really don't like using conda, so is there a way to install it with pip, or should I build it from source?
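For what it's worth, nothing in the install instructions is conda-specific beyond creating the environment, so a plain virtualenv should work. A pip-only sketch (assuming Python 3.8+ and that you pick torch/torchvision wheels matching your CUDA version; since the repo has no setup.py, Mask2Former is used from its source checkout rather than pip-installed):

```shell
# Sketch: conda-free setup using a virtualenv.
python3 -m venv mask2former-env
. mask2former-env/bin/activate
pip install torch torchvision          # choose wheels matching your CUDA version
pip install -U opencv-python
pip install 'git+https://github.com/facebookresearch/detectron2.git'
git clone https://github.com/facebookresearch/Mask2Former.git
cd Mask2Former
pip install -r requirements.txt
cd mask2former/modeling/pixel_decoder/ops && sh make.sh
```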
Thanks for your great work!
To make the model more general, I think it could infer bounding boxes too.
So, I have two questions about your work.
Thanks a lot!
Hi,
Thanks for your wonderful repo. I followed the steps for preparing datasets, but it seems that datasets/prepare_mapillary_vistas_ins_seg.py is not provided. Could you please check it out?
Hi,
Thanks for your wonderful work and repo.
Could you please provide the instructions on how to visualize the video instance segmentation results on images or videos? Thanks!
When I fine-tune on my own YouTube-VIS 2021 dataset, I always get "detected 10 instances per frame". Why? Is it an issue with the trained model?
Hi, thanks for your great work.
Currently I'm using the COCO maskformer2_swin_large_IN21k_384_bs16_100ep.yaml configuration with the pretrained model.
I'm trying to convert this model to ONNX format, but it gives me a segmentation fault.
Could you please share the converted model, or explain how to do the conversion?
There are 100 instances per image; how do I post-process them? Is there any threshold or NMS? Where is the code for that part? Thanks.
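The filtering step the question asks about can be illustrated with a minimal sketch: query-based models like Mask2Former typically keep only the queries whose classification confidence exceeds a score threshold, with no NMS. The names `filter_instances` and `score_thresh` below are my own, not identifiers from the repo:

```python
import numpy as np

def filter_instances(scores, masks, score_thresh=0.5):
    """Keep only the queries whose confidence exceeds score_thresh.

    scores: (Q,) per-query classification confidences
    masks:  (Q, H, W) predicted binary masks, one per query
    """
    keep = scores > score_thresh
    return scores[keep], masks[keep]

# 4 queries, of which 2 are confident enough to survive.
scores = np.array([0.9, 0.1, 0.6, 0.05])
masks = np.zeros((4, 8, 8), dtype=bool)
kept_scores, kept_masks = filter_instances(scores, masks)
print(len(kept_scores))  # 2
```

In the actual codebase the equivalent logic lives in the model's inference/post-processing path; the threshold value is a configuration choice, not something fixed by the method.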
Hello, thank you for timely sharing the source code.
But I found that training is very slow on my server.
Could you please tell me how long training takes on 8 V100s for these configs?
configs/coco/instance-segmentation/maskformer2_R50_bs16_50ep.yaml
configs/coco/instance-segmentation/maskformer2_R101_bs16_50ep.yaml
What confuses me is the pixel decoder. As you mentioned briefly in the paper, "Swin-L-FaPN uses FaPN as pixel decoder". I would really like to know exactly how you use FaPN as the pixel decoder, since FaPN itself is a complete model. Did you incorporate the FaPN components (FeatureAlign and FeatureSelectionModule) into the pixel decoder of Mask2Former in that experiment?