
mdetr's Introduction

MDETR: Modulated Detection for End-to-End Multi-Modal Understanding

Website | Colab | Paper

This repository contains code and links to pre-trained models for MDETR (Modulated DETR), for pre-training on data with aligned text, images, and box annotations, as well as fine-tuning on tasks requiring fine-grained understanding of image and text.

We show big gains on the phrase grounding task (Flickr30k), Referring Expression Comprehension (RefCOCO, RefCOCO+ and RefCOCOg) as well as Referring Expression Segmentation (PhraseCut, CLEVR Ref+). We also achieve competitive performance on visual question answering (GQA, CLEVR).

MDETR

TL;DR. We depart from the fixed frozen object detector approach of several popular vision + language pre-trained models and achieve true end-to-end multi-modal understanding by training our detector in the loop. In addition, we only detect objects that are relevant to the given text query, where the class labels for the objects are just the relevant words in the text query. This allows us to expand our vocabulary to anything found in free-form text, making it possible to detect and reason over novel combinations of object classes and attributes.

For details, please see the paper: MDETR - Modulated Detection for End-to-End Multi-Modal Understanding by Aishwarya Kamath, Mannat Singh, Yann LeCun, Ishan Misra, Gabriel Synnaeve and Nicolas Carion.

Aishwarya Kamath and Nicolas Carion made equal contributions to this codebase.

Usage

The requirements file has all the dependencies that are needed by MDETR.

We provide instructions on how to install the dependencies via conda. First, clone the repository locally:

git clone https://github.com/ashkamath/mdetr.git

Make a new conda env and activate it:

conda create -n mdetr_env python=3.8
conda activate mdetr_env

Install the packages from requirements.txt:

pip install -r requirements.txt

Multinode training

Distributed training is available via Slurm and submitit:

pip install submitit
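For example, a pre-training run can then be launched on a Slurm cluster along these lines (a hedged sketch assembled from the flags used elsewhere in this document; adjust the dataset config and node/GPU counts to your cluster):

python run_with_submitit.py --dataset_config configs/pretrain.json --ngpus 8 --nodes 2 --ema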

Pre-training

The links to data, steps for data preparation, and the script for running finetuning can be found in the Pretraining Instructions. We also provide the pre-trained model weights for MDETR trained on our combined aligned dataset of 1.3 million images paired with text.

The models are summarized in the following table. Note that the performance reported is "raw", without any fine-tuning. For each dataset, we report the class-agnostic box AP@50, which measures how well the model finds the boxes mentioned in the text. All performances are reported on the respective validation sets of each dataset.

Backbone GQA AP Flickr AP Flickr R@1 Refcoco AP Refcoco R@1 Refcoco+ R@1 Refcocog R@1 Url Size
1 R101 58.9 75.6 82.5 60.3 72.1 58.0 55.7 model 3GB
2 ENB3 59.5 76.6 82.9 57.6 70.2 56.7 53.8 model 2.4GB
3 ENB5 59.9 76.4 83.7 61.8 73.4 58.8 57.1 model 2.7GB
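As a quick sanity check, the pre-trained models can also be loaded through torch.hub; a minimal sketch, mirroring the torch.hub.load call that appears in the demo traceback quoted in the issues below:

import torch

# Load a pre-trained MDETR model plus its postprocessor from the repo's hubconf.
# 'mdetr_efficientnetB5' is the entry point used by the gradio/Colab demo; the
# available entry points are defined in hubconf.py.
model, postprocessor = torch.hub.load(
    "ashkamath/mdetr:main", "mdetr_efficientnetB5", pretrained=True, return_postprocessor=True
)
model.eval()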

Downstream tasks

Phrase grounding on Flickr30k

Instructions for data preparation and script to run evaluation can be found at Flickr30k Instructions

AnyBox protocol

Backbone Pre-training Image Data Val R@1 Val R@5 Val R@10 Test R@1 Test R@5 Test R@10 url size
Resnet-101 COCO+VG+Flickr 82.5 92.9 94.9 83.4 93.5 95.3 model 3GB
EfficientNet-B3 COCO+VG+Flickr 82.9 93.2 95.2 84.0 93.8 95.6 model 2.4GB
EfficientNet-B5 COCO+VG+Flickr 83.6 93.4 95.1 84.3 93.9 95.8 model 2.7GB

MergedBox protocol

Backbone Pre-training Image Data Val R@1 Val R@5 Val R@10 Test R@1 Test R@5 Test R@10 url size
Resnet-101 COCO+VG+Flickr 82.3 91.8 93.7 83.8 92.7 94.4 model 3GB

Referring expression comprehension on RefCOCO, RefCOCO+, RefCOCOg

Instructions for data preparation and script to run finetuning and evaluation can be found at Referring Expression Instructions

RefCOCO

Backbone Pre-training Image Data Val TestA TestB url size
Resnet-101 COCO+VG+Flickr 86.75 89.58 81.41 model 3GB
EfficientNet-B3 COCO+VG+Flickr 87.51 90.40 82.67 model 2.4GB

RefCOCO+

Backbone Pre-training Image Data Val TestA TestB url size
Resnet-101 COCO+VG+Flickr 79.52 84.09 70.62 model 3GB
EfficientNet-B3 COCO+VG+Flickr 81.13 85.52 72.96 model 2.4GB

RefCOCOg

Backbone Pre-training Image Data Val Test url size
Resnet-101 COCO+VG+Flickr 81.64 80.89 model 3GB
EfficientNet-B3 COCO+VG+Flickr 83.35 83.31 model 2.4GB

Referring expression segmentation on PhraseCut

Instructions for data preparation and script to run finetuning and evaluation can be found at PhraseCut Instructions

Backbone M-IoU Precision @0.5 Precision @0.7 Precision @0.9 url size
Resnet-101 53.1 56.1 38.9 11.9 model 1.5GB
EfficientNet-B3 53.7 57.5 39.9 11.9 model 1.2GB

Visual question answering on GQA

Instructions for data preparation and scripts to run finetuning and evaluation can be found at GQA Instructions

Backbone Test-dev Test-std url size
Resnet-101 62.48 61.99 model 3GB
EfficientNet-B5 62.95 62.45 model 2.7GB

Long-tailed few-shot object detection

Instructions for data preparation and scripts to run finetuning and evaluation can be found at LVIS Instructions

Data AP AP50 APr APc APf url size
1% 16.7 25.8 11.2 14.6 19.5 model 3GB
10% 24.2 38.0 20.9 24.9 24.3 model 3GB
100% 22.5 35.2 7.4 22.7 25.0 model 3GB

Synthetic datasets

Instructions to reproduce our results on CLEVR-based datasets are available at CLEVR instructions

Overall Accuracy Count Exist Compare Number Query Attribute Compare Attribute Url Size
99.7 99.3 99.9 99.4 99.9 99.9 model 446MB

License

MDETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.

Citation

If you find this repository useful, please give it a star and cite it as follows:

    @article{kamath2021mdetr,
      title={MDETR--Modulated Detection for End-to-End Multi-Modal Understanding},
      author={Kamath, Aishwarya and Singh, Mannat and LeCun, Yann and Misra, Ishan and Synnaeve, Gabriel and Carion, Nicolas},
      journal={arXiv preprint arXiv:2104.12763},
      year={2021}
    }

mdetr's People

Contributors

alcinos, ashkamath, nguyeho7


mdetr's Issues

Issue with the flickr30k images dataset

I followed the Flickr evaluation instructions, but something is wrong with the downloaded "flickr30k-images" dataset; the reported error is "No such file or directory: '/data/flickr30k_images/val/100652400.jpg'". The "flickr30k-images" dataset I downloaded actually has no "val" folder.
The same goes for fine-tuning: there is no folder named "train" in the flickr30k-images dataset.

Why are the GQA instructions different from the paper?

Hi! Thank you for MDETR, it is an amazing idea and special thanks for documenting your code. It helped me a lot in understanding MDETR.

I have a question about the GQA dataset fine-tuning for visual question answering.

The paper says

[we] fine-tune first for 5 epochs on the unbalanced all GQA split, followed by 10 epochs on the balanced split similar to what is done in prior work [28, 5]. During the first 5 epochs, we train the modulated detection losses along with the question answering, but put a weight on question answering loss that encourages the model to focus more on this task. For the balanced split fine-tuning, we only use the question answering loss.

However, "reproduce results" instructions in gqa.md suggest fine-tuning for 125 epochs on either "all" or "balanced" split. It also seem to use detection loss for all of the epochs.

Could you clarify why these instructions are different and what kind of results one can expect from the GitHub instructions vs. the paper instructions? Currently, after training for 25 epochs with the GitHub instructions (balanced dataset), I have about 53% validation accuracy (gqa_accuracy_answer_total_unscaled), which is much lower than the number in the paper (62%).

Also, how important have you found the detection objective for this task?

What is missing here?

Traceback (most recent call last):
  File "gradio/demo.py", line 107, in <module>
    model, postprocessor = torch.hub.load('ashkamath/mdetr:main', 'mdetr_efficientnetB5', pretrained=True, return_postprocessor=True)
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 364, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 390, in _load_local
    hub_module = import_module(MODULE_HUBCONF, hubconf_path)
  File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 75, in import_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/root/.cache/torch/hub/ashkamath_mdetr_main/hubconf.py", line 4, in <module>
    from models.backbone import Backbone, Joiner, TimmBackbone
  File "/root/.cache/torch/hub/ashkamath_mdetr_main/models/__init__.py", line 3, in <module>
    from .mdetr import build
  File "/root/.cache/torch/hub/ashkamath_mdetr_main/models/mdetr.py", line 16, in <module>
    from util.misc import NestedTensor, interpolate
  File "/root/.cache/torch/hub/ashkamath_mdetr_main/util/misc.py", line 18, in <module>
    from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.8/dist-packages/torchvision/ops/__init__.py)

CUDA error: out of memory when synchronizing between processes on refexp at the evaluation stage

Hi,

Thanks for your great work.

I hit the OOM error at the evaluation stage after the first epoch of pretraining. The log is:

Test: Total time: 0:01:42 (0.2476 s / it)
Averaged stats: loss: 113.5333 (96.1866)  loss_bbox: 0.5625 (0.5173)  loss_bbox_0: 0.6505 (0.5977)  loss_bbox_1: 0.5762 (0.5255)  loss_bbox_2: 0.5703 (0.5273)  loss_bbox_3: 0.5712 (0.5109)  loss_bbox_4: 0.5651 (0.5138)  loss_ce: 11.5826 (9.1250)  loss_ce_0: 11.4480 (9.4460)  loss_ce_1: 11.7980 (9.5058)  loss_ce_2: 11.8104 (9.4749)  loss_ce_3: 11.6550 (9.2512)  loss_ce_4: 11.5774 (9.0949)  loss_contrastive_align: 6.1482 (5.6187)  loss_contrastive_align_0: 6.1950 (5.8909)  loss_contrastive_align_1: 6.1946 (5.7864)  loss_contrastive_align_2: 6.1133 (5.7674)  loss_contrastive_align_3: 6.1261 (5.6713)  loss_contrastive_align_4: 6.0199 (5.5644)  loss_giou: 0.4890 (0.4578)  loss_giou_0: 0.5642 (0.5090)  loss_giou_1: 0.5024 (0.4579)  loss_giou_2: 0.4965 (0.4619)  loss_giou_3: 0.5086 (0.4525)  loss_giou_4: 0.4900 (0.4579)  cardinality_error_unscaled: 8.3906 (4.8554)  cardinality_error_0_unscaled: 6.5000 (4.3573)  cardinality_error_1_unscaled: 9.4062 (5.9682)  cardinality_error_2_unscaled: 10.3125 (6.3725)  cardinality_error_3_unscaled: 9.2969 (5.2416)  cardinality_error_4_unscaled: 8.8281 (5.0047)  loss_bbox_unscaled: 0.1125 (0.1035)  loss_bbox_0_unscaled: 0.1301 (0.1195)  loss_bbox_1_unscaled: 0.1152 (0.1051)  loss_bbox_2_unscaled: 0.1141 (0.1055)  loss_bbox_3_unscaled: 0.1142 (0.1022)  loss_bbox_4_unscaled: 0.1130 (0.1028)  loss_ce_unscaled: 11.5826 (9.1250)  loss_ce_0_unscaled: 11.4480 (9.4460)  loss_ce_1_unscaled: 11.7980 (9.5058)  loss_ce_2_unscaled: 11.8104 (9.4749)  loss_ce_3_unscaled: 11.6550 (9.2512)  loss_ce_4_unscaled: 11.5774 (9.0949)  loss_contrastive_align_unscaled: 6.1482 (5.6187)  loss_contrastive_align_0_unscaled: 6.1950 (5.8909)  loss_contrastive_align_1_unscaled: 6.1946 (5.7864)  loss_contrastive_align_2_unscaled: 6.1133 (5.7674)  loss_contrastive_align_3_unscaled: 6.1261 (5.6713)  loss_contrastive_align_4_unscaled: 6.0199 (5.5644)  loss_giou_unscaled: 0.2445 (0.2289)  loss_giou_0_unscaled: 0.2821 (0.2545)  loss_giou_1_unscaled: 0.2512 (0.2289)  loss_giou_2_unscaled: 0.2483 (0.2309)  loss_giou_3_unscaled: 0.2543 (0.2263)  loss_giou_4_unscaled: 0.2450 (0.2289)
gathering on cpu
gathering on cpu
gathering on cpu
Traceback (most recent call last):
  File \"main.py\", line 655, in <module>
    main(args)
  File \"main.py\", line 598, in main
    curr_test_stats = evaluate(
  File \"/usr/local/lib/python3.8/site-packages/torch/autograd/grad_mode.py\", line 26, in decorate_context
    return func(*args, **kwargs)
  File \"/worksapce/mdetr/trainer/engine.py\", line 230, in evaluate
    evaluator.synchronize_between_processes()
  File \"/worksapce/mdetr/trainer/datasets/refexp.py\", line 38, in synchronize_between_processes
    all_predictions = dist.all_gather(self.predictions)
  File \"/worksapce/mdetr/trainer/util/dist.py\", line 86, in all_gather
    obj = torch.load(buffer)
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 594, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 853, in _load
    result = unpickler.load()
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 845, in persistent_load
    load_tensor(data_type, size, key, _maybe_decode_ascii(location))
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 834, in load_tensor
    loaded_storages[key] = restore_location(storage, location)
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 175, in default_restore_location
    result = fn(storage, location)
  File \"/usr/local/lib/python3.8/site-packages/torch/serialization.py\", line 157, in _cuda_deserialize
    return obj.cuda(device)
  File \"/usr/local/lib/python3.8/site-packages/torch/_utils.py\", line 79, in _cuda
    return new_type(self.size()).copy_(self, non_blocking)
  File \"/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py\", line 462, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

I use 32G V100 GPUs, with 2 samples per GPU following default settings.
I also set CUBLAS_WORKSPACE_CONFIG=:4096:8 MDETR_CPU_REDUCE=1.
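A workaround consistent with the "gathering on cpu" log lines above is to force deserialization onto the CPU inside the all_gather helper, so the gathered predictions never land on the GPU. A minimal sketch of such a local patch (assuming the util/dist.py code shown in the traceback; not necessarily the official fix):

# util/dist.py, inside all_gather: deserialize gathered objects on the CPU instead of
# the default CUDA device, to avoid out-of-memory during evaluation.
obj = torch.load(buffer, map_location="cpu")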

loading lvis model in colab

Hi,

Is there a sample on how to load the lvis model in colab, similar to what is currently done with torch.hub?
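There does not appear to be an official sample; a minimal sketch under some assumptions (that the hub entry points accept pretrained=False, that the LVIS checkpoints store their weights under the "model" key like the other checkpoints discussed in these issues, and with LVIS_CKPT_URL and the mdetr_resnet101 entry-point name used as placeholders; check hubconf.py and the LVIS table above for the real names and links):

import torch

LVIS_CKPT_URL = "..."  # placeholder: the model link from the LVIS table in the README

# Build the architecture through torch.hub, then swap in the LVIS fine-tuned weights.
# The entry point must match the checkpoint's backbone (the LVIS models use ResNet-101).
model, postprocessor = torch.hub.load(
    "ashkamath/mdetr:main", "mdetr_resnet101", pretrained=False, return_postprocessor=True
)
checkpoint = torch.hub.load_state_dict_from_url(LVIS_CKPT_URL, map_location="cpu")
model.load_state_dict(checkpoint["model"], strict=False)  # strict=False in case the heads differ
model.eval()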

Error While Pretraining

Hi,

Thank you for the great work and for providing the pre-trained models. I was trying to run pre-training following the instructions at pretrain.md. I am getting the attached error. I am listing my environment details below. Any help would be appreciated.
[error screenshot]

PyTorch: 1.9.0+cu11.1
TorchVision: 0.10.0
Transformers: 4.5.1
Hardware: A single machine with 4xRTX A6000

Colab issue

Hi,
The Colab notebook demo is not working with the current torchvision version (0.10.0):
https://colab.research.google.com/github/ashkamath/mdetr/blob/colab/notebooks/MDETR_demo.ipynb

/root/.cache/torch/hub/ashkamath_mdetr_main/util/misc.py in <module>()
     16 # needed due to empty tensor bug in pytorch and torchvision 0.5
     17 if float(torchvision.__version__[:3]) < 0.7:
---> 18     from torchvision.ops import _new_empty_tensor
     19     from torchvision.ops.misc import _output_size
     20 

ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.7/dist-packages/torchvision/ops/__init__.py)

I think this happens because the check truncates the version string to something like 0.1 (e.g. "0.10" becomes 0.1), which is less than 0.7, so it takes this branch when it shouldn't.
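If that is the case, a more robust local patch to util/misc.py is to compare parsed version components rather than a float slice. A minimal sketch (not necessarily the upstream fix):

import torchvision

# Compare (major, minor) numerically so that e.g. "0.10.0" is correctly treated as >= 0.7.
_tv_major, _tv_minor = (int(v) for v in torchvision.__version__.split(".")[:2])
if (_tv_major, _tv_minor) < (0, 7):
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size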

Negative and positive tokens in the annotation dataset.

Hi, thanks for the open-source annotation dataset.

I am confused about the meaning of tokens_negative in the annotations. For example, in

"file_name": "COCO_train2014_000000581857.jpg} ", "height": 640, "width": 427, "id": 3, "original_id": 581857, "caption": "woman in gray shirt facing camera on right", "dataset_name": "refcoco", "tokens_negative": [[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]"

I couldn't understand the meaning of "[[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]" these pairs of numbers.

I will be very grateful if you could help me to understand!
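For reference, the pairs appear to be character-level [start, end) offsets into the caption string, which is easy to check directly (a small sketch against the example above; what a "negative" span means semantically is for the authors to confirm):

caption = "woman in gray shirt facing camera on right"
tokens_negative = [[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]

# Slice the caption with each span to see which words it covers.
print([caption[start:end] for start, end in tokens_negative])
# ['woman', 'in', 'facing', 'on', 'right']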

cannot do distributed training

Hello,

Thanks for open sourcing!

I am trying to run distributed training for pretraining. Without distributed training, it works fine.

I get the error below with PyTorch versions 1.7.0, 1.7.1 and 1.8.0. Version 1.9 instead gets ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torchvision/ops/__init__.py).

I tried changing this line to losses.backward(retain_graph=True), but it did not fix the problem.
Let me know if you have any suggestions on how to address this issue.

Traceback (most recent call last):
  File "main.py", line 643, in <module>
    main(args)
  File "main.py", line 546, in main
    train_stats = train_one_epoch(
  File "/work/vcirik/mdetr/engine.py", line 100, in train_one_epoch
    losses.backward()
  File "/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [1, 10]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The evaluation of referring expression

Hi,

I followed the evaluation instructions for referring expression comprehension on the COCO train2014 dataset. But when I passed the test args "!python run_with_submitit.py --dataset_config configs/refcoco.json --batch_size 4 --resume https://zenodo.org/record/4721981/files/refcoco_resnet101_checkpoint.pth --ngpus 1 --nodes 1 --ema --test --test_type testA", I didn't get any precision or recall results, only:
"Start training
Training time 0:00:00
submitit INFO (2021-07-17 11:30:17,492) - Job completed successfully"

I also downloaded the val2014 and test2014 COCO images, but I am not sure whether I need to use them, because I got an error when I passed those datasets.

Thanks a lot in advance!

Best,

Missing key(s) in state_dict: "contrastive_align_projection_image.weight", "contrastive_align_projection_image.bias", "contrastive_align_projection_text.weight", "contrastive_align_projection_text.bias".

Hello,
Thanks for open sourcing!
I tried to run evaluation on GQA but it failed.

python main.py  --dataset_config configs/gqa.json --ema --eval --do_qa --split_qa_heads --resume https://zenodo.org/record/4721981/files/gqa_resnet101_checkpoint.pth

the following error occurred:

Traceback (most recent call last):
  File "main.py", line 649, in <module>
    main(args)
  File "main.py", line 465, in main
    model_without_ddp.load_state_dict(checkpoint["model"])
  File "/home/data/anaconda3/envs/qxy_mdetr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MDETR:
        Missing key(s) in state_dict: "contrastive_align_projection_image.weight", "contrastive_align_projection_image.bias", "contrastive_align_projection_text.weight", "contrastive_align_projection_text.bias".

How to deal with it?
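One thing worth checking, purely as a guess from the flags listed later on this page: if the GQA checkpoint was trained without the contrastive alignment head, building the model with --no_contrastive_align_loss should make the state_dict keys line up. A blunter local workaround is to load non-strictly, which leaves the unmatched projection layers randomly initialized:

# Workaround sketch in main.py: tolerate missing/unexpected keys when loading.
model_without_ddp.load_state_dict(checkpoint["model"], strict=False)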

Loss increases during pretraining

Hi @alcinos, @ashkamath, @nguyeho7,

I hope you are doing good.

I was trying to pretrain MDETR using the provided instructions. What I noticed is that the loss started increasing during the 20th epoch. It had kept decreasing, reaching around 39 by the 19th epoch, and then jumped to around 77 after the 20th epoch. What could be the reason for this? Note that I am using the EfficientNetB5 backbone. The log.txt is attached.

Thanks

log.txt

How to run fine-tuning on VQA2 dataset?

Experiments with the VQA v2 dataset are described in Appendix E of the paper, but it is not clear from the main.py and run_with_submitit.py files how to run the fine-tuning (I have tried writing the same command that is used for fine-tuning on CLEVR). I have also found vqa_coco_format.py, but it seems to be data preparation, not the fine-tuning itself. Also, I looked at the build_dataset function in main.py and I don't see VQA v2 in it :(
Could you please explain how to do so?

UPD 1 (09.27.21):
I've downloaded COCO and VQA v2 datasets and ran

python scripts/fine-tuning/vqa_coco_format.py --data_path VQA_v2_dataset/ --img_path COCO_dataset/images/ --coco_path COCO_dataset/

The processing finished correctly. Now I'm thinking about how to write the VQA v2 dataset script...

UPD 2 (10.03.21)
It seems I managed to implement all the necessary classes and fix the code. I'm currently doing an experiment eval -> train on vqa2 -> eval. As soon as it successfully finishes I'll push the code into my fork of the repo.

UPD 3 (10.03.21)
Yeah, it works! Here is the link: https://github.com/TopCoder2K/mdetr. I haven't written any documentation because I'm not sure that fine-tuning on VQA is useful to anybody.)) If you have any questions, please ask here :)

Questions about the experiment

Hi, I've got 2 questions:

  1. In the few-shot transfer learning experiment on LVIS, MDETR's 100%-data performance is not better than its 10%-data performance across all metrics. Why is that? (I only found the drop in small-object detection being mentioned.)

  2. For the referring image segmentation task, have you tested on UNC/UNC+/G-Ref/ReferIt? These are frequently used benchmarks for referring image segmentation. They are based on MS COCO, but I noticed that you excluded the val/test sets of COCO from the pretraining.

thanks

Meaning of 'tokens_positive' and 'positive_map'

Hello, thank you for this great work. I have a question about the meaning of 'tokens_positive' and 'positive_map' in the finetune_lvis.json file. I am confused about why the positive_map generated from tokens_positive is a 256-length vector, and how this can help the model learn the text of the category.
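To make the 256-length vector concrete: each row of positive_map has one slot per (padded) text token position, and the character spans in tokens_positive are mapped onto token positions with a fast tokenizer's char_to_token before being normalized. A rough sketch of that construction, assuming a RoBERTa fast tokenizer and a maximum text length of 256 (the repo's own helper may differ in the details):

import torch
from transformers import RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

def make_positive_map(caption, tokens_positive, max_len=256):
    """Row i is 1 on the token positions covered by the character spans of box i,
    then normalized so the row sums to 1."""
    tokenized = tokenizer(caption, return_tensors="pt")
    positive_map = torch.zeros((len(tokens_positive), max_len))
    for i, spans in enumerate(tokens_positive):
        for start, end in spans:
            beg_tok = tokenized.char_to_token(start)
            end_tok = tokenized.char_to_token(end - 1)
            if beg_tok is not None and end_tok is not None:
                positive_map[i, beg_tok : end_tok + 1] = 1
    return positive_map / (positive_map.sum(-1, keepdim=True) + 1e-6)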

Contrastive loss implementation discrepancy between the paper and codebase

Hello,

This is in relation to the losses described in the paper and implemented in the codebase. Need your help in understanding the following:

  1. The 4th Page in the paper reads that: "the contrastive alignment loss enforces alignment between the embedded representations of the object at the output of the decoder, and the text representation at the output of the cross encoder." However, in the code transformer.py, the following snippet is being used for the loss calculations:

"text_pooled_op": encoded_text.pooler_output if self.CLS is not None else None,

"img_pooled_op": img_memory[0] if self.CLS is not None else None, # Return the CLS token

which essentially means that we are deriving the embedded representation of the text from the BERT-based text backbone encoder's classification token, and the image embedded representation is being derived from the output of the transformer encoder. Is this genuinely a discrepancy? If not, can you kindly point towards the snippet for these loss calculations where you are tapping into the decoder output?

  2. Also, is the following understanding correct: The 'Soft token prediction' loss from the paper is actually called 'contrastive_align_loss' in the codebase, and the 'Contrastive alignment' loss from the paper is actually named 'contrastive_loss' in the codebase.

Thank you.

Missing finetune_phrasecut_test.json

When testing on the PhraseCut dataset, I hit a bug:

FileNotFoundError: [Errno 2] No such file or directory: '/data16t/data/referring-segmentation/Pre-processed-annotations/finetune_phrasecut_test.json'

There is no such file in the provided mdetr_annotations.tar.gz. I really hope you can release this file. Thank you very much.

Typo in scripts/eval_lvis.py

Hello there, thank you for the high-quality code. I found a typo in scripts/eval_lvis.py, line 43: utils.init_distributed_mode(args). I think that should be dist.init_distributed_mode(args), shouldn't it?

Please let me know if I misunderstand the code and run the code in the wrong way.

colab out of ram session crashed

After running a couple of times on different images and changing

# standard PyTorch mean-std input image normalization
transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

to

# standard PyTorch mean-std input image normalization
transform = T.Compose([
    T.Resize(500),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

I get a "session crashed" error.

MDETR pretraining crashes with 8GPU on one instance, inplace modification error

Hello,
I tried pretraining MDETR using your guide on a 8 GPU Volta instance.
I modified the config to only contain flickr30k (removed mixed) and ran the following command:

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema

However, it throws an error in the RoBERTa text encoder (it is printed 8 times; I truncated it):

[W python_anomaly_mode.cpp:104] Warning: Error detected in EmbeddingBackward. Traceback of forward call that caused the error:
  File "main.py", line 648, in <module>
    main(args)
  File "main.py", line 563, in main
    model_ema=model_ema,
  File "/task_runtime/mdetr_2/mdetr/engine.py", line 68, in train_one_epoch
    memory_cache = model(samples, captions, encode_and_save=True)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/task_runtime/mdetr_2/mdetr/models/mdetr.py", line 143, in forward
    text_attention_mask=None,
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/task_runtime/mdetr_2/mdetr/models/transformer.py", line 121, in forward
    encoded_text = self.text_encoder(**tokenized)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 842, in forward
    past_key_values_length=past_key_values_length,
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/transformers/models/roberta/**modeling_roberta.py**", line 132, in forward
    token_type_embeddings = self.token_type_embeddings(token_type_ids)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/functional.py", line 2043, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
 (function _print_stack)


Traceback (most recent call last):
  File "main.py", line 648, in <module>
    main(args)
  File "main.py", line 563, in main
    model_ema=model_ema,
  File "/task_runtime/mdetr_2/mdetr/engine.py", line 100, in train_one_epoch
    losses.backward()
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [2, 20]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!


pretrain performance

Hi,

Thanks for your great work.

When I tried to reproduce the pretraining performance, I found that the results did not match the paper, especially for Refcoco.

Any help would be much appreciated.

Backbone GQA AP Flickr AP Flickr R@1 Refcoco AP Refcoco R@1 Refcoco+ R@1 Refcocog R@1
Res101 (paper) 58.9 75.6 82.5 60.3 72.1 58.0 55.7
Res101 (reproduced) 58.6 75.7 82.9 56.5 70.2 55.3 54.2

Error when I run on a single node with 8 gpus

(Minseok) ubuntu@DESKTOP-SMIU2JP:~/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main$ python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema --backbone timm_tf_efficientnet_b3_ns --lr_backbone 5e-5

/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions

warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


| distributed init (rank 0): env://
Every failing worker process prints the same traceback:

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/main.py", line 643, in <module>
    main(args)
  File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/main.py", line 281, in main
    dist.init_distributed_mode(args)
  File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/util/dist.py", line 220, in init_distributed_mode
    torch.cuda.set_device(args.gpu)
  File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/cuda/__init__.py", line 311, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 8606 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 8607) of binary: /home/ubuntu/anaconda3/envs/Minseok/bin/python
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:


        main.py FAILED           

=====================================
Root Cause:
[0]:
time: 2021-09-26_17:57:42
rank: 1 (local_rank: 1)
exitcode: 1 (pid: 8607)
error_file: <N/A>
msg: Process failed with exitcode 1

Other Failures:
[1]:
time: 2021-09-26_17:57:42
rank: 2 (local_rank: 2)
exitcode: 1 (pid: 8608)
error_file: <N/A>
msg: Process failed with exitcode 1
[2]:
time: 2021-09-26_17:57:42
rank: 3 (local_rank: 3)
exitcode: 1 (pid: 8609)
error_file: <N/A>
msg: Process failed with exitcode 1
[3]:
time: 2021-09-26_17:57:42
rank: 4 (local_rank: 4)
exitcode: 1 (pid: 8610)
error_file: <N/A>
msg: Process failed with exitcode 1
[4]:
time: 2021-09-26_17:57:42
rank: 5 (local_rank: 5)
exitcode: 1 (pid: 8611)
error_file: <N/A>
msg: Process failed with exitcode 1
[5]:
time: 2021-09-26_17:57:42
rank: 6 (local_rank: 6)
exitcode: 1 (pid: 8612)
error_file: <N/A>
msg: Process failed with exitcode 1
[6]:
time: 2021-09-26_17:57:42
rank: 7 (local_rank: 7)
exitcode: 1 (pid: 8613)
error_file: <N/A>
msg: Process failed with exitcode 1
*************************************
I just hit this error when trying to run on a single node with 8 GPUs with the ResNet-101 MDETR model.
It seems I got multiple errors at once and I have no idea how to fix them.
Can anyone help me fix this?

Training and Finetuning time of MDetr

Hi,

Thank you for your awesome work. I was wondering about the pre-training time (GPU hours) of your model and the fine-tuning time on, say, RefCOCO or something similar. Could you please let me know? (It would be very helpful if you could add this information for all the datasets when you find the time. Thanks!)

Details in loss_ce

Hi, thank you for this great work.

Though I have a question about the loss_ce. As far as I know, this loss trains the model to output bounding boxes with the correct category. I find that the ground truth is built from the 'positive_map' instead of the 'label' in the annotation. I am wondering why that choice was made, and whether it is possible to build the ground truth simply from the 'label'.

Best,

Kun

Typo in the command

It seems there is a typo in the command for the training on CLEVR-Medium.

mkdir step1
python main.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir step1 --epochs 30 --lr_drop 20

because in the main.py file the argument is defined as 'output-dir', not 'output_dir'.
[screenshot]
When I ran the command, I got an unrecognized-arguments error. The same happens with the command for training on CLEVR-full.

UPD 1:
Also, I want to add a comment about running finetuning on the "all" split of the GQA dataset. It seems that it should be --resume, not --load, because according to main.py, load just uses torch.load, so it doesn't understand a URL and triggers a FileNotFoundError. Alternatively, it's possible to use --load pretrained_resnet101_checkpoint.pth (a local file) instead.

The pretrained model for CLEVR Ref+

Hi, thanks for providing plenty of pre-trained models.

I am using the CLEVR Ref+ dataset for some experiments, but in the Synthetic datasets section I could only find the pre-trained model for the CLEVR dataset, not for this one. Is there a chance that this pre-trained model could be provided on your page?

Thanks a lot in advance!
Best regards

KeyError: 'gqa_accuracy_answer_total_unscaled'

This error is really strange... I followed the readme for training MDETR on CLEVR.
First, I ran the following command:

python run_with_submitit.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step1 --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

The only difference from the readme is that I used run_with_submitit.py and added the --nodes 1 --ngpus 1 parameters.
The training went well and the job finished successfully. Then I ran

python run_with_submitit.py --dataset_config configs/clevr.json --backbone "resnet18" --num_queries 25 --batch_size 64  --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step2 --load ~/MDETR/mdetr/checkpoint/pchelintsev/experiments/19906/BEST_checkpoint.pth --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1

After the first epoch and testing, I got the following in the 28574_0_log.err file (warnings deleted):

submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception
Traceback (most recent call last):
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
    submitit_main()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 71, in submitit_main
    process_job(args.folder)
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 64, in process_job
    raise error
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
    result = delayed.result()
  File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/utils.py", line 128, in result
    self._result = self.function(*self.args, **self.kwargs)
  File "run_with_submitit.py", line 98, in __call__
    detection.main(self.args)
  File "/home/pchelintsev/MDETR/mdetr/main.py", line 614, in main
    metric = test_stats["gqa_accuracy_answer_total_unscaled"]
KeyError: 'gqa_accuracy_answer_total_unscaled'

Why is this key missing? :(
Also, here is the end of the 28574_0_log.out file:

Accumulating evaluation results...
DONE (t=70.57s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.581
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.893
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.660
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.374
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.302
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.729
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.637
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.741
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception

OOM error when evaluate detection on lvis minival

Hello, I am facing an out-of-memory problem when testing with the eval_lvis.py file.
My hardware setup is 8x1080 Ti GPUs with PyTorch 1.5. I have managed to run the training code successfully with a batch size of 1, but when I try to evaluate detection performance on LVIS, there is an out-of-memory error at line 76 of util/dist.py.
Can you help me with this problem? Thank you in advance.

Why is the model size on PhraseCut smaller than the others?

It really confuses me that:

  • According to the paper "In the first step, we take our pre-trained model after 40 epochs and fine-tune it for 5 epochs on this dataset, supervising the model to output correct boxes for the referred expressions"

  • Since the referring expression segmentation model also starts from the pre-trained model, why is it only 1.2GB, smaller than the 2.4GB pre-trained model?

missing final_mixed_train.json

Thanks for your code.

After running your code, I ran into the following error:

FileNotFoundError: [Errno 2] No such file or directory: '../OpenSource/final_mixed_train.json'

Where can I find final_mixed_train.json?

RuntimeError: No shared folder available

When I run python run_with_submitit.py --dataset_config configs/refcoco.json --batch_size 4 --load https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth?download=1 --ngpus 1 --nodes 2 --ema --text_encoder_lr 1e-5 --lr 5e-5, the following error occurred:
Traceback (most recent call last):
File "run_with_submitit.py", line 171, in
main()
File "run_with_submitit.py", line 130, in main
args.job_dir = get_shared_folder(args) / "%j"
File "run_with_submitit.py", line 41, in get_shared_folder
raise RuntimeError("No shared folder available")
RuntimeError: No shared folder available

How to deal with it?

Error with Runtime

[screenshot]
I just got a RuntimeError: "No shared folder available".
Do you have any ideas on how to solve this error?

How to train DETR on the provided LVIS 1%/10%/100% dataset

Hi,
Thank you for the great work. I am wondering how to train DETR using the dataset you provided. When I check the caption for each bounding box, I find that 'salmon_(fish)' and 'salmon_(food)' are merged into 'salmon' by the clean_name() function, resulting in 1199 classes, fewer than the 1203 classes in the original LVIS dataset.

I am wondering if there is a way to train the conventional DETR using the provided .json file.

Thanks in advance!

What are tokens?

I have looked at the CLEVR annotation file, and I am able to understand the fields and entries except for the tokens in the annotations. I would very much like to know what the tokens are.
Could you help me out? Thank you

Resnet-50 backbone mdetr model

Hi,

Thanks for sharing the code for the paper. I am looking into phrase grounding on the Flickr30k Entities dataset. The only backbone available for this setting is ResNet-101. Do you also have a ResNet-50 trained model?

Thanks

Submitit detection: error: unrecognized arguments: 4

When I run finetuning on the "all" split:

python run_with_submitit.py --dataset_config configs/gqa.json --ngpus 8 --ema --epochs 125 --epoch_chunks 25 --do_qa --split_qa_heads --lr_drop 150 --load https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth--nodes 4 --batch_size 4 --no_aux_loss --qa_loss_coef 25 --lr 1.4e-4 --lr_backbone 1.4e-5 --text_encoder_lr 7e-5

I got the following:

usage: Submitit detection [-h] [--run_name RUN_NAME] --dataset_config DATASET_CONFIG [--do_qa] [--predict_final] [--no_detection] [--split_qa_heads]
                          [--combine_datasets COMBINE_DATASETS [COMBINE_DATASETS ...]] [--combine_datasets_val COMBINE_DATASETS_VAL [COMBINE_DATASETS_VAL ...]] [--coco_path COCO_PATH]
                          [--vg_img_path VG_IMG_PATH] [--vg_ann_path VG_ANN_PATH] [--clevr_img_path CLEVR_IMG_PATH] [--clevr_ann_path CLEVR_ANN_PATH] [--phrasecut_ann_path PHRASECUT_ANN_PATH]
                          [--phrasecut_orig_ann_path PHRASECUT_ORIG_ANN_PATH] [--modulated_lvis_ann_path MODULATED_LVIS_ANN_PATH] [--lr LR] [--lr_backbone LR_BACKBONE]
                          [--text_encoder_lr TEXT_ENCODER_LR] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS] [--lr_drop LR_DROP] [--epoch_chunks EPOCH_CHUNKS]
                          [--optimizer OPTIMIZER] [--clip_max_norm CLIP_MAX_NORM] [--eval_skip EVAL_SKIP] [--schedule {step,multistep,linear_with_warmup,all_linear_with_warmup}] [--ema]
                          [--ema_decay EMA_DECAY] [--fraction_warmup_steps FRACTION_WARMUP_STEPS] [--frozen_weights FROZEN_WEIGHTS] [--freeze_text_encoder]
                          [--text_encoder_type {roberta-base,distilroberta-base,roberta-large}] [--backbone BACKBONE] [--dilation] [--position_embedding {sine,learned}] [--enc_layers ENC_LAYERS]
                          [--dec_layers DEC_LAYERS] [--dim_feedforward DIM_FEEDFORWARD] [--hidden_dim HIDDEN_DIM] [--dropout DROPOUT] [--nheads NHEADS] [--num_queries NUM_QUERIES] [--pre_norm]
                          [--no_pass_pos_and_query] [--mask_model {none,smallconv,v2}] [--remove_difficult] [--masks] [--no_aux_loss] [--set_loss {sequential,hungarian,lexicographical}]
                          [--contrastive_loss] [--no_contrastive_align_loss] [--contrastive_loss_hdim CONTRASTIVE_LOSS_HDIM] [--temperature_NCE TEMPERATURE_NCE] [--set_cost_class SET_COST_CLASS]
                          [--set_cost_bbox SET_COST_BBOX] [--set_cost_giou SET_COST_GIOU] [--ce_loss_coef CE_LOSS_COEF] [--mask_loss_coef MASK_LOSS_COEF] [--dice_loss_coef DICE_LOSS_COEF]
                          [--bbox_loss_coef BBOX_LOSS_COEF] [--giou_loss_coef GIOU_LOSS_COEF] [--qa_loss_coef QA_LOSS_COEF] [--eos_coef EOS_COEF] [--contrastive_loss_coef CONTRASTIVE_LOSS_COEF]
                          [--contrastive_align_loss_coef CONTRASTIVE_ALIGN_LOSS_COEF] [--test] [--test_type {testA,testB,test}] [--output-dir OUTPUT_DIR] [--device DEVICE] [--seed SEED]
                          [--resume RESUME] [--load LOAD] [--start-epoch N] [--eval] [--num_workers NUM_WORKERS] [--world-size WORLD_SIZE] [--dist-url DIST_URL] [--partition PARTITION] [--ngpus NGPUS]
                          [--nodes NODES] [--timeout TIMEOUT] [--job_dir JOB_DIR] [--mail MAIL]
Submitit detection: error: unrecognized arguments: 4

It seems this is because there is no space between the --load and --nodes options.
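Presumably the intended invocation simply separates the checkpoint URL from the next flag:

python run_with_submitit.py --dataset_config configs/gqa.json --ngpus 8 --ema --epochs 125 --epoch_chunks 25 --do_qa --split_qa_heads --lr_drop 150 --load https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth --nodes 4 --batch_size 4 --no_aux_loss --qa_loss_coef 25 --lr 1.4e-4 --lr_backbone 1.4e-5 --text_encoder_lr 7e-5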

A pink elephant

Hi, thanks for this great work.
I would like to know what you used to change the color of the elephant: what model or framework?
Thank you.

MDETR for conditional generation

Hi! Thank you for your amazing paper and work!!

I looked at your code, and I guess the "decoder" in MDETR is different from the usual notion of a decoder in NLP, which generates the answer in an autoregressive manner, like GPT-2 or BART.

I wonder if you have tried implementing MDETR with an autoregressive decoder, because I'm about to do it.
If you have not, could you give me your thoughts on whether this would work well or not?

Thank you :)

No such file or directory: 'mdetr_annotations/finetune_phrasecut_train.json'

Hi, thank you for such a good job.
Following the guide at https://github.com/ashkamath/mdetr/blob/main/.github/phrasecut.md, we are training your MDETR model on the PhraseCut dataset. When we download the pre-processed annotations from https://zenodo.org/record/4729015/files/mdetr_annotations.tar.gz?download=1, we find that "finetune_phrasecut_train.json" is not included in these files. Can you provide this json file?
Thanks~

windows10

It's hard to run on Windows 10 with Anaconda; there were too many errors, so I gave up... :(
