ashkamath / mdetr
License: Apache License 2.0
Hi,
Thank you for your awesome work. I was wondering about the pre-training time (GPU hours) of your model and fine-tuning hours on maybe Refcoco or something similar. Could you please let me know about that? (It would be very helpful if you could possibly add this information for all the datasets when you find the time. Thanks!)
Hi,
Thanks for sharing the code for the paper. I am looking into phrase grounding on the Flickr30k Entities dataset. The only backbone available for this setting is ResNet-101. Do you also have a model trained with ResNet-50?
Thanks
It seems there is a typo in the command for the training on CLEVR-Medium.
mkdir step1
python main.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output_dir step1 --epochs 30 --lr_drop 20
because in main.py the option is defined as 'output-dir', not 'output_dir'.
When I ran the command, I got an unrecognized option error. The same applies to the command for training on CLEVR-full.
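A minimal sketch of why the two spellings behave differently, assuming the flag is declared with argparse as `--output-dir` as the report describes. The parser below is a stand-in written for illustration, not the repository's actual parser. argparse stores the value under the attribute `output_dir`, but only the hyphenated spelling is accepted on the command line.

```python
import argparse

# Stand-in parser; only the flag spelling mirrors what the report describes.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--output-dir", default="")

# The hyphenated spelling is recognized and stored under `output_dir`.
ok_args, ok_extra = parser.parse_known_args(["--output-dir", "step1"])

# The underscore spelling is not a declared option, so both tokens are left over.
bad_args, bad_extra = parser.parse_known_args(["--output_dir", "step1"])

print(ok_args.output_dir, ok_extra)
print(bad_args.output_dir, bad_extra)
```

With `parse_args` instead of `parse_known_args`, the leftover tokens would be reported as unrecognized arguments and the program would exit.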
UPD 1:
Also, I want to add a comment about running fine-tuning on the "all" split of the GQA dataset. It seems that the command should use --resume, not --load, because according to main.py, --load just calls torch.load, so it doesn't understand a URL and triggers a FileNotFoundError. Alternatively, it's possible to use --load pretrained_resnet101_checkpoint.pth with a local copy of the checkpoint instead.
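The distinction can be sketched as a small dispatcher. This is an illustration under assumptions, not the code in main.py: `is_url` and `load_checkpoint` are hypothetical helper names, and torch is imported lazily so the URL check can be tried without it installed. `torch.load` expects a local file, while `torch.hub.load_state_dict_from_url` downloads and caches a remote one.

```python
from urllib.parse import urlparse


def is_url(path):
    """Return True when `path` looks like an http(s) URL rather than a local file."""
    return urlparse(path).scheme in ("http", "https")


def load_checkpoint(path):
    """Hypothetical helper: download remote checkpoints, read local ones from disk."""
    import torch  # lazy import; only needed when a checkpoint is actually loaded

    if is_url(path):
        return torch.hub.load_state_dict_from_url(path, map_location="cpu")
    return torch.load(path, map_location="cpu")


print(is_url("https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth"))
print(is_url("pretrained_resnet101_checkpoint.pth"))
```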
Hello, thank you for this great work, and I have a question about the meaning of 'tokens_positive' and 'positive_map' in finetune_lvis.json file. I am confused about why the positive_map generated from tokens_positive is a 256-length vector and how this can help the model to learn the text of category.
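One way to picture the relationship is the sketch below. It is an assumption-laden illustration, not the repository's implementation: it treats `tokens_positive` as character spans in the caption, uses a hypothetical whitespace tokenizer instead of the real text encoder's tokenizer, and assumes 256 is a fixed maximum number of text tokens. Each box then gets a 256-length row that puts weight on the token positions its spans cover.

```python
MAX_TOKENS = 256  # assumed fixed length, matching the 256-length vector in the question


def toy_tokenize(caption):
    """Hypothetical whitespace tokenizer returning (start, end) character offsets."""
    offsets, pos = [], 0
    for word in caption.split():
        start = caption.index(word, pos)
        end = start + len(word)
        offsets.append((start, end))
        pos = end
    return offsets


def build_positive_map(caption, tokens_positive):
    """Mark every token that overlaps one of the character spans, then normalize."""
    offsets = toy_tokenize(caption)
    row = [0.0] * MAX_TOKENS
    for beg, end in tokens_positive:
        for i, (tok_start, tok_end) in enumerate(offsets):
            if tok_start < end and tok_end > beg:  # token overlaps the span
                row[i] = 1.0
    total = sum(row)
    return [value / total for value in row] if total else row


row = build_positive_map("woman in gray shirt facing camera on right", [[9, 19]])
print(len(row), row[:5])
```

Under these assumptions the row acts as a soft target over token positions, so the classification target is "which words of the text describe this box" rather than a fixed class index.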
Hi, thanks for this great work.
I would like to know what you used to change the color of the elephant. Which model or framework?
Thank you,
Hi,
Is there a sample on how to load the lvis model in colab, similar to what is currently done with torch.hub?
I followed the Flickr evaluation instructions, but something is wrong with the downloaded "flickr30k-images" dataset. The reported error is "No such file or directory: '/data/flickr30k_images/val/100652400.jpg'". The "flickr30k-images" dataset I downloaded has no "val" folder.
The same goes for fine-tuning: there is no folder named "train" in the flickr30k-images dataset.
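The path in the error suggests the loader expects images under per-split folders such as `<root>/val/<id>.jpg`. A minimal sketch of arranging a flat download that way follows. `split_images` is a hypothetical helper, and the mapping from split name to image ids is an assumption: it has to come from whatever split lists accompany the annotations, which this thread does not specify.

```python
import shutil
from pathlib import Path


def split_images(flat_dir, out_root, split_ids):
    """Copy <flat_dir>/<id>.jpg into <out_root>/<split>/<id>.jpg.

    `split_ids` maps a split name ("train", "val", ...) to a list of image ids.
    Returns the file names that were listed but not found in `flat_dir`.
    """
    missing = []
    for split, ids in split_ids.items():
        dest = Path(out_root) / split
        dest.mkdir(parents=True, exist_ok=True)
        for image_id in ids:
            src = Path(flat_dir) / f"{image_id}.jpg"
            if src.exists():
                shutil.copy2(src, dest / src.name)
            else:
                missing.append(src.name)
    return missing
```

Copying keeps the original download intact; symlinks would save disk space if the file system allows them.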
Hello,
Thanks for open sourcing!
I am trying to run distributed training for pretraining. Without distributed training, it works fine.
With PyTorch versions 1.7.0, 1.7.1 and 1.8.0 I get the error below. Version 1.9 instead fails with ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torchvision/ops/__init__.py)
I tried changing the losses.backward() line to losses.backward(retain_graph=True), but that did not fix it.
Let me know if you have any suggestions on how to address this issue.
Traceback (most recent call last):
File "main.py", line 643, in <module>
main(args)
File "main.py", line 546, in main
train_stats = train_one_epoch(
File "/work/vcirik/mdetr/engine.py", line 100, in train_one_epoch
losses.backward()
File "/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torch/tensor.py", line 221, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/work/vcirik/anaconda3/envs/mdetr/lib/python3.8/site-packages/torch/autograd/__init__.py", line 130, in backward
Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [1, 10]] is at version 3; expected version 2 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
Hello, I am wondering if there is log file available for the fine tuning on 1% LVIS few shot detection.
Hi! Thank you for your amazing paper and work!!
I looked at your code, and I guess the "decoder" in MDETR is different from the usual notion of a decoder in NLP, which generates an answer autoregressively, like GPT-2 or BART.
I wonder if you tried implementing MDETR with an autoregressive decoder, because I'm about to do it.
If you have not, could you share your thoughts on whether this would work well?
Thank you :)
Hi, thanks for your excellent work; it has inspired me a lot. I want to reproduce your results for learning purposes, but I ran into a problem. The link to the pre-processed annotations, https://zenodo.org/record/4729015/files/mdetr_annotations.tar.gz?download=1, can't be opened now, so I can't download the annotation files. Could you provide a new link?
I saw that MDETR uses top-k boxes for evaluation and computes the max IoU as the final results. Is this a common method in the referring expression comprehension task?
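The protocol described in the question can be sketched as follows. This is an illustration only: the box format `(x1, y1, x2, y2)`, the helper names, and the 0.5 IoU threshold are assumptions, not values confirmed by the thread. A prediction counts as a hit at k if any of the k highest-scoring boxes overlaps the ground-truth box enough.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hit_at_k(boxes, scores, gt_box, k, iou_threshold=0.5):
    """True if the best IoU among the top-k scored boxes reaches the threshold."""
    ranked = sorted(zip(scores, boxes), key=lambda pair: pair[0], reverse=True)[:k]
    best = max((box_iou(box, gt_box) for _, box in ranked), default=0.0)
    return best >= iou_threshold


boxes = [(20, 20, 30, 30), (0, 0, 10, 9)]
scores = [0.9, 0.8]
gt_box = (0, 0, 10, 10)
print(hit_at_k(boxes, scores, gt_box, k=1), hit_at_k(boxes, scores, gt_box, k=2))
```

With k=1 this reduces to checking only the single most confident box, which is the stricter setting.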
This error is really strange... I followed the README for training MDETR on CLEVR.
First, I ran the following command:
python run_with_submitit.py --dataset_config configs/clevr_pretrain.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step1 --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1
The only difference from the command in the README is that I used run_with_submitit.py and added the --nodes 1 --ngpus 1 parameters.
The training went well and the job finished successfully. Then I ran
python run_with_submitit.py --dataset_config configs/clevr.json --backbone "resnet18" --num_queries 25 --batch_size 64 --schedule linear_with_warmup --text_encoder_type distilroberta-base --output-dir step2 --load ~/MDETR/mdetr/checkpoint/pchelintsev/experiments/19906/BEST_checkpoint.pth --epochs 5 --lr_drop 20 --nodes 1 --ngpus 1
After the first epoch and testing, I got the following in the 28574_0_log.err file (warnings removed):
submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception
Traceback (most recent call last):
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/_submit.py", line 11, in <module>
submitit_main()
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 71, in submitit_main
process_job(args.folder)
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 64, in process_job
raise error
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/submission.py", line 53, in process_job
result = delayed.result()
File "/home/pchelintsev/anaconda3/envs/mdetr_env/lib/python3.8/site-packages/submitit/core/utils.py", line 128, in result
self._result = self.function(*self.args, **self.kwargs)
File "run_with_submitit.py", line 98, in __call__
detection.main(self.args)
File "/home/pchelintsev/MDETR/mdetr/main.py", line 614, in main
metric = test_stats["gqa_accuracy_answer_total_unscaled"]
KeyError: 'gqa_accuracy_answer_total_unscaled'
Why is the loss missing? :(
Also, here is the end of the 28574_0_log.out file:
Accumulating evaluation results...
DONE (t=70.57s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.581
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.893
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.660
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.374
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.578
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.768
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.302
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.729
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.741
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.637
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.741
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.842
submitit ERROR (2021-09-27 13:01:24,999) - Submitted job triggered an exception
As the title implies
When testing on the PhraseCut dataset, I hit a bug:
FileNotFoundError: [Errno 2] No such file or directory: '/data16t/data/referring-segmentation/Pre-processed-annotations/finetune_phrasecut_test.json'
There is no such file in the provided mdetr_annotations.tar.gz. I really hope you can release this file. Thank you very much.
This really confuses me:
According to the paper "In the first step, we take our pre-trained model after 40 epochs and fine-tune it for 5 epochs on this dataset, supervising the model to output correct boxes for the referred expressions"
Since the referring expression segmentation model also starts from the pre-trained model, why is its checkpoint only 1.2GB, smaller than the pre-trained model's 2.4GB?
Hi ,
Thanks for the amazing work.
May I ask whether, during inference, there is any fundamental difference between coco_img['tokens_positive'] and coco_img['tokens_positive_eval']? It looks like they contain the same word spans but in a different order.
Hi,
Thank you for the great work. I am wondering how to train DETR using the dataset you provided. When I check the caption for each bounding box, I find that 'salmon_(fish)' and 'salmon_(food)' are merged into 'salmon' by the clean_name() function, resulting in 1199 classes, fewer than the 1203 classes in the original LVIS dataset.
I am wondering if there is a way to train the conventional DETR using the provided .json file.
Thanks in advance!
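The collision described above can be reproduced with a small sketch. The `clean_name` below is a hypothetical normalizer written for illustration (it drops a parenthesized qualifier and turns underscores into spaces); the repository's own function may differ in detail. `find_collisions` is likewise a hypothetical helper that lists which original category names end up sharing one cleaned name.

```python
import re
from collections import defaultdict


def clean_name(name):
    """Hypothetical normalizer: drop "(...)" qualifiers, underscores become spaces."""
    name = re.sub(r"\(.*\)", "", name)
    name = name.replace("_", " ")
    return " ".join(name.split())  # collapse and trim leftover whitespace


def find_collisions(names):
    """Group original names by cleaned name and keep groups with more than one member."""
    groups = defaultdict(list)
    for original in names:
        groups[clean_name(original)].append(original)
    return {cleaned: members for cleaned, members in groups.items() if len(members) > 1}


collisions = find_collisions(["salmon_(fish)", "salmon_(food)", "aerosol_can"])
print(collisions)
```

Running such a check over the full category list would show exactly which of the 1203 names collapse, which is the information needed to decide how to map captions back to distinct class ids.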
Hello there, thank you for the high-quality code. I found what looks like a typo in scripts/eval_lvis.py, line 43: utils.init_distributed_mode(args). I think that should be dist.init_distributed_mode(args), shouldn't it?
Please let me know if I misunderstand the code and run the code in the wrong way.
After running it a couple of times on different images and changing
transform = T.Compose([
T.Resize(800),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
to
transform = T.Compose([
T.Resize(500),
T.ToTensor(),
T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])
I get a "session crashed" message.
It's hard to run on Windows 10 with Anaconda; there are so many errors that I gave up... :(
Thanks for your code.
After running your code, I run into the following error:
FileNotFoundError: [Errno 2] No such file or directory: '../OpenSource/final_mixed_train.json'
Where can I find final_mixed_train.json?
Hi, thank you for this great work.
I have a question about loss_ce. As far as I know, this loss trains the model to output bounding boxes with the correct category. I find that the ground truth is built from 'positive_map' instead of the 'label' in the annotation. I am wondering why this choice was made and whether it is possible to build the ground truth simply from 'label'.
Best,
Kun
Hi @alcinos, @ashkamath, @nguyeho7,
I hope you are doing good.
I was trying to pretrain MDETR using the provided instructions. What I noticed is that loss started increasing during the 20th epoch. It kept decreasing to around 39 till the 19th epoch and jumped to around 77 after the 20th epoch. What could be the reason for this? Note that I am using the EfficientNetB5 backbone. The log.txt is attached.
Thanks
Hello,
I tried pretraining MDETR using your guide on an 8-GPU Volta instance.
I modified the config to only contain flickr30k (removed mixed) and ran the following command:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema
However, it throws an error in the RoBERTa text encoder (the error is printed 8 times; I truncated it):
[W python_anomaly_mode.cpp:104] Warning: Error detected in EmbeddingBackward. Traceback of forward call that caused the error:
File "main.py", line 648, in <module>
main(args)
File "main.py", line 563, in main
model_ema=model_ema,
File "/task_runtime/mdetr_2/mdetr/engine.py", line 68, in train_one_epoch
memory_cache = model(samples, captions, encode_and_save=True)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 799, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/task_runtime/mdetr_2/mdetr/models/mdetr.py", line 143, in forward
text_attention_mask=None,
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/task_runtime/mdetr_2/mdetr/models/transformer.py", line 121, in forward
encoded_text = self.text_encoder(**tokenized)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/iris/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 842, in forward
past_key_values_length=past_key_values_length,
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/iris/lib/python3.7/site-packages/transformers/models/roberta/modeling_roberta.py", line 132, in forward
token_type_embeddings = self.token_type_embeddings(token_type_ids)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 160, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/nn/functional.py", line 2043, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
(function _print_stack)
Traceback (most recent call last):
File "main.py", line 648, in <module>
main(args)
File "main.py", line 563, in main
model_ema=model_ema,
File "/task_runtime/mdetr_2/mdetr/engine.py", line 100, in train_one_epoch
losses.backward()
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/miniconda/envs/iris/lib/python3.7/site-packages/torch/autograd/__init__.py", line 149, in backward
allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.LongTensor [2, 20]] is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
Hi,
Thanks for your great work.
When I tried to reproduce the pretrained performance, I found that the results did not match the paper, especially for RefCOCO.
Any help would be much appreciated.
| | GQA AP | Flickr AP | Flickr R@1 | Refcoco AP | Refcoco R@1 | Refcoco+ R@1 | Refcocog R@1 |
|---|---|---|---|---|---|---|---|
| Res101 | 58.9 | 75.6 | 82.5 | 60.3 | 72.1 | 58.0 | 55.7 |
| Reproduced | 58.6 | 75.7 | 82.9 | 56.5 | 70.2 | 55.3 | 54.2 |
I have seen the clevr annotation file, and am able to understand the fields and entries except the tokens in annotations. I would very much like to know what tokens are.
Could you help me out? Thank you
When I run finetuning on the "all" split:
python run_with_submitit.py --dataset_config configs/gqa.json --ngpus 8 --ema --epochs 125 --epoch_chunks 25 --do_qa --split_qa_heads --lr_drop 150 --load https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth--nodes 4 --batch_size 4 --no_aux_loss --qa_loss_coef 25 --lr 1.4e-4 --lr_backbone 1.4e-5 --text_encoder_lr 7e-5
I got the following:
usage: Submitit detection [-h] [--run_name RUN_NAME] --dataset_config DATASET_CONFIG [--do_qa] [--predict_final] [--no_detection] [--split_qa_heads]
[--combine_datasets COMBINE_DATASETS [COMBINE_DATASETS ...]] [--combine_datasets_val COMBINE_DATASETS_VAL [COMBINE_DATASETS_VAL ...]] [--coco_path COCO_PATH]
[--vg_img_path VG_IMG_PATH] [--vg_ann_path VG_ANN_PATH] [--clevr_img_path CLEVR_IMG_PATH] [--clevr_ann_path CLEVR_ANN_PATH] [--phrasecut_ann_path PHRASECUT_ANN_PATH]
[--phrasecut_orig_ann_path PHRASECUT_ORIG_ANN_PATH] [--modulated_lvis_ann_path MODULATED_LVIS_ANN_PATH] [--lr LR] [--lr_backbone LR_BACKBONE]
[--text_encoder_lr TEXT_ENCODER_LR] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--epochs EPOCHS] [--lr_drop LR_DROP] [--epoch_chunks EPOCH_CHUNKS]
[--optimizer OPTIMIZER] [--clip_max_norm CLIP_MAX_NORM] [--eval_skip EVAL_SKIP] [--schedule {step,multistep,linear_with_warmup,all_linear_with_warmup}] [--ema]
[--ema_decay EMA_DECAY] [--fraction_warmup_steps FRACTION_WARMUP_STEPS] [--frozen_weights FROZEN_WEIGHTS] [--freeze_text_encoder]
[--text_encoder_type {roberta-base,distilroberta-base,roberta-large}] [--backbone BACKBONE] [--dilation] [--position_embedding {sine,learned}] [--enc_layers ENC_LAYERS]
[--dec_layers DEC_LAYERS] [--dim_feedforward DIM_FEEDFORWARD] [--hidden_dim HIDDEN_DIM] [--dropout DROPOUT] [--nheads NHEADS] [--num_queries NUM_QUERIES] [--pre_norm]
[--no_pass_pos_and_query] [--mask_model {none,smallconv,v2}] [--remove_difficult] [--masks] [--no_aux_loss] [--set_loss {sequential,hungarian,lexicographical}]
[--contrastive_loss] [--no_contrastive_align_loss] [--contrastive_loss_hdim CONTRASTIVE_LOSS_HDIM] [--temperature_NCE TEMPERATURE_NCE] [--set_cost_class SET_COST_CLASS]
[--set_cost_bbox SET_COST_BBOX] [--set_cost_giou SET_COST_GIOU] [--ce_loss_coef CE_LOSS_COEF] [--mask_loss_coef MASK_LOSS_COEF] [--dice_loss_coef DICE_LOSS_COEF]
[--bbox_loss_coef BBOX_LOSS_COEF] [--giou_loss_coef GIOU_LOSS_COEF] [--qa_loss_coef QA_LOSS_COEF] [--eos_coef EOS_COEF] [--contrastive_loss_coef CONTRASTIVE_LOSS_COEF]
[--contrastive_align_loss_coef CONTRASTIVE_ALIGN_LOSS_COEF] [--test] [--test_type {testA,testB,test}] [--output-dir OUTPUT_DIR] [--device DEVICE] [--seed SEED]
[--resume RESUME] [--load LOAD] [--start-epoch N] [--eval] [--num_workers NUM_WORKERS] [--world-size WORLD_SIZE] [--dist-url DIST_URL] [--partition PARTITION] [--ngpus NGPUS]
[--nodes NODES] [--timeout TIMEOUT] [--job_dir JOB_DIR] [--mail MAIL]
Submitit detection: error: unrecognized arguments: 4
It seems this is because there is no space between --load and --nodes options
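The effect of the missing space can be reproduced with a stand-in argparse parser (a sketch; only the three flags needed for the demonstration are declared, and the parser is not the repository's). Without the space, `--nodes` is glued onto the URL and becomes part of the `--load` value, so the following `4` has no option to belong to.

```python
import argparse

# Stand-in parser with just the flags needed to show the effect.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--load", default="")
parser.add_argument("--nodes", type=int, default=1)
parser.add_argument("--batch_size", type=int, default=2)

url = "https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth"

# As written in the command above: no space between the URL and --nodes.
broken_args, broken_extra = parser.parse_known_args(
    ["--load", url + "--nodes", "4", "--batch_size", "4"]
)

# With the space restored, --nodes is parsed as its own option.
fixed_args, fixed_extra = parser.parse_known_args(
    ["--load", url, "--nodes", "4", "--batch_size", "4"]
)

print(broken_args.load, broken_args.nodes, broken_extra)
print(fixed_args.load, fixed_args.nodes, fixed_extra)
```

In the broken case the stray `4` is the only leftover token, which matches the "unrecognized arguments: 4" message in the report.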
Hi,
Thank you for the great work and for providing the pre-trained models. I was trying to run pre-training following the instructions at pretrain.md. I am getting the attached error. I am listing my environment details below. Any help would be appreciated.
PyTorch: 1.9.0+cu11.1
TorchVision: 0.10.0
Transformers: 4.5.1
Hardware: A single machine with 4xRTX A6000
Hi,
The colab book demo using current torchvision version (0.10.0) is not working
https://colab.research.google.com/github/ashkamath/mdetr/blob/colab/notebooks/MDETR_demo.ipynb
/root/.cache/torch/hub/ashkamath_mdetr_main/util/misc.py in <module>()
16 # needed due to empty tensor bug in pytorch and torchvision 0.5
17 if float(torchvision.__version__[:3]) < 0.7:
---> 18 from torchvision.ops import _new_empty_tensor
19 from torchvision.ops.misc import _output_size
20
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.7/dist-packages/torchvision/ops/__init__.py)
I think this happens because the check only looks at the first three characters of the version string, so 0.10.0 is read as 0.1. That is less than 0.7, so the branch is entered when it shouldn't be.
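The diagnosis can be checked in isolation. The sketch below mirrors the comparison from the quoted line 17 of util/misc.py in `buggy_check`, and contrasts it with one possible alternative that compares integer (major, minor) tuples. `parse_major_minor` and `fixed_check` are hypothetical names and are not necessarily how the repository addressed it.

```python
def buggy_check(version):
    """Mirror of the quoted check: keeps only the first three characters."""
    return float(version[:3]) < 0.7


def parse_major_minor(version):
    """Parse "0.10.0" or "0.10.0+cu111" into the integer tuple (0, 10)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def fixed_check(version):
    """Compare version components as integers instead of as a truncated float."""
    return parse_major_minor(version) < (0, 7)


for version in ("0.5.0", "0.10.0", "0.10.0+cu111"):
    print(version, buggy_check(version), fixed_check(version))
```

For 0.10.0 the truncated string is "0.1", so the float comparison wrongly takes the legacy branch, while the tuple comparison does not.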
Traceback (most recent call last):
File "gradio/demo.py", line 107, in <module>
model, postprocessor = torch.hub.load('ashkamath/mdetr:main', 'mdetr_efficientnetB5', pretrained=True, return_postprocessor=True)
File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 364, in load
model = _load_local(repo_or_dir, model, *args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 390, in _load_local
hub_module = import_module(MODULE_HUBCONF, hubconf_path)
File "/usr/local/lib/python3.8/dist-packages/torch/hub.py", line 75, in import_module
spec.loader.exec_module(module)
File "<frozen importlib._bootstrap_external>", line 848, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/root/.cache/torch/hub/ashkamath_mdetr_main/hubconf.py", line 4, in <module>
from models.backbone import Backbone, Joiner, TimmBackbone
File "/root/.cache/torch/hub/ashkamath_mdetr_main/models/__init__.py", line 3, in <module>
from .mdetr import build
File "/root/.cache/torch/hub/ashkamath_mdetr_main/models/mdetr.py", line 16, in <module>
from util.misc import NestedTensor, interpolate
File "/root/.cache/torch/hub/ashkamath_mdetr_main/util/misc.py", line 18, in <module>
from torchvision.ops import _new_empty_tensor
ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops' (/usr/local/lib/python3.8/dist-packages/torchvision/ops/__init__.py)
Experiments with VQA v.2 dataset are described in Appendix E of the article. But it's not clear from main.py and run_with_submitit.py files how to run the fine-tuning (I've tried to write the same command that is used for fine-tuning on CLEVR). I've also found vqa_coco_format.py but it seems like preparation of the data, not fine-tuning itself. Also, I've encountered using build_dataset function in main.py and I don't see VQA v2 in the function :(
Could you please explain how to do so?
UPD 1 (09.27.21):
I've downloaded COCO and VQA v2 datasets and ran
python scripts/fine-tuning/vqa_coco_format.py --data_path VQA_v2_dataset/ --img_path COCO_dataset/images/ --coco_path COCO_dataset/
And the processing has finished correctly. Now I'm thinking how to write VQA v2 dataset script...
UPD 2 (10.03.21)
It seems I managed to implement all the necessary classes and fix the code. I'm currently doing an experiment eval -> train on vqa2 -> eval. As soon as it successfully finishes I'll push the code into my fork of the repo.
UPD 3 (10.03.21)
Yeah, it works! Here is the link: https://github.com/TopCoder2K/mdetr. I haven't written any documentation because I'm not sure that fine-tuning on VQA is useful to anybody.)) If you have any question, please ask here :)
Hello, I am facing an out of memory problem when testing with the eval_lvis.py file.
My hardware setup is 8 x 1080 Ti GPUs with PyTorch 1.5. I have managed to run the training code successfully with a batch size of 1, but when I try to test the detection performance on LVIS, there is an out-of-memory error at line 76 of util/dist.py.
Can you help me with this problem? Thank you in advance.
Hello,
Thanks for open sourcing!
I tried to run evaluation on GQA, but it failed.
python main.py --dataset_config configs/gqa.json --ema --eval --do_qa --split_qa_heads --resume https://zenodo.org/record/4721981/files/gqa_resnet101_checkpoint.pth
the following error occurred:
Traceback (most recent call last):
File "main.py", line 649, in <module>
main(args)
File "main.py", line 465, in main
model_without_ddp.load_state_dict(checkpoint["model"])
File "/home/data/anaconda3/envs/qxy_mdetr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for MDETR:
Missing key(s) in state_dict: "contrastive_align_projection_image.weight", "contrastive_align_projection_image.bias", "contrastive_align_projection_text.weight", "contrastive_align_projection_text.bias".
How to deal with it?
I'm new to CV and would like to ask about something I don't understand. In the transformer class, the image features and the text are concatenated along the token dimension. How is it ensured that each image patch and each text token correspond semantically?
Hi, thanks for the open-source annotation dataset.
I am confused about the meaning of tokens_negative in the annotation. For example, in
"file_name": "COCO_train2014_000000581857.jpg", "height": 640, "width": 427, "id": 3, "original_id": 581857, "caption": "woman in gray shirt facing camera on right", "dataset_name": "refcoco", "tokens_negative": [[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]
I couldn't understand the meaning of these pairs of numbers, "[[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]".
I will be very grateful if you could help me to understand!
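One observation that can be checked directly from the quoted example: if each pair is read as a `[start, end)` character range into the caption, the pairs slice out whole words. The snippet below only demonstrates that slicing. What role those words play in training is not something this sketch establishes.

```python
caption = "woman in gray shirt facing camera on right"
tokens_negative = [[0, 5], [6, 8], [20, 26], [34, 36], [37, 42]]

# Treat each pair as a half-open character range into the caption.
words = [caption[start:end] for start, end in tokens_negative]
print(words)  # ['woman', 'in', 'facing', 'on', 'right']
```

The spans not listed here cover "gray shirt" and "camera", so in this example the pairs select the rest of the caption.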
is it possible to do inference on cpu?
Hello,
This is in relation to the losses described in the paper and implemented in the codebase. Need your help in understanding the following:
"text_pooled_op": encoded_text.pooler_output if self.CLS is not None else None,
"img_pooled_op": img_memory[0] if self.CLS is not None else None, # Return the CLS token
which essentially means that we are deriving the embedded representation of the text from the BERT-based text backbone encoder's classification token and the image embedded representation is being derived from the output of the transformer encoder. Is this genuinely a discrepancy? If not, can you kindly point towards the snippet for these loss calculations where you are tapping in the decoder output?
Thank you.
Hi, thanks for providing plenty of pre-trained models.
I am using the dataset of RefClevr+ for some experiments. But I could not find the pre-trained model for this dataset but only Clevr dataset in the Synthetic Dataset. Is there a chance that the pre-trained model is provided on your web?
Thanks a lot in advance!
Best regards
It gives me "Error 429: too many requests" every time I try to download the models. Kindly help.
(Minseok) ubuntu@DESKTOP-SMIU2JP:~/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main$ python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset_config configs/pretrain.json --ema --backbone timm_tf_efficientnet_b3_ns --lr_backbone 5e-5
/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects `--local_rank` argument to be set, please change it to read from `os.environ['LOCAL_RANK']` instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
warnings.warn(
WARNING:torch.distributed.run:*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
| distributed init (rank 0): env://
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/main.py", line 643, in <module>
main(args)
File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/main.py", line 281, in main
dist.init_distributed_mode(args)
File "/home/ubuntu/anaconda3/envs/Minseok/Portfolio/TeamProject/mdetr-main/util/dist.py", line 220, in init_distributed_mode
torch.cuda.set_device(args.gpu)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/cuda/__init__.py", line 311, in set_device
torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 8606 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 8607) of binary: /home/ubuntu/anaconda3/envs/Minseok/bin/python
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 193, in
main()
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 189, in main
launch(args)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launch.py", line 174, in launch
run(args)
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/run.py", line 689, in run
elastic_launch(
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 131, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/ubuntu/anaconda3/envs/Minseok/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 259, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
main.py FAILED
Other Failures:
[1]:
time: 2021-09-26_17:57:42
rank: 2 (local_rank: 2)
exitcode: 1 (pid: 8608)
error_file: <N/A>
msg: Process failed with exitcode 1
[2]:
time: 2021-09-26_17:57:42
rank: 3 (local_rank: 3)
exitcode: 1 (pid: 8609)
error_file: <N/A>
msg: Process failed with exitcode 1
[3]:
time: 2021-09-26_17:57:42
rank: 4 (local_rank: 4)
exitcode: 1 (pid: 8610)
error_file: <N/A>
msg: Process failed with exitcode 1
[4]:
time: 2021-09-26_17:57:42
rank: 5 (local_rank: 5)
exitcode: 1 (pid: 8611)
error_file: <N/A>
msg: Process failed with exitcode 1
[5]:
time: 2021-09-26_17:57:42
rank: 6 (local_rank: 6)
exitcode: 1 (pid: 8612)
error_file: <N/A>
msg: Process failed with exitcode 1
[6]:
time: 2021-09-26_17:57:42
rank: 7 (local_rank: 7)
exitcode: 1 (pid: 8613)
error_file: <N/A>
msg: Process failed with exitcode 1
*************************************`
I hit this error when trying to run MDETR with a ResNet101 backbone on a single node with 8 GPUs.
I seem to get multiple errors at once, and I have no idea how to fix them.
Can anyone help me fix this?
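One possible reading of the log above: `torch.cuda.set_device` raises "invalid device ordinal" when it is given an index at or beyond the number of GPUs visible to the process. Only local ranks 1 to 7 are reported as failed, which would be consistent with a single visible GPU. The sketch below uses a hypothetical helper (`invalid_local_ranks` is not part of MDETR or PyTorch) to list which local ranks would fail for a given launch configuration. In a real run, the visible GPU count could be taken from `torch.cuda.device_count()`.

```python
from typing import List


def invalid_local_ranks(nproc_per_node: int, visible_gpus: int) -> List[int]:
    """Return the local ranks whose GPU index would not exist.

    Assumption: each launched process calls torch.cuda.set_device(local_rank),
    as the traceback from util/dist.py suggests.
    """
    return [rank for rank in range(nproc_per_node) if rank >= visible_gpus]


# Hypothetical scenario: 8 processes launched, but only 1 GPU is visible.
failing = invalid_local_ranks(nproc_per_node=8, visible_gpus=1)
print("local ranks that would fail:", failing)
```

If the visible count turns out to be lower than expected, checking `CUDA_VISIBLE_DEVICES` or lowering `--nproc_per_node` to match would be the first things to try.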
Hi, I have two questions:
In the few-shot transfer learning experiment on LVIS, MDETR's performance with 100% of the data is not better than with 10% of the data on all metrics. Why is that? (I only found the drop in small-object detection mentioned.)
For the referring image segmentation task, have you tested on UNC/UNC+/G-Ref/ReferIt? These are frequently used benchmarks for referring image segmentation. They are based on MS COCO, but I noticed that you excluded the COCO val/test sets from pre-training.
Thanks
Hi! Thank you for MDETR, it is an amazing idea and special thanks for documenting your code. It helped me a lot in understanding MDETR.
I have a question about the GQA dataset fine-tuning for visual question answering.
The paper says
[we] fine-tune first for 5 epochs on the unbalanced all GQA split, followed by 10 epochs on the balanced split similar to what is done in prior work [28, 5]. During the first 5 epochs, we train the modulated detection losses along with the question answering, but put a weight on question answering loss that encourages the model to focus more on this task. For the balanced split fine-tuning, we only use the question answering loss.
However, the "reproduce results" instructions in gqa.md suggest fine-tuning for 125 epochs on either the "all" or the "balanced" split. They also seem to use the detection loss for all epochs.
Could you clarify why these instructions differ, and what results one can expect from the GitHub instructions versus the paper's? Currently, after training for 25 epochs with the GitHub instructions (balanced dataset), I get about 53% validation accuracy (gqa_accuracy_answer_total_unscaled), which is much lower than the number in the paper (62%).
Also, how important have you found the detection objective for this task?
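For reference, the two-stage schedule described in the quoted paper passage could be sketched as below. This is only an illustration of the quoted text, not the repository's implementation: `combined_loss`, the stage names, and the `qa_weight` value are hypothetical, since the paper excerpt does not state the weight.

```python
def combined_loss(stage: str, detection_loss: float, qa_loss: float,
                  qa_weight: float = 1.0) -> float:
    """Combine losses according to the fine-tuning stage.

    "all":      detection losses plus a weighted question-answering loss
                (first 5 epochs on the unbalanced split, per the quote).
    "balanced": question-answering loss only (following 10 epochs).
    """
    if stage == "all":
        return detection_loss + qa_weight * qa_loss
    if stage == "balanced":
        return qa_loss
    raise ValueError(f"unknown stage: {stage!r}")


# Hypothetical loss values and weight, purely for illustration.
print(combined_loss("all", detection_loss=2.0, qa_loss=1.0, qa_weight=3.0))
print(combined_loss("balanced", detection_loss=2.0, qa_loss=1.0, qa_weight=3.0))
```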
Hi,
I followed the evaluation instructions for referring expressions on the COCO train2014 dataset. But when I ran the test command "!python run_with_submitit.py --dataset_config configs/refcoco.json --batch_size 4 --resume https://zenodo.org/record/4721981/files/refcoco_resnet101_checkpoint.pth --ngpus 1 --nodes 1 --ema --test --test_type testA", I didn't get any precision or recall results, only:
"Start training
Training time 0:00:00
submitit INFO (2021-07-17 11:30:17,492) - Job completed successfully"
I also downloaded the COCO val2014 and test2014 datasets, but I am not sure whether I need them, because I got an error when I passed these datasets.
Thanks a lot in advance!
Best,
Hi, thank you for the great work.
Following the guide at https://github.com/ashkamath/mdetr/blob/main/.github/phrasecut.md, we are training your MDETR model on the PhraseCut dataset. After downloading the pre-processed dataset from https://zenodo.org/record/4729015/files/mdetr_annotations.tar.gz?download=1, we found that "finetune_phrasecut_train.json" is not included in these files. Could you provide this JSON file?
Thanks
When I run python run_with_submitit.py --dataset_config configs/refcoco.json --batch_size 4 --load https://zenodo.org/record/4721981/files/pretrained_resnet101_checkpoint.pth?download=1 --ngpus 1 --nodes 2 --ema --text_encoder_lr 1e-5 --lr 5e-5, the following error occurred:
Traceback (most recent call last):
File "run_with_submitit.py", line 171, in
main()
File "run_with_submitit.py", line 130, in main
args.job_dir = get_shared_folder(args) / "%j"
File "run_with_submitit.py", line 41, in get_shared_folder
raise RuntimeError("No shared folder available")
RuntimeError: No shared folder available
How should I deal with this?
Hi,
Thanks for your great work.
I hit an OOM error at the evaluation stage after the first epoch of pre-training. The log is:
Test: Total time: 0:01:42 (0.2476 s / it)
Averaged stats: loss: 113.5333 (96.1866) loss_bbox: 0.5625 (0.5173) loss_bbox_0: 0.6505 (0.5977) loss_bbox_1: 0.5762 (0.5255) loss_bbox_2: 0.5703 (0.5273) loss_bbox_3: 0.5712 (0.5109) loss_bbox_4: 0.5651 (0.5138) loss_ce: 11.5826 (9.1250) loss_ce_0: 11.4480 (9.4460) loss_ce_1: 11.7980 (9.5058) loss_ce_2: 11.8104 (9.4749) loss_ce_3: 11.6550 (9.2512) loss_ce_4: 11.5774 (9.0949) loss_contrastive_align: 6.1482 (5.6187) loss_contrastive_align_0: 6.1950 (5.8909) loss_contrastive_align_1: 6.1946 (5.7864) loss_contrastive_align_2: 6.1133 (5.7674) loss_contrastive_align_3: 6.1261 (5.6713) loss_contrastive_align_4: 6.0199 (5.5644) loss_giou: 0.4890 (0.4578) loss_giou_0: 0.5642 (0.5090) loss_giou_1: 0.5024 (0.4579) loss_giou_2: 0.4965 (0.4619) loss_giou_3: 0.5086 (0.4525) loss_giou_4: 0.4900 (0.4579) cardinality_error_unscaled: 8.3906 (4.8554) cardinality_error_0_unscaled: 6.5000 (4.3573) cardinality_error_1_unscaled: 9.4062 (5.9682) cardinality_error_2_unscaled: 10.3125 (6.3725) cardinality_error_3_unscaled: 9.2969 (5.2416) cardinality_error_4_unscaled: 8.8281 (5.0047) loss_bbox_unscaled: 0.1125 (0.1035) loss_bbox_0_unscaled: 0.1301 (0.1195) loss_bbox_1_unscaled: 0.1152 (0.1051) loss_bbox_2_unscaled: 0.1141 (0.1055) loss_bbox_3_unscaled: 0.1142 (0.1022) loss_bbox_4_unscaled: 0.1130 (0.1028) loss_ce_unscaled: 11.5826 (9.1250) loss_ce_0_unscaled: 11.4480 (9.4460) loss_ce_1_unscaled: 11.7980 (9.5058) loss_ce_2_unscaled: 11.8104 (9.4749) loss_ce_3_unscaled: 11.6550 (9.2512) loss_ce_4_unscaled: 11.5774 (9.0949) loss_contrastive_align_unscaled: 6.1482 (5.6187) loss_contrastive_align_0_unscaled: 6.1950 (5.8909) loss_contrastive_align_1_unscaled: 6.1946 (5.7864) loss_contrastive_align_2_unscaled: 6.1133 (5.7674) loss_contrastive_align_3_unscaled: 6.1261 (5.6713) loss_contrastive_align_4_unscaled: 6.0199 (5.5644) loss_giou_unscaled: 0.2445 (0.2289) loss_giou_0_unscaled: 0.2821 (0.2545) loss_giou_1_unscaled: 0.2512 (0.2289) loss_giou_2_unscaled: 0.2483 (0.2309) loss_giou_3_unscaled: 0.2543 (0.2263) loss_giou_4_unscaled: 0.2450 (0.2289)
gathering on cpu
gathering on cpu
gathering on cpu
Traceback (most recent call last):
File "main.py", line 655, in <module>
main(args)
File "main.py", line 598, in main
curr_test_stats = evaluate(
File "/usr/local/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
return func(*args, **kwargs)
File "/worksapce/mdetr/trainer/engine.py", line 230, in evaluate
evaluator.synchronize_between_processes()
File "/worksapce/mdetr/trainer/datasets/refexp.py", line 38, in synchronize_between_processes
all_predictions = dist.all_gather(self.predictions)
File "/worksapce/mdetr/trainer/util/dist.py", line 86, in all_gather
obj = torch.load(buffer)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 594, in load
return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 853, in _load
result = unpickler.load()
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 845, in persistent_load
load_tensor(data_type, size, key, _maybe_decode_ascii(location))
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 834, in load_tensor
loaded_storages[key] = restore_location(storage, location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 175, in default_restore_location
result = fn(storage, location)
File "/usr/local/lib/python3.8/site-packages/torch/serialization.py", line 157, in _cuda_deserialize
return obj.cuda(device)
File "/usr/local/lib/python3.8/site-packages/torch/_utils.py", line 79, in _cuda
return new_type(self.size()).copy_(self, non_blocking)
File "/usr/local/lib/python3.8/site-packages/torch/cuda/__init__.py", line 462, in _lazy_new
return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory
I use 32 GB V100 GPUs with 2 samples per GPU, following the default settings.
I also set CUBLAS_WORKSPACE_CONFIG=:4096:8 MDETR_CPU_REDUCE=1.
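A possible explanation, going by the traceback: `torch.load` in the gather helper goes through `_cuda_deserialize`, which suggests the gathered predictions contain CUDA tensors that every process restores onto its own GPU. One workaround to try is moving the predictions to host memory before they are gathered. The sketch below is a hypothetical helper (`to_cpu` is not part of MDETR); it works on anything exposing a `.cpu()` method, such as `torch.Tensor`, and has not been verified against the repository.

```python
from typing import Any


def to_cpu(obj: Any) -> Any:
    """Recursively move objects exposing .cpu() (e.g. torch.Tensor) to host memory.

    Dicts, lists and tuples are rebuilt with their contents converted;
    everything else is returned unchanged.
    """
    if hasattr(obj, "cpu") and callable(obj.cpu):
        return obj.cpu()
    if isinstance(obj, dict):
        return {key: to_cpu(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(to_cpu(item) for item in obj)
    return obj
```

It might be applied just before the gather, for example `dist.all_gather(to_cpu(self.predictions))`. Passing `map_location="cpu"` to `torch.load` inside the gather helper should have a similar effect.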