
vilbert_beta's Issues

Performance of image retrieval on Flickr30k

The best fine-tuning result I get (fine-tuned from the pretrained model you published) is 56.62 / 83.96 / 90.56, which is still about 1.6 points lower than your reported result. Furthermore, the zero-shot evaluation result from your public Conceptual Captions pretrained model is only 26.83 / 56.43 / 68.92; is that the right checkpoint? I would like to know more training details, such as the command and output file for each pretrained and fine-tuned model, and I have some questions:

  1. Do you use the freeze parameter during fine-tuning, or only during pretraining?
  2. How do you compute the hard negatives? What kind of image feature do you use to compute the similarity (ROI features, or full-image features from e.g. ResNet or DenseNet)? See the sketch after this list for what I currently assume.
  3. How do you set the LR decay epochs for fine-tuning? From the output file it looks like a factor of 0.2 at epochs [11, 13, 15, 17].
  4. The pretraining log shows LR=0; is that normal? (I ask because you set different decay weights for different parameters.)
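
For reference, this is roughly what I assume for question 2; the shapes, the mean pooling, and the top-100 cutoff are my own guesses, not anything from your code:

import numpy as np

# My guess at hard-negative mining: rank images by cosine similarity
# between mean-pooled ROI features and take the closest non-matching
# images as hard negatives.
roi_features = np.random.rand(100, 36, 2048)          # hypothetical [num_images, num_boxes, dim]
feats = roi_features.mean(axis=1)                     # [num_images, 2048]
feats /= np.linalg.norm(feats, axis=1, keepdims=True) # L2-normalize
sim = feats @ feats.T                                 # cosine similarity matrix
hard_negatives = np.argsort(-sim, axis=1)[:, 1:101]   # nearest 100, skipping self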

Thanks a lot!

By the way, I fine-tuned image retrieval on Flickr30k with these params:
Namespace(baseline=False, bert_model='bert-base-uncased', compact=False, config_file='config/bert_base_6layer_6conect.json', do_lower_case=True, evaluation_interval=1, fp16=False, freeze=-1, from_pretrained='/conceptual_pretrained_bert_base_6_layer_6_connect_freeze_0/pytorch_model_9.bin', gradient_accumulation_steps=1, in_memory=False, learning_rate=2e-05, local_rank=0, loss_scale=0, lr_scheduler='mannul', no_cuda=False, num_train_epochs=20, num_workers=9, optimizer='BertAdam', output_dir='/model/vilbert/', save_name='finetune_retrieval_2', seed=0, tasks='3', use_chunk=0, vision_scratch=False, warmup_proportion=0.1)

{
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": false,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

Fine-tune ViLBERT on a different VQA dataset

Hello,

I am trying to fine-tune and evaluate the pretrained ViLBERT model on a different VQA dataset, but I am finding it a bit difficult to do quickly. Could you please briefly describe the steps needed to fine-tune the model on a different VQA dataset?

Thank you!

Best,
Claudio

subprocess.CalledProcessError

Hi,
I want to use the pretrained model and fine-tune it for VQA, and I ran the command as you provided:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --learning_rate 4e-5 --num_workers 16 --tasks 0 --save_name pretrained

but an error appears:

Traceback (most recent call last):
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in
main()
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/chenjinjie/.conda/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '4e-5', '--num_workers', '16', '--tasks', '0', '--save_name', 'pretrained']' died with <Signals.SIGABRT: 6>.

Could you help? Thanks!

Referring Expression Evaluation

Trying to evaluate referring expressions shows the following error:
lmdb.Error: data/referExpression/refcoco+_resnet101_faster_rcnn_genome.lmdb: No such file or directory

Could you let me know how to get the file refcoco+_resnet101_faster_rcnn_genome.lmdb? Thanks!

Recommended PyTorch version?

Hello, which version of PyTorch did you use to run the code? I am running into some issues when I install the latest version of PyTorch (1.3).

Image captioning with ViLBERT

Figure 5 in the paper shows samples of generated image descriptions, but I couldn't reproduce similar results using the pretrained ViLBERT. I used BertForMultiModalPreTraining and supplied image features that seem fine, given that prediction_scores_v (the hv vector in the paper) seems to reflect what is in the picture. As the "question", I supplied a tensor of 30 [MASK] tokens.
Then, following the paper, I passed that through the model 30 times, at each iteration setting the ith token of the "question" (text stream) to the text token with the highest score at the ith position.
I have also tried repeating the procedure multiple times, but it didn't change much. This results in very poor captions, such as "the a man is a man who is a man who is a man ...".

Could you please elaborate on the captioning method you've presented in the publication?
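
For clarity, here is a minimal sketch of the decoding loop I described. The model call and its return values are my assumptions about BertForMultiModalPreTraining, and tokenizer, img_feats, and img_locs are placeholders for the real inputs:

import torch

mask_id = tokenizer.vocab["[MASK]"]                 # tokenizer: BERT tokenizer (assumed)
tokens = torch.full((1, 30), mask_id, dtype=torch.long)
for i in range(30):
    # assumed: first output is the text-stream vocabulary scores [1, 30, vocab_size]
    prediction_scores_t = model(tokens, img_feats, img_locs)[0]
    tokens[0, i] = prediction_scores_t[0, i].argmax()
print(tokenizer.convert_ids_to_tokens(tokens[0].tolist()))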

Error when training the VCR task

I want to train the VCR task, but I get an error like this:

(vilbert) ailab@ailab:~/vilbert_beta$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained
2020-01-26 19:03:18.956063: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956235: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[the three TensorRT warnings above repeat once for each of the 8 launched processes]
train_tasks.py:158: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  task_cfg = edict(yaml.load(f))
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp line=33 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 201, in main
    torch.cuda.set_device(args.local_rank)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp:33
[the YAMLLoadWarning and the "invalid device ordinal" traceback above repeat for each of the remaining ranks]
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 205, in main
    torch.distributed.init_process_group(backend="nccl")
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use
01/26/2020 19:03:20 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: False
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ailab/anaconda3/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '2e-5', '--num_workers', '16', '--tasks', '1-2', '--save_name', 'pretrained']' returned non-zero exit status 1.
(vilbert) ailab@ailab:~/vilbert_beta$ 01/26/2020 19:03:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
01/26/2020 19:03:21 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 8
01/26/2020 19:03:35 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 8
01/26/2020 19:03:49 - INFO - vilbert.utils -   logging file at: VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained
01/26/2020 19:03:49 - ERROR - vilbert.vilbert -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 263, in main
    model.to(device)
AttributeError: 'NoneType' object has no attribute 'to'

What should I do? Help T^T
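
One thing worth checking for anyone hitting this: the launch command asks for 8 processes (--nproc_per_node=8), one per GPU, and "invalid device ordinal" is raised when torch.cuda.set_device() is given a GPU index that does not exist. A quick check:

import torch

# Pass at most this number as --nproc_per_node in the launch command.
print(torch.cuda.device_count())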

Is python-prctl required?

Hi there - thanks for your great work on vision-and-language pre-training! I'm trying to run the codebase, but I'm running into issues installing python-prctl, since I do not have sudo access. Is the package required? I do not see it used or imported in the codebase.

Thanks for your help!

pretrained vilbert model not found

Hi Jiasen,
I am trying to train (fine-tune) the downstream task for refcoco+. The code throws an error saying that the pretrained ViLBERT model is not found. Is there a link where I can download the pretrained ViLBERT model?

The error that I get is the following:
ERROR - vilbert.vilbert - Model name 'save/bert_base_6_layer_6_connect_
freeze_0/pytorch_model_8.bin' was not found in model name list...

Thanks a lot!

Error when running the VQA task

I get the following error when running the VQA task:

Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 209, in main
    task_id, batch, model, task_dataloader_val, task_losses, results, others)
  File "/home/tobias/vilbert_beta/vilbert/task_utils.py", line 353, in EvaluatingModel
    question = question.view(-1, question.size(2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

I am using the downloadable COCO ResNet features as data and the following command to run the script:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained \
	save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin \
	--config_file config/bert_base_6layer_6conect.json --task 0 --split test --batch_size 100

It seems that the tensors loaded from the data do not have the right dimensions?
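
A guard like the following (my own sketch, not a proper fix) avoids the crash by only flattening 3-D question tensors, though it may just paper over a split/config mismatch:

# question is expected here as [batch, num_options, seq_len];
# for VQA it apparently arrives as 2-D [batch, seq_len]
if question.dim() == 3:
    question = question.view(-1, question.size(2))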

Any help is appreciated! Thanks!

Bugs when evaluating the pretrained VCR model

I got this weird error message on a GPU cluster...
Please help me if you know.

_PyFunction_FastCallDict
_PyObject_FastCallDict
_PyObject_Call_Prepend
PyObject_Call
_PyObject_FastCallDict
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
PyEval_EvalCodeEx
PyEval_EvalCode
PyRun_FileExFlags
PyRun_SimpleFileExFlags
Py_Main
main
__libc_start_main
*** End stack trace ***
Aborted

Would you release the multi-task fine-tuning codes for ViL-BERT?

Hi, I have read your new paper "12-in-1: Multi-Task Vision and Language Representation Learning" on arXiv, which uses multi-task fine-tuning to boost the performance of ViLBERT. May I ask whether you will release this part of the code in this repo or somewhere else? Thank you very much!

Pretrained models from bottom-up-attention not available anymore.

Dear Jiasen Lu,
first of all, I would like to congratulate you on your fine work!

Since I want to use ViLBERT on my own dataset, I would like to extract visual features from it with the same model you used, so I do not need to retrain the whole model. You described in "vilbert_beta" how you used "bottom-up-attention" for this. Going to that repository and following the installation guide, it says:
"Download pretrained model, and put it under data\faster_rcnn_models.", where the link to the pretrained model is the following: https://www.dropbox.com/s/tr24q7h0zm2wnjv/resnet101_faster_rcnn_final.caffemodel?dl=1

Unfortunately, the link is broken and I cannot download the file. Could you point me to how to extract features for ViLBERT then? Thank you!

Features for visual genome

Hi,
Thanks for your work.
In addition to COCO features, could you also provide visual genome features?

Error when running zero-shot retrieval

Num Iters: {'TASK3': 10000}
Batch size: {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 230, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = torch.softmax(vil_logit, dim=1)[:,0].view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (125) into shape (500)

Has anyone encountered this? Thank you!

Train ViLBERT on downstream VCR tasks

I encountered an issue in:
  File "/home/XXX/vilbert_beta/vilbert/vilbert.py", line 322, in forward
    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
RuntimeError: The expanded size of the tensor (60) must match the existing size (4) at non-singleton dimension 2. Target sizes: [16, 4, 60]. Tensor sizes: [1, 4]

From the code we can see:
seq_length = input_ids.size(1)
position_ids = torch.arange(
    seq_length, dtype=torch.long, device=input_ids.device
)
Here input_ids is [16, 4, 60], so position_ids comes out as [4].
How do we make this fit the sequence length of 60? (See the sketch below.)
Thanks!
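
For reference, this is how I would expect the shapes to be handled (my sketch, assuming the multiple-choice inputs are meant to be flattened before the embedding layer):

import torch

# input_ids: [batch, num_choices, seq_len] = [16, 4, 60]
input_ids = torch.zeros(16, 4, 60, dtype=torch.long)   # stand-in for the real batch
batch, num_choices, seq_len = input_ids.shape
flat_input_ids = input_ids.view(-1, seq_len)           # [64, 60]
position_ids = torch.arange(seq_len, dtype=torch.long, device=input_ids.device)
position_ids = position_ids.unsqueeze(0).expand_as(flat_input_ids)   # [64, 60]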

Image features of Conceptual Captions

Could you release the Conceptual Captions features? They may be too heavy to upload, but I really want to retrain based on your code.

By the way, I have a question about your number-of-streams study.
In your two-stream version, the text stream uses 12 BERT layers, the image stream uses 6 image BERT layers, and the two streams pass through a connection module with 6 layers.
In the single-stream version, the two streams share 12 BERT layers for encoding.
I don't think these two models are comparable.

Thanks a lot!

Model performance on VCR val split

I downloaded the processed VCR data and achieved a Q->A accuracy of about 72.217 with the shared checkpoint, which is slightly lower than the paper claims. Using my own processed data, I achieved similar performance with both the fine-tuned checkpoint and the pretrained checkpoint after fine-tuning on my processed data.

Feature Extraction of VCR

Can you provide the global image feature data and the details of VCR's feature extraction?
I would be really grateful if you could help me!

The Dropbox link might require permissions from you. (It doesn't open.)

Also, we would like to train the model on RefCOCOg. Would it be possible to get some pointers on how to use your code to train on RefCOCOg (for example, how to map/format the MattNet features to your features_h5path1 and features_h5path2)? This would be very helpful. (Did you also try RefCOCOg in addition to RefCOCO+?) Thanks a lot!

Originally posted by @arjunakula in #9 (comment)

Error when extracting features from Faster R-CNN

When I use generate_tsv to extract features from the pretrained Caffe model, I get the error
"Check failed: error == cudaSuccess (8 vs. 0) invalid device function"
which happens in this layer:
caffe::PoolingLayer<>::Forward_gpu().
I've checked that my Caffe build is able to run on the GPU; I have no idea what's wrong.

Evaluation on multiple GPUs

While evaluating on multiple GPUs, we need to explicitly add "-m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0" to the command. Otherwise, the evaluation takes a lot of time (the evaluation script does not handle multiple GPUs).

Example:
Instead of running as "python eval_tasks.py ....", run as "python -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0 eval_tasks.py"

Port to PyTorch Lightning format

Hello,
We love your repo, and thanks for open-sourcing it. Are there plans to port this to PyTorch Lightning? We are trying to build on ViLBERT for the VCR task and want to know what we would need to change to run a pretrained model on the validation set of VCR.

Thanks!

How do you extract the feature of the whole image with Faster R-CNN?

Hi @jiasenlu, I am new to object recognition. I have read your paper, and it mentions that you used the whole image as a box and extracted its feature with Faster R-CNN. How do you achieve that?
The RPN and Fast R-CNN are built into one Caffe net model; do you use a customized Faster R-CNN?
Can anyone help?
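
My own guess at how this is typically done (not confirmed for this repo): the whole image is appended as one extra box to the proposals, so its ROI-pooled feature describes the full frame:

import numpy as np

h, w = 600, 800                                        # hypothetical image size
proposal_boxes = np.random.rand(36, 4) * [w, h, w, h]  # hypothetical Faster R-CNN proposals
full_image_box = np.array([[0.0, 0.0, w - 1, h - 1]])  # box covering the whole image
boxes = np.concatenate([full_image_box, proposal_boxes], axis=0)
# boxes then go through the same ROI pooling head as the ordinary proposals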

The required pretrained ViLBERT checkpoint is not released

I found that you have released the checkpoint bert_base_6_layer_6_connect/pytorch_model_9.bin, which should be the ViLBERT checkpoint after pretraining. However, in the fine-tuning phase, the from_pretrained parameter expects save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin rather than model_9. Are these two checkpoints the same one?

About make

When I run make in tools/refer, I get the following:

# install pycocotools/mask locally process_begin: CreateProcess(NULL, # install pycocotools/mask locally, ...) failed. make (e=2): The system cannot find the file specified. make: *** [all] Error 2
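
If it helps: I assume the Makefile there just builds the pycocotools Cython extension, so on Windows (where make trips over the shell built-ins) you may be able to run the build step directly from the pycocotools directory:

python setup.py build_ext --inplace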

The documentation is not clear enough, and it is hard to replicate the results

For the VCR project, can you show us how to organize the data/VCR folder?
I've tried to organize this folder many times in different ways, but I got errors each time.

In the vlbert.yml file, you specify:
features_h5path1: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb
features_h5path2: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb

I got the error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 279, in
    args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_rea
    lock=False, readahead=False, meminit=False)
lmdb.Error: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb: Not a directory

If I organize the data as in the Dropbox data folder
(https://www.dropbox.com/sh/9pgxc3njd3iq03o/AADXgnT1HmEdrds7aujTncBGa?dl=0),
I get another error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 283, in LoadDatasetEval
    task_feature_reader2[features_h5path] = ImageFeaturesH5Reader(features_h5path, args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_reader.py", line 43, in __init__
    lock=False, readahead=False, meminit=False)
lmdb.InvalidError: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb: MDB_INVALID: File is not an LMDB file
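
For anyone hitting these two errors: the *.lmdb paths are LMDB environments, i.e. directories containing data.mdb and lock.mdb, not single files. "Not a directory" suggests a plain file sits at that path, and MDB_INVALID suggests the file is not actually an LMDB database (e.g. an incomplete download). A quick sanity check with py-lmdb:

import lmdb

env = lmdb.open(
    "data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb",
    readonly=True, lock=False, readahead=False, meminit=False,
)
with env.begin() as txn:
    print(txn.stat()["entries"])   # number of stored records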

NaN loss when testing VCR Q->A

After configuring the environment and downloading the data and the pretrained model for VCR,
I followed your GitHub instructions to test VCR Q->A:
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val

and I got a NaN loss:
Validation [VCR_Q-A]: loss nan score 24.715

Then I printed the loss for every batch:
[long dump of the batch tensors omitted: attention masks, a 4-D float tensor whose displayed entries are all zero, and the question id tensor([1000007])] nan

My CUDA version is 9.0 with PyTorch 1.1.
Have you seen this kind of error? How did you solve it?
I am not sure whether this error is caused by apex.
I would appreciate it if you could help me solve this problem.
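
A debugging sketch I would try (my own, not from the repo), to tell whether the NaN comes out of the model or only appears in the loss:

import torch

# vil_prediction is my assumed name for the VCR answer logits the model
# returns for the batch; check it batch by batch for the first NaN.
if torch.isnan(vil_prediction).any():
    print("NaN logits in this batch")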

Evaluation for image retrieval does not work: numpy array size mismatch

I get the tensor size error at the end. The command I am running is:
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1

I get the same error if I try zero-shot as well:

python3 ./eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot


11/20/2019 16:54:03 - INFO - vilbert.vilbert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - vilbert.basebert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
11/20/2019 16:54:03 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/sadali/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
11/20/2019 16:54:03 - INFO - vilbert.task_utils -   Loading RetrievalFlickr30k Dataset with batch size 1
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   loading archive file save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": true,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

  Num Iters:  {'TASK3': 10000}
  Batch size:  {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 235, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = vil_logit.view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (250) into shape (500)
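
For what it's worth, a sketch of a possible fix, using the names from the traceback (my assumption: the last image chunk of the split holds fewer than 500 items, so the slice must be sized to the actual batch):

# vil_logit, score_matrix, caption_idx, image_idx as in eval_retrieval.py
scores = vil_logit.view(-1).cpu().numpy()      # may be shorter than 500
start = image_idx * 500
score_matrix[caption_idx, start:start + len(scores)] = scores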
