
vilbert_beta's Issues

Performance of image retrieval on Flickr30k

The best fine-tuning result I get (fine-tuned from the pretrained model you published) is 56.62 / 83.96 / 90.56, which is still about 1.6 points lower than your reported result. Furthermore, the zero-shot evaluation result from your public Conceptual Captions pretrained model is only 26.83 / 56.43 / 68.92; is that the right checkpoint? I would like to know more training details, such as the command and output file for each pretrained and fine-tuned model, and I have some questions:

  1. Do you use the freeze parameter during fine-tuning, or only during pretraining?
  2. How do you compute the hard negatives? What kind of image feature do you use to compute the similarity (ROI features, or full-image features from e.g. ResNet or DenseNet)? See the sketch after this list for what I currently assume.
  3. How do you set the LR decay epochs for fine-tuning? From the output file it looks like a factor of 0.2 at epochs [11, 13, 15, 17].
  4. The pretraining log shows LR=0; is that normal? (I ask because you set different decay weights for different parameters.)
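
For reference, this is roughly what I assume for question 2; the shapes, the mean pooling, and the top-100 cutoff are my own guesses, not anything from your code:

import numpy as np

# My guess at hard-negative mining: rank images by cosine similarity
# between mean-pooled ROI features and take the closest non-matching
# images as hard negatives.
roi_features = np.random.rand(100, 36, 2048)          # hypothetical [num_images, num_boxes, dim]
feats = roi_features.mean(axis=1)                     # [num_images, 2048]
feats /= np.linalg.norm(feats, axis=1, keepdims=True) # L2-normalize
sim = feats @ feats.T                                 # cosine similarity matrix
hard_negatives = np.argsort(-sim, axis=1)[:, 1:101]   # nearest 100, skipping self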

Thanks a lot!

By the way, I fine-tuned image retrieval on Flickr30k with these params:
Namespace(baseline=False, bert_model='bert-base-uncased', compact=False, config_file='config/bert_base_6layer_6conect.json', do_lower_case=True, evaluation_interval=1, fp16=False, freeze=-1, from_pretrained='/conceptual_pretrained_bert_base_6_layer_6_connect_freeze_0/pytorch_model_9.bin', gradient_accumulation_steps=1, in_memory=False, learning_rate=2e-05, local_rank=0, loss_scale=0, lr_scheduler='mannul', no_cuda=False, num_train_epochs=20, num_workers=9, optimizer='BertAdam', output_dir='/model/vilbert/', save_name='finetune_retrieval_2', seed=0, tasks='3', use_chunk=0, vision_scratch=False, warmup_proportion=0.1)

{
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": false,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

Fine-tune ViLBERT on a different VQA dataset

Hello,

I am trying to fine-tune and evaluate the pretrained ViLBERT model on a different VQA dataset, but I am finding it a bit difficult to do quickly. Could you please briefly describe the steps needed to fine-tune the model on a different VQA dataset?

Thank you!

Best,
Claudio

subprocess.CalledProcessError

Hi,
I want to use the pretrained model and fine-tune it for VQA, and I ran the command as you provided:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --learning_rate 4e-5 --num_workers 16 --tasks 0 --save_name pretrained

but an error appears:

Traceback (most recent call last):
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in
main()
File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/chenjinjie/.conda/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '4e-5', '--num_workers', '16', '--tasks', '0', '--save_name', 'pretrained']' died with <Signals.SIGABRT: 6>.

Could you help? Thanks!

Referring Expression Evaluation

Trying to evaluate referring expressions shows the following error:
lmdb.Error: data/referExpression/refcoco+_resnet101_faster_rcnn_genome.lmdb: No such file or directory

Could you let me know how to get the file refcoco+_resnet101_faster_rcnn_genome.lmdb? Thanks!

Recommended PyTorch version?

Hello, which version of PyTorch did you use to run the code? I am running into some issues when I install the latest version of PyTorch (1.3).

Image captioning with ViLBERT

Figure 5 in the paper shows samples of generated image descriptions, but I couldn't reproduce similar results using the pretrained ViLBERT. I used BertForMultiModalPreTraining and supplied image features that seem fine, given that prediction_scores_v (the hv vector in the paper) seems to reflect what is in the picture. As the "question", I supplied a tensor of 30 [MASK] tokens.
Then, following the paper, I passed that through the model 30 times, at each iteration setting the ith token of the "question" (text stream) to the text token with the highest score at the ith position.
I have also tried repeating the procedure multiple times, but it didn't change much. This results in very poor captions, such as "the a man is a man who is a man who is a man ...".

Could you please elaborate on the captioning method you've presented in the publication?
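
For clarity, here is a minimal sketch of the decoding loop I described. The model call and its return values are my assumptions about BertForMultiModalPreTraining, and tokenizer, img_feats, and img_locs are placeholders for the real inputs:

import torch

mask_id = tokenizer.vocab["[MASK]"]                 # tokenizer: BERT tokenizer (assumed)
tokens = torch.full((1, 30), mask_id, dtype=torch.long)
for i in range(30):
    # assumed: first output is the text-stream vocabulary scores [1, 30, vocab_size]
    prediction_scores_t = model(tokens, img_feats, img_locs)[0]
    tokens[0, i] = prediction_scores_t[0, i].argmax()
print(tokenizer.convert_ids_to_tokens(tokens[0].tolist()))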

Error when training the VCR task

I want to train the VCR task, but I get an error like this:

(vilbert) ailab@ailab:~/vilbert_beta$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained
2020-01-26 19:03:18.956063: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956235: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[the three TensorRT warnings above repeat once for each of the 8 launched processes]
train_tasks.py:158: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  task_cfg = edict(yaml.load(f))
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp line=33 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 201, in main
    torch.cuda.set_device(args.local_rank)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp:33
[the YAMLLoadWarning and the "invalid device ordinal" traceback above repeat for each of the remaining ranks]
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 205, in main
    torch.distributed.init_process_group(backend="nccl")
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use
01/26/2020 19:03:20 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: False
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ailab/anaconda3/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '2e-5', '--num_workers', '16', '--tasks', '1-2', '--save_name', 'pretrained']' returned non-zero exit status 1.
(vilbert) ailab@ailab:~/vilbert_beta$ 01/26/2020 19:03:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
01/26/2020 19:03:21 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 8
01/26/2020 19:03:35 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 8
01/26/2020 19:03:49 - INFO - vilbert.utils -   logging file at: VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained
01/26/2020 19:03:49 - ERROR - vilbert.vilbert -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 263, in main
    model.to(device)
AttributeError: 'NoneType' object has no attribute 'to'

What should I do? Help T^T
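
One thing worth checking for anyone hitting this: the launch command asks for 8 processes (--nproc_per_node=8), one per GPU, and "invalid device ordinal" is raised when torch.cuda.set_device() is given a GPU index that does not exist. A quick check:

import torch

# Pass at most this number as --nproc_per_node in the launch command.
print(torch.cuda.device_count())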

Is python-prctl required?

Hi there - thanks for your great work on vision-and-language pre-training! I'm trying to run the codebase, but I'm running into issues installing python-prctl, since I do not have sudo access. Is the package required? I do not see it used or imported in the codebase.

Thanks for your help!

pretrained vilbert model not found

Hi Jiasen,
I am trying to train (fine-tune) the downstream task for refcoco+. The code throws an error saying that the pretrained ViLBERT model is not found. Is there a link where I can download the pretrained ViLBERT model?

The error that I get is the following:
ERROR - vilbert.vilbert - Model name 'save/bert_base_6_layer_6_connect_
freeze_0/pytorch_model_8.bin' was not found in model name list...

Thanks a lot!

Error when running the VQA task

I get the following error when running the VQA task:

Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 209, in main
    task_id, batch, model, task_dataloader_val, task_losses, results, others)
  File "/home/tobias/vilbert_beta/vilbert/task_utils.py", line 353, in EvaluatingModel
    question = question.view(-1, question.size(2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

I am using the downloadable COCO ResNet features as data and the following command to run the script:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained \
	save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin \
	--config_file config/bert_base_6layer_6conect.json --task 0 --split test --batch_size 100

It seems that the tensors loaded from the data do not have the right dimensions?
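
A guard like the following (my own sketch, not a proper fix) avoids the crash by only flattening 3-D question tensors, though it may just paper over a split/config mismatch:

# question is expected here as [batch, num_options, seq_len];
# for VQA it apparently arrives as 2-D [batch, seq_len]
if question.dim() == 3:
    question = question.view(-1, question.size(2))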

Any help is appreciated! Thanks!

Bugs when evaluating the pretrained VCR model

I got this weird error message on a GPU cluster...
Please help me if you know.

_PyFunction_FastCallDict
_PyObject_FastCallDict
_PyObject_Call_Prepend
PyObject_Call
_PyObject_FastCallDict
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
PyEval_EvalCodeEx
PyEval_EvalCode
PyRun_FileExFlags
PyRun_SimpleFileExFlags
Py_Main
main
__libc_start_main
*** End stack trace ***
Aborted

Would you release the multi-task fine-tuning codes for ViL-BERT?

Hi, I have read your new paper "12-in-1: Multi-Task Vision and Language Representation Learning" on arXiv, which uses multi-task fine-tuning to boost the performance of ViLBERT. May I ask whether you will release this part of the code in this repo or somewhere else? Thank you very much!

Pretrained models from bottom-up-attention not available anymore.

Dear Jiasen Lu,
first of all, I would like to congratulate you on your fine work!

Since I want to use ViLBERT on my own dataset, I would like to extract visual features from it with the same model you used, so I do not need to retrain the whole model. You described in "vilbert_beta" how you used "bottom-up-attention" for this. Going to that repository and following the installation guide, it says:
"Download pretrained model, and put it under data\faster_rcnn_models.", where the link to the pretrained model is the following: https://www.dropbox.com/s/tr24q7h0zm2wnjv/resnet101_faster_rcnn_final.caffemodel?dl=1

Unfortunately, the link is broken and I cannot download the file. Could you point me to how to extract features for ViLBERT then? Thank you!

Features for visual genome

Hi,
Thanks for your work.
In addition to COCO features, could you also provide visual genome features?

Error when running zero-shot retrieval

Num Iters: {'TASK3': 10000}
Batch size: {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 230, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = torch.softmax(vil_logit, dim=1)[:,0].view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (125) into shape (500)

Has anyone encountered this? Thank you!

Train ViLBERT on downstream VCR tasks

I encountered an issue in:
  File "/home/XXX/vilbert_beta/vilbert/vilbert.py", line 322, in forward
    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
RuntimeError: The expanded size of the tensor (60) must match the existing size (4) at non-singleton dimension 2. Target sizes: [16, 4, 60]. Tensor sizes: [1, 4]

From the code we can see:
seq_length = input_ids.size(1)
position_ids = torch.arange(
    seq_length, dtype=torch.long, device=input_ids.device
)
Here input_ids is [16, 4, 60], so position_ids comes out as [4].
How do we make this fit the sequence length of 60? (See the sketch below.)
Thanks!
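
For reference, this is how I would expect the shapes to be handled (my sketch, assuming the multiple-choice inputs are meant to be flattened before the embedding layer):

import torch

# input_ids: [batch, num_choices, seq_len] = [16, 4, 60]
input_ids = torch.zeros(16, 4, 60, dtype=torch.long)   # stand-in for the real batch
batch, num_choices, seq_len = input_ids.shape
flat_input_ids = input_ids.view(-1, seq_len)           # [64, 60]
position_ids = torch.arange(seq_len, dtype=torch.long, device=input_ids.device)
position_ids = position_ids.unsqueeze(0).expand_as(flat_input_ids)   # [64, 60]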

Image features of Conceptual Captions

Could you release the Conceptual Captions features? They may be too heavy to upload, but I really want to retrain based on your code.

By the way, I have a question about your number-of-streams study.
In your two-stream version, the text stream uses 12 BERT layers, the image stream uses 6 image BERT layers, and the two streams pass through a connection module with 6 layers.
In the single-stream version, the two streams share 12 BERT layers for encoding.
I don't think these two models are comparable.

Thanks a lot!

Model performance on VCR val split

I downloaded the processed VCR data and achieved a Q->A accuracy of about 72.217 with the shared checkpoint, which is slightly lower than the paper claims. Using my own processed data, I achieved similar performance with both the fine-tuned checkpoint and the pretrained checkpoint after fine-tuning on my processed data.

Feature Extraction of VCR

Can you provide the global image feature data and the details of VCR's feature extraction?
I would be really grateful if you could help me!

The Dropbox link might require permissions from you. (It doesn't open.)

Also, we would like to train the model on RefCOCOg. Would it be possible to get some pointers on how to use your code to train on RefCOCOg (for example, how to map/format the MattNet features to your features_h5path1 and features_h5path2)? This would be very helpful. (Did you also try RefCOCOg in addition to RefCOCO+?) Thanks a lot!

Originally posted by @arjunakula in #9 (comment)

Error when extracting features from Faster R-CNN

When I use generate_tsv to extract features from the pretrained Caffe model, I get the error
"Check failed: error == cudaSuccess (8 vs. 0) invalid device function"
which happens in this layer:
caffe::PoolingLayer<>::Forward_gpu().
I've checked that my Caffe build is able to run on the GPU; I have no idea what's wrong.

Evaluation on multiple GPUs

While evaluating on multiple GPUs, we need to explicitly add "-m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0" to the command. Otherwise, the evaluation takes a lot of time (the evaluation script does not handle multiple GPUs).

Example:
Instead of running as "python eval_tasks.py ....", run as "python -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0 eval_tasks.py"

Port to PyTorch Lightning format

Hello,
We love your repo, and thanks for open-sourcing it. Are there plans to port this to PyTorch Lightning? We are trying to build on ViLBERT for the VCR task and want to know what we would need to change to run a pretrained model on the validation set of VCR.

Thanks!

How do you extract the feature of the whole image with Faster R-CNN?

Hi @jiasenlu, I am new to object recognition. I have read your paper, and it mentions that you used the whole image as a box and extracted its feature with Faster R-CNN. How do you achieve that?
The RPN and Fast R-CNN are built into one Caffe net model; do you use a customized Faster R-CNN?
Can anyone help?
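
My own guess at how this is typically done (not confirmed for this repo): the whole image is appended as one extra box to the proposals, so its ROI-pooled feature describes the full frame:

import numpy as np

h, w = 600, 800                                        # hypothetical image size
proposal_boxes = np.random.rand(36, 4) * [w, h, w, h]  # hypothetical Faster R-CNN proposals
full_image_box = np.array([[0.0, 0.0, w - 1, h - 1]])  # box covering the whole image
boxes = np.concatenate([full_image_box, proposal_boxes], axis=0)
# boxes then go through the same ROI pooling head as the ordinary proposals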

The required pretrained ViLBERT checkpoint is not released

I found that you have released the checkpoint bert_base_6_layer_6_connect/pytorch_model_9.bin, which should be the ViLBERT checkpoint after pretraining. However, in the fine-tuning phase, the from_pretrained parameter expects save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin rather than model_9. Are these two checkpoints the same one?

About make

When I run make in tools/refer, I get the following:

# install pycocotools/mask locally process_begin: CreateProcess(NULL, # install pycocotools/mask locally, ...) failed. make (e=2): The system cannot find the file specified. make: *** [all] Error 2
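
If it helps: I assume the Makefile there just builds the pycocotools Cython extension, so on Windows (where make trips over the shell built-ins) you may be able to run the build step directly from the pycocotools directory:

python setup.py build_ext --inplace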

The documentation is not clear enough, and it is hard to replicate the results

For the VCR project, can you show us how to organize the data/VCR folder?
I've tried to organize this folder many times in different ways, but I got errors each time.

In the vlbert.yml file, you specify:
features_h5path1: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb
features_h5path2: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb

I got the error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 279, in
    args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_rea
    lock=False, readahead=False, meminit=False)
lmdb.Error: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb: Not a directory

If I organize the data as in the Dropbox data folder
(https://www.dropbox.com/sh/9pgxc3njd3iq03o/AADXgnT1HmEdrds7aujTncBGa?dl=0),
I get another error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 283, in LoadDatasetEval
    task_feature_reader2[features_h5path] = ImageFeaturesH5Reader(features_h5path, args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_reader.py", line 43, in __init__
    lock=False, readahead=False, meminit=False)
lmdb.InvalidError: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb: MDB_INVALID: File is not an LMDB file
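
For anyone hitting these two errors: the *.lmdb paths are LMDB environments, i.e. directories containing data.mdb and lock.mdb, not single files. "Not a directory" suggests a plain file sits at that path, and MDB_INVALID suggests the file is not actually an LMDB database (e.g. an incomplete download). A quick sanity check with py-lmdb:

import lmdb

env = lmdb.open(
    "data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb",
    readonly=True, lock=False, readahead=False, meminit=False,
)
with env.begin() as txn:
    print(txn.stat()["entries"])   # number of stored records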

NaN loss when testing VCR Q->A

After configuring the environment and downloading the data and the pretrained model for VCR,
I followed your GitHub instructions to test VCR Q->A:
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val

and I got a NaN loss:
Validation [VCR_Q-A]: loss nan score 24.715

Then I printed the loss for every batch:
[long dump of the batch tensors omitted: attention masks, a 4-D float tensor whose displayed entries are all zero, and the question id tensor([1000007])] nan

My CUDA version is 9.0 with PyTorch 1.1.
Have you seen this kind of error? How did you solve it?
I am not sure whether this error is caused by apex.
I would appreciate it if you could help me solve this problem.
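
A debugging sketch I would try (my own, not from the repo), to tell whether the NaN comes out of the model or only appears in the loss:

import torch

# vil_prediction is my assumed name for the VCR answer logits the model
# returns for the batch; check it batch by batch for the first NaN.
if torch.isnan(vil_prediction).any():
    print("NaN logits in this batch")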

Evaluation for image retrieval does not work: numpy array size mismatch

I get the tensor size error at the end. The command I am running is:
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1

I get the same error if I try zero-shot as well:

python3 ./eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot


11/20/2019 16:54:03 - INFO - vilbert.vilbert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - vilbert.basebert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
11/20/2019 16:54:03 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/sadali/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
11/20/2019 16:54:03 - INFO - vilbert.task_utils -   Loading RetrievalFlickr30k Dataset with batch size 1
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   loading archive file save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": true,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

  Num Iters:  {'TASK3': 10000}
  Batch size:  {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 235, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = vil_logit.view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (250) into shape (500)
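
For what it's worth, a sketch of a possible fix, using the names from the traceback (my assumption: the last image chunk of the split holds fewer than 500 items, so the slice must be sized to the actual batch):

# vil_logit, score_matrix, caption_idx, image_idx as in eval_retrieval.py
scores = vil_logit.view(-1).cpu().numpy()      # may be shorter than 500
start = image_idx * 500
score_matrix[caption_idx, start:start + len(scores)] = scores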
