
vilbert_beta's Introduction

ViLBERT

ViLBERT_beta has been deprecated. Please see vilbert-multi-task, which includes implementations for 12-in-1: Multi-Task Vision and Language Representation Learning.

Code and pre-trained models for ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks.

*Note: This codebase is still in beta release, intended to replicate the paper's performance.*

Repository Setup

  1. Create a fresh conda environment and install all dependencies.
conda create -n vilbert python=3.6
conda activate vilbert
git clone https://github.com/jiasenlu/vilbert_beta
cd vilbert_beta
pip install -r requirements.txt
  2. Install PyTorch:
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
  3. Install apex, following https://github.com/NVIDIA/apex.
  4. Compile tools:

cd tools/refer
make

Data Setup

Check README.md under data for details on downloading and preparing the datasets, and check vlbert_tasks.yml for per-task configuration.
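As a rough illustration, a task entry in vlbert_tasks.yml has roughly the shape sketched below. The features_h5path1, features_h5path2, and val_annotations_jsonpath keys appear elsewhere in this README; the task id, name, and exact values here are placeholders rather than the shipped defaults.

TASK1:
  name: VCR_Q-A                # placeholder task name
  features_h5path1: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb
  features_h5path2: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb
  val_annotations_jsonpath: data/VCR/val_annotations.json   # placeholder path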

Pre-trained model for Evaluation

| Model | Objective | Link |
|:---|:---|:---|
| ViLBERT 2-Layer | Conceptual Caption | Google Drive |
| ViLBERT 4-Layer | Conceptual Caption | Google Drive |
| ViLBERT 6-Layer | Conceptual Caption | Google Drive |
| ViLBERT 8-Layer | Conceptual Caption | Google Drive |
| ViLBERT 6-Layer | VQA | Google Drive |
| ViLBERT 6-Layer | VCR | Google Drive |
| ViLBERT 6-Layer | RefCOCO+ | Google Drive |
| ViLBERT 6-Layer | Image Retrieval | Google Drive |

Evaluation

Zero-Shot Image Retrieval

We can directly use the pre-trained ViLBERT model for the zero-shot image retrieval task on Flickr30k.

1: Download the pretrained model with objective Conceptual Caption and put it under save.

2: Update features_h5path1 and val_annotations_jsonpath in vlbert_tasks.yml to load the Flickr30k test-set image features and JSON file (the default is the training features).

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot

Image Retrieval

1: Download the pretrained model with objective Image Retrieval and put it under save.

2: Update features_h5path1 and val_annotations_jsonpath in vlbert_tasks.yml to load the Flickr30k test-set image features and JSON file (the default is the training features).

3: Use the following command to evaluate the pre-trained 6-layer ViLBERT model (only single-GPU evaluation is supported for now):

python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1

VQA

1: Download the pretrained model with objective VQA and put it under save.

2: To test on the held-out validation split, use the following command:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 0 --split minval

VCR

1: Download the pretrained model with objective VCR and put it under save.

2: To test on VCR Q->A, run:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val

3: To test on VCR QA->R, run:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 2 --split val

RefCOCO+

1: Download the pretrained model with objective RefCOCO+ and put it under save.

2: We use the pre-computed detections/masks from MAttNet for the fully-automatic comprehension task; check the MAttNet repository for more details.

3: To test on the RefCOCO+ val set, use the following command:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/refcoco+_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 4

Visiolinguistic Pre-training

Once you have extracted all the image features, train a 6-layer ViLBERT model on Conceptual Captions with:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_concap.py --from_pretrained bert-base-uncased --bert_model bert-base-uncased --config_file config/bert_base_6layer_6conect.json --learning_rate 1e-4 --train_batch_size 512 --save_name pretrained

Train ViLBERT for Downstream Tasks

VQA

To finetune a 6-layer ViLBERT model for VQA on 8 GPUs, run the command below. --tasks 0 selects the VQA task; check vlbert_tasks.yml for more VQA settings.

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 4e-5 --num_workers 16 --tasks 0 --save_name pretrained

VCR

Similarly, to finetune a 6-layer ViLBERT model for the VCR task, run the following command. Here we jointly train the Q->A and QA->R tasks, so the tasks are specified as --tasks 1-2.

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained

Image Retrieval

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 4e-5 --num_workers 9 --tasks 3 --save_name pretrained

Refer Expression

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 4e-5 --num_workers 16 --tasks 4 --save_name pretrained
  • For single-GPU training, use a smaller batch size and simply remove -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 from the commands above; see the example below.
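For example, a single-GPU VQA finetuning run would look like the following (the same arguments as above with the launcher removed; reduce the per-task batch size in vlbert_tasks.yml if your GPU requires it):

python train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --learning_rate 4e-5 --num_workers 16 --tasks 0 --save_name pretrained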

References

If you find this code useful for your research, please cite our paper:

@article{lu2019vilbert,
  title={ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks},
  author={Lu, Jiasen and Batra, Dhruv and Parikh, Devi and Lee, Stefan},
  journal={arXiv preprint arXiv:1908.02265},
  year={2019}
}

vilbert_beta's People

Contributors

jiasenlu, linbojin

vilbert_beta's Issues

Port to Pytorch Lightning Format

Hello,
We love your repo; thanks for open-sourcing it. Are there plans to port this to PyTorch Lightning? We are trying to build upon ViLBERT for the VCR task and wanted to know what we would need to change to run a pretrained model on the validation set of VCR.

Thanks!

about make

When I run make in tools/refer I get the following:

# install pycocotools/mask locally process_begin: CreateProcess(NULL, # install pycocotools/mask locally, ...) failed. make (e=2): The system cannot find the file specified. make: *** [all] Error 2

pretrained vilbert model not found

Hi Jiasen,
I am trying to train (fine-tune) the downstream task for RefCOCO+. The code throws an error saying that the pretrained ViLBERT model is not found. Is there a link where I can download the pretrained ViLBERT model?

The error that I get is the following:
ERROR - vilbert.vilbert - Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list...

Thanks a lot!

Pretrained models from bottom-up-attention not available anymore.

Dear Jiasen Lu,
first of all, I would like to congratulate you on your fine work!

Since I want to use ViLBERT on my own dataset, I would like to extract visual features from it with the same model you used, so I do not need to train the whole model again. You described in "vilbert_beta" how you used "bottom-up-attention" for this. Going to that repository and following the installation guide, it says:
"Download pretrained model, and put it under data\faster_rcnn_models.", where the link to the pretrained model is the following: https://www.dropbox.com/s/tr24q7h0zm2wnjv/resnet101_faster_rcnn_final.caffemodel?dl=1

Unfortunately, the link is broken and I cannot download the file. Could you point me to how to extract features for ViLBERT then? Thank you!

Error when extracting features from Faster R-CNN

When I use generate_tsv to extract features from the pretrained Caffe model, I get the error
"Check failed: error == cudaSuccess (8 vs. 0) invalid device function"
which happens in this layer:
caffe::PoolingLayer<>::Forward_gpu().
I've checked that my Caffe build is able to run on GPU; I have no idea what's wrong with it.

The required pre-trained vilbert checkpoint is not released

I found that you have released the checkpoint bert_base_6_layer_6_connect/pytorch_model_9.bin, which should be the checkpoint of ViLBERT after pretraining. However, in the fine-tuning phase, the from_pretrained argument expects the save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin checkpoint rather than model_9. Are these two checkpoints the same one?

subprocess.CalledProcessError

Hi,
I want to use the pretrained model and fine-tune it for VQA, and I just ran the command as you provided:

python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin --config_file config/bert_base_6layer_6conect.json --learning_rate 4e-5 --num_workers 16 --tasks 0 --save_name pretrained

but an error appears:

Traceback (most recent call last):
  File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/cluster/home/chenjinjie/.conda/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 259, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/cluster/home/chenjinjie/.conda/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '4e-5', '--num_workers', '16', '--tasks', '0', '--save_name', 'pretrained']' died with <Signals.SIGABRT: 6>.

Could you help? Thanks!

image captioning with ViLBERT

Figure 5 in the paper shows samples of generated image descriptions, but I couldn't reproduce similar results using the pretrained ViLBERT. I used BertForMultiModalPreTraining and supplied image features which seem to be OK, given that prediction_scores_v (the hv vector in the paper) seems to reflect what is in the picture. As the "question", I supplied a tensor of 30 [MASK] tokens.
Then, following the paper, I passed that through the model 30 times, at each iteration setting the i-th token of the "question" (text stream) to the text token with the highest score at the i-th position.
I also tried repeating the procedure multiple times, but it didn't change much. This results in very poor captions, such as "the a man is a man who is a man who is a man ...".

Could you please elaborate on the captioning method you've presented in the publication?
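For reference, a minimal sketch of the decoding loop I am running (the forward call is simplified; the exact ViLBERT signature and feature arguments differ):

import torch

def greedy_mask_fill_caption(model, img_feats, img_locs, mask_id=103, seq_len=30):
    # Start from a "question" of 30 [MASK] tokens (id 103 in bert-base-uncased).
    input_ids = torch.full((1, seq_len), mask_id, dtype=torch.long)
    for i in range(seq_len):
        with torch.no_grad():
            # Simplified forward: assumed to return text-stream prediction
            # scores of shape [batch, seq_len, vocab_size].
            prediction_scores_t = model(input_ids, img_feats, img_locs)
        # Fix the i-th token to the highest-scoring vocabulary entry at position i.
        input_ids[0, i] = prediction_scores_t[0, i].argmax()
    return input_ids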

I got an error when training the VCR task

I want to train the VCR task, but I got the following error:

(vilbert) ailab@ailab:~/vilbert_beta$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 train_tasks.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin  --config_file config/bert_base_6layer_6conect.json  --learning_rate 2e-5 --num_workers 16 --tasks 1-2 --save_name pretrained
2020-01-26 19:03:18.956063: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956235: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda-9.0/lib64
2020-01-26 19:03:18.956249: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[the same three TensorRT warnings repeat for each of the eight launched processes]
train_tasks.py:158: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  task_cfg = edict(yaml.load(f))
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp line=33 error=10 : invalid device ordinal
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 201, in main
    torch.cuda.set_device(args.local_rank)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/cuda/__init__.py", line 265, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1556653183467/work/torch/csrc/cuda/Module.cpp:33
[the same YAMLLoadWarning and "invalid device ordinal" traceback repeat for the other worker processes]
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 205, in main
    torch.distributed.init_process_group(backend="nccl")
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/distributed_c10d.py", line 406, in init_process_group
    store, rank, world_size = next(rendezvous(url))
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/rendezvous.py", line 143, in _env_rendezvous_handler
    store = TCPStore(master_addr, master_port, world_size, start_daemon)
RuntimeError: Address already in use
01/26/2020 19:03:20 - INFO - __main__ -   device: cuda:1 n_gpu: 1, distributed training: True, 16-bits training: False
Traceback (most recent call last):
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/ailab/anaconda3/envs/vilbert/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/ailab/anaconda3/envs/vilbert/bin/python', '-u', 'train_tasks.py', '--local_rank=0', '--bert_model', 'bert-base-uncased', '--from_pretrained', 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin', '--config_file', 'config/bert_base_6layer_6conect.json', '--learning_rate', '2e-5', '--num_workers', '16', '--tasks', '1-2', '--save_name', 'pretrained']' returned non-zero exit status 1.
(vilbert) ailab@ailab:~/vilbert_beta$ 01/26/2020 19:03:21 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/ailab/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
01/26/2020 19:03:21 - INFO - vilbert.task_utils -   Loading VCR_Q-A Dataset with batch size 8
01/26/2020 19:03:35 - INFO - vilbert.task_utils -   Loading VCR_QA-R Dataset with batch size 8
01/26/2020 19:03:49 - INFO - vilbert.utils -   logging file at: VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained
01/26/2020 19:03:49 - ERROR - vilbert.vilbert -   Model name 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was not found in model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese). We assumed 'save/bert_base_6_layer_6_connect_freeze_0/pytorch_model_8.bin' was a path or url but couldn't find any file associated to this path or url.
Traceback (most recent call last):
  File "train_tasks.py", line 434, in <module>
    main()
  File "train_tasks.py", line 263, in main
    model.to(device)
AttributeError: 'NoneType' object has no attribute 'to'

What should I do? Help T^T

The documentation is not clear enough, and it is hard to replicate the results.

For the VCR project, can you show us how to organize the data/VCR folder?
I've tried to organize this folder many times in different ways, but I got errors each time.

In the vlbert_tasks.yml file, you specify
features_h5path1: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb
features_h5path2: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb

I got the error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 279, in
    args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_rea
    lock=False, readahead=False, meminit=False)
lmdb.Error: data/VCR/VCR_resnet101_faster_rcnn_genome.lmdb: Not a directory

In the dropbox data folder:
https://www.dropbox.com/sh/9pgxc3njd3iq03o/AADXgnT1HmEdrds7aujTncBGa?dl=0
If I organize the data this way, I get another error:
Traceback (most recent call last):
  File "eval_tasks.py", line 228, in
    main()
  File "eval_tasks.py", line 169, in main
    = LoadDatasetEval(args, task_cfg, args.tasks.split('-'))
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/task_utils.py", line 283, in LoadDatasetEval
    task_feature_reader2[features_h5path] = ImageFeaturesH5Reader(features_h5path, args.in_memory)
  File "/hpchome/carin/ss1043/vilbert_beta/vilbert/datasets/_image_features_reader.py", line 43, in __init__
    lock=False, readahead=False, meminit=False)
lmdb.InvalidError: data/VCR/VCR_gt_resnet101_faster_rcnn_genome.lmdb: MDB_INVALID: File is not an LMDB file

Train ViLBERT for DownStream Tasks for VCR

I encountered an issue in:

  File "/home/XXX/vilbert_beta/vilbert/vilbert.py", line 322, in forward
    position_ids = position_ids.unsqueeze(0).expand_as(input_ids)
RuntimeError: The expanded size of the tensor (60) must match the existing size (4) at non-singleton dimension 2. Target sizes: [16, 4, 60]. Tensor sizes: [1, 4]

From the code we can see:

seq_length = input_ids.size(1)
position_ids = torch.arange(
    seq_length, dtype=torch.long, device=input_ids.device
)

Here input_ids is [16, 4, 60], so position_ids is [4]. How do we fit the 60?
Thanks!
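A minimal sketch of the workaround I am considering (flattening the choices dimension before building position ids; shapes taken from the error above, not from the repository's actual code):

import torch

# Shapes from the report: [batch=16, num_choices=4, seq_len=60].
input_ids = torch.zeros(16, 4, 60, dtype=torch.long)

# Flatten [batch, num_choices, seq_len] -> [batch * num_choices, seq_len]
# so the usual 2-D position-id logic applies.
flat_input_ids = input_ids.view(-1, input_ids.size(-1))   # [64, 60]
seq_length = flat_input_ids.size(1)
position_ids = torch.arange(seq_length, dtype=torch.long, device=flat_input_ids.device)
position_ids = position_ids.unsqueeze(0).expand_as(flat_input_ids)  # [64, 60]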

Fine-tune ViLBERT on a different VQA dataset

Hello,

I am trying to fine-tune and evaluate the pre-trained ViLBERT model on a different VQA dataset, but I am finding it difficult to do quickly. Could you please briefly describe the steps needed to fine-tune the model on a different VQA dataset?

Thank you!

Best,
Claudio

Error when running the VQA task

I get the following error when running the VQA task:

Traceback (most recent call last):
  File "eval_tasks.py", line 228, in <module>
    main()
  File "eval_tasks.py", line 209, in main
    task_id, batch, model, task_dataloader_val, task_losses, results, others)
  File "/home/tobias/vilbert_beta/vilbert/task_utils.py", line 353, in EvaluatingModel
    question = question.view(-1, question.size(2))
IndexError: Dimension out of range (expected to be in range of [-2, 1], but got 2)

I am using the downloadable coco resnet features as data and the following command to run the script:

python eval_tasks.py --bert_model bert-base-uncased --from_pretrained \
	save/VQA_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin \
	--config_file config/bert_base_6layer_6conect.json --task 0 --split test --batch_size 100

It seems that the tensors loaded from the data do not have the right dimensions?

Any help is appreciated! Thanks!

Features for visual genome

Hi,
Thanks for your work.
In addition to the COCO features, could you also provide Visual Genome features?

image features of conceptual caption

Could you release the Conceptual Captions features? These features may be too heavy to upload, but I really want to retrain based on your code.

By the way, I have a question about your number-of-streams ablation.
In your two-stream version, the text stream uses 12 BERT layers and the image stream uses 6 image BERT layers, and the two streams pass through a connection module with 6 layers.
In the single-stream version, the two streams share 12 BERT layers for encoding.
I don't think these two models are comparable.

Thanks a lot!

How do you extract the feature of the whole image with Faster R-CNN?

Hi @jiasenlu, I am new to object recognition. I have read your paper, and it mentions that you used the whole image as a box and extracted its feature with Faster R-CNN. How do you achieve that?
The RPN and Fast R-CNN are built into one Caffe net model; do you use a customized Faster R-CNN?
Can anyone help?
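Is it something like the following sketch, where a box covering the full image is prepended to the detected region boxes so that its pooled ROI feature serves as the global image feature? (This is my assumption, not confirmed from the code.)

import numpy as np

def add_whole_image_box(region_boxes, im_width, im_height):
    # region_boxes: [N, 4] array of (x1, y1, x2, y2) detections.
    # Prepend a single box spanning the entire image.
    whole_image_box = np.array([[0.0, 0.0, im_width - 1.0, im_height - 1.0]])
    return np.vstack([whole_image_box, region_boxes]).astype(np.float32)

# Example: the pooled ROI feature of boxes[0] would then be the whole-image feature.
boxes = add_whole_image_box(np.array([[10.0, 20.0, 200.0, 220.0]]), 640, 480)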

NaN loss when testing VCR Q->A

After I configured the environment and downloaded the data and pretrained model for VCR, I followed the README to test VCR Q->A:
python eval_tasks.py --bert_model bert-base-uncased --from_pretrained save/VCR_Q-A-VCR_QA-R_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 1 --split val

and I got a NaN loss:
Validation [VCR_Q-A]: loss nan score 24.715

Then I printed the loss for every batch:

[long printout of the batch tensors, ending with] tensor([1000007])] nan

My CUDA version is 9.0 with PyTorch 1.1.
Have you met this kind of error? How did you solve it?
I am not sure whether this error is caused by apex.
I would appreciate it if you could help me solve this problem.

The dropbox link might require permissions from you. (It doesn't open.)

Also, we would like to train the model on RefCOCOg. Would it be possible to get some pointers on how to use your code to train the model on RefCOCOg (for example, how to map/format the MAttNet features to your features_h5path1 and features_h5path2)? This would be very helpful. (Did you also try RefCOCOg in addition to RefCOCO+?) Thanks a lot!

Originally posted by @arjunakula in #9 (comment)

Error when running zero-shot retrieval

Num Iters: {'TASK3': 10000}
Batch size: {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in
    main()
  File "eval_retrieval.py", line 230, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = torch.softmax(vil_logit, dim=1)[:,0].view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (125) into shape (500)

Has anyone encountered this? Thank you!
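One guess (not verified against the script): the final chunk of images is smaller than the hard-coded 500, so sizing the slice from the scores themselves would let the short last chunk fit, as in this sketch:

import numpy as np

def write_score_chunk(score_matrix, caption_idx, image_idx, scores, chunk=500):
    # scores: 1-D array of vil_logit scores for this chunk of images;
    # the last chunk may contain fewer than `chunk` entries.
    start = image_idx * chunk
    score_matrix[caption_idx, start:start + scores.size] = scores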

Model performance on VCR val split

I downloaded the processed VCR data and achieved a Q->A performance of about 72.217 with the shared checkpoint, which is slightly lower than the paper claims. I also used my own processed data and achieved similar performance with both the fine-tuned checkpoint and the pretrained checkpoint after fine-tuning on my processed data.

Evaluation on multi-gpus

While evaluating on multiple GPUs, we need to explicitly add "-m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0" to the command. Otherwise, the evaluation takes a lot of time (the evaluation script does not handle multiple GPUs).

Example:
Instead of running "python eval_tasks.py ...", run "python -m torch.distributed.launch --nproc_per_node=1 --nnodes=1 --node_rank=0 eval_tasks.py ...".

Feature Extraction of VCR

Can you provide the global image feature data and the details of VCR's feature extraction?
I would be really grateful if you could help me!

Would you release the multi-task fine-tuning code for ViLBERT?

Hi, I have read your new paper "12-in-1: Multi-Task Vision and Language Representation Learning" on arXiv, which uses multi-task fine-tuning to boost the performance of ViLBERT. May I ask whether you will release this part of the code in this repo or somewhere else? Thank you very much!

Evaluation for image retrieval does not work: numpy array size mismatch

I get the tensor size error shown at the end. The command I am running is this:
python eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1

I get the same error if I try the zero-shot version as well:

python3 ./eval_retrieval.py --bert_model bert-base-uncased --from_pretrained save/bert_base_6_layer_6_connect/pytorch_model_9.bin --config_file config/bert_base_6layer_6conect.json --task 3 --split test --batch_size 1 --zero_shot


11/20/2019 16:54:03 - INFO - vilbert.vilbert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - vilbert.basebert -   Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .
11/20/2019 16:54:03 - INFO - __main__ -   device: cuda n_gpu: 2, distributed training: False, 16-bits training: False
11/20/2019 16:54:03 - INFO - pytorch_pretrained_bert.tokenization -   loading vocabulary file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt from cache at /home/sadali/.pytorch_pretrained_bert/26bc1ad6c0ac742e9b52263248f6d0f00068293b33709fae12320c0e35ccfbbb.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
11/20/2019 16:54:03 - INFO - vilbert.task_utils -   Loading RetrievalFlickr30k Dataset with batch size 1
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   loading archive file save/RetrievalFlickr30k_bert_base_6layer_6conect-pretrained/pytorch_model_19.bin
11/20/2019 16:54:07 - INFO - vilbert.vilbert -   Model config {
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": true,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [
    6,
    7,
    8,
    9,
    10,
    11
  ],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [
    0,
    1,
    2,
    3,
    4,
    5
  ],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

  Num Iters:  {'TASK3': 10000}
  Batch size:  {'TASK3': 1}
Traceback (most recent call last):
  File "eval_retrieval.py", line 275, in <module>
    main()
  File "eval_retrieval.py", line 235, in main
    score_matrix[caption_idx, image_idx*500:(image_idx+1)*500] = vil_logit.view(-1).cpu().numpy()
ValueError: could not broadcast input array from shape (250) into shape (500)

Recommended PyTorch version?

Hello, I'm wondering what version of PyTorch you used to run the code? I am running into some issues when I install the latest version of PyTorch (1.3).

got bugs when evaluating the pre-trained VCR model

I got this weird error message on a GPU cluster...
Please help me if you know what's wrong.

_PyFunction_FastCallDict
_PyObject_FastCallDict
_PyObject_Call_Prepend
PyObject_Call
_PyObject_FastCallDict
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
_PyEval_EvalFrameDefault
PyEval_EvalCodeEx
PyEval_EvalCode
PyRun_FileExFlags
PyRun_SimpleFileExFlags
Py_Main
main
__libc_start_main

*** End stack trace ***
Aborted

Is python-prctl required?

Hi there - thanks for your great work on vision-and-language pre-training! I'm trying to run the codebase, but I am running into issues installing python-prctl, since I do not have sudo access. Is the package required? I do not see it used or imported in the codebase.

Thanks for your help!

Performance of image retrieval on Flickr30k

The best finetuning result I get (finetuning from the pretrained model you published) is 56.62/83.96/90.56, which is still 1.6 points lower than your reported result. Furthermore, the zero-shot evaluation result from your public Conceptual Captions pretrained model is only 26.83/56.43/68.92; is it the right one? I wish to know more training details, like the command and output file of each pretrained and finetuned model, and I have some questions:

  1. Do you freeze parameters during finetuning or only during pretraining?
  2. How do you compute the hard negatives? What kind of image feature do you use to calculate the similarity (ROI features, or other features like ResNet or DenseNet on the full image)?
  3. How do you set the LR decay epochs for finetuning? From the output file it looks like 0.2 with [11, 13, 15, 17].
  4. The pretraining output log shows LR=0; is that normal? (I see that you set different decay weights for different params.)

Thanks a lot!

By the way, I finetuned image retrieval on Flickr30k with these params:
Namespace(baseline=False, bert_model='bert-base-uncased', compact=False, config_file='config/bert_base_6layer_6conect.json', do_lower_case=True, evaluation_interval=1, fp16=False, freeze=-1, from_pretrained='/conceptual_pretrained_bert_base_6_layer_6_connect_freeze_0/pytorch_model_9.bin', gradient_accumulation_steps=1, in_memory=False, learning_rate=2e-05, local_rank=0, loss_scale=0, lr_scheduler='mannul', no_cuda=False, num_train_epochs=20, num_workers=9, optimizer='BertAdam', output_dir='/model/vilbert/', save_name='finetune_retrieval_2', seed=0, tasks='3', use_chunk=0, vision_scratch=False, warmup_proportion=0.1)

{
  "attention_probs_dropout_prob": 0.1,
  "bi_attention_type": 1,
  "bi_hidden_size": 1024,
  "bi_intermediate_size": 1024,
  "bi_num_attention_heads": 8,
  "fast_mode": false,
  "fixed_t_layer": 0,
  "fixed_v_layer": 0,
  "fusion_method": "mul",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "in_batch_pairs": false,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "intra_gate": false,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pooling_method": "mul",
  "predict_feature": false,
  "t_biattention_id": [6, 7, 8, 9, 10, 11],
  "type_vocab_size": 2,
  "v_attention_probs_dropout_prob": 0.1,
  "v_biattention_id": [0, 1, 2, 3, 4, 5],
  "v_feature_size": 2048,
  "v_hidden_act": "gelu",
  "v_hidden_dropout_prob": 0.1,
  "v_hidden_size": 1024,
  "v_initializer_range": 0.02,
  "v_intermediate_size": 1024,
  "v_num_attention_heads": 8,
  "v_num_hidden_layers": 6,
  "v_target_size": 1601,
  "vocab_size": 30522,
  "with_coattention": true
}

Referring Expression Evaluation

Trying to evaluate referring expressions produces the following error:
lmdb.Error: data/referExpression/refcoco+_resnet101_faster_rcnn_genome.lmdb: No such file or directory

Could you let me know how to get the file refcoco+_resnet101_faster_rcnn_genome.lmdb? Thanks!
