This repo hosts an implementation of Zhou et al. (2020)'s work on Unified Vision-Language Pre-training (VLP).
- `VLP_Inference`: To test the released model
- `VLP_TrainTest.ipynb`: To train the model on Flickr30k
- `VLP_Hyperparameter`: To train the model with new hyperparameters on Flickr30k
- `run_img2txt_dist.py`: Runs the training loop
- `decode_img2txt.py`: Runs the testing loop
- `seq2seq_loader.py`: Defines the seq2seq training objective (a conceptual sketch of this objective follows the file list)
- `scst_utils.py`: Defines Self-Critical Sequence Training (SCST) for the COCO dataset (not used here)
- The rest of the files deal with either pretraining on Conceptual Captions or the VQA task and have not been used here
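For orientation, here is a minimal, self-contained sketch of the kind of masked seq2seq captioning objective that `seq2seq_loader.py` sets up: caption tokens are randomly masked and the model predicts them conditioned on image region features. The function name, `mask_token_id` default, and the `model(region_feats, masked_ids)` call are illustrative assumptions, not this repo's actual API.

```python
import torch
import torch.nn.functional as F

def seq2seq_caption_loss(model, region_feats, caption_ids,
                         mask_prob=0.15, mask_token_id=103, pad_token_id=0):
    """Conceptual sketch of a masked seq2seq captioning loss.

    Assumes `model(region_feats, masked_ids)` returns per-token vocabulary
    logits of shape (batch, seq_len, vocab); this is NOT the repo's real API.
    """
    # Randomly pick non-padding caption positions to mask.
    is_token = caption_ids != pad_token_id
    mask = (torch.rand(caption_ids.shape, device=caption_ids.device) < mask_prob) & is_token

    # Replace the selected positions with the [MASK] token id.
    masked_ids = caption_ids.clone()
    masked_ids[mask] = mask_token_id

    # Predict the original tokens at the masked positions, conditioned on the image.
    logits = model(region_feats, masked_ids)
    return F.cross_entropy(logits[mask], caption_ids[mask])
```

In the paper's seq2seq objective this masking is combined with a seq2seq self-attention mask so that each caption token attends only to the image regions and earlier caption tokens, but the loss above captures the core idea.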
- Recursively clone the repo to include the `coco` and `pythia` submodules:
  git clone --recursive https://github.com/120205690/597-Course-Project.git
- Install CUDA (e.g., 10.0), cuDNN (e.g., v7.5), and Miniconda (either Miniconda2 or 3, version 4.6+).
- Run the following commands to set up the conda environment and install the Python packages:
  MINICONDA_ROOT=[to your Miniconda root directory]  # e.g., /home/[usrname]/miniconda3
  cd VLP
  conda env create -f misc/vlp.yml --prefix $MINICONDA_ROOT/envs/vlp
  conda activate vlp
- Finally, `cd` to the repo root directory and install the other dependencies by running:
  ./setup.sh
- To support language evaluation (SPICE), run the commands below (a toy scoring example follows):
  cd coco-caption
  ./get_stanford_models.sh
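Once the Stanford models are downloaded, the scorers from `coco-caption` can be called directly. The snippet below is a toy, hedged example of scoring one generated caption against one reference; it assumes `pycocoevalcap` (from the `coco-caption` submodule) is on the `PYTHONPATH` and that Java is installed for SPICE, and the captions are invented for illustration.

```python
# Toy example of caption scoring with the coco-caption scorers.
# Assumes the coco-caption submodule (pycocoevalcap) is importable
# and Java is available for SPICE; the captions are made up.
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

gts = {"img1": ["a man riding a horse on the beach"]}      # reference captions per image id
res = {"img1": ["a person rides a horse near the ocean"]}  # generated caption per image id

cider, _ = Cider().compute_score(gts, res)
spice, _ = Spice().compute_score(gts, res)
print("CIDEr:", cider, "SPICE:", spice)
```

In the full evaluation pipeline the captions are first run through the PTB tokenizer, but this shows the scorer interface used for the reported metrics.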
- To run the code locally, install NVIDIA Apex from https://github.com/NVIDIA/apex
- To run on Colab, try installing NVIDIA Apex using the provided shell script.
- If Apex does not install correctly, run `VLP_TrainTest.ipynb`; otherwise, use the original `run_img2txt_dist.py` from Vision-Language Pre-training (VLP). (A quick environment check appears after this list.)
- Trained model checkpoints are contained in the `flickr30k` folder (a checkpoint-inspection sketch follows below).
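Because the Apex install is the most fragile step, a quick sanity check like the one below (an illustrative snippet, not part of the repo) confirms that PyTorch sees the GPU and that Apex's `amp` module imports cleanly before launching training.

```python
# Illustrative environment check, not part of the repo:
# verifies CUDA visibility and the Apex mixed-precision import.
import torch

print("CUDA available:", torch.cuda.is_available())
try:
    from apex import amp  # noqa: F401
    print("Apex (amp) imported OK")
except ImportError:
    print("Apex missing; fall back to VLP_TrainTest.ipynb")
```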
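To peek inside one of the trained checkpoints, a generic PyTorch load works; the path below is a placeholder, so substitute an actual file from the `flickr30k` folder.

```python
# Generic checkpoint inspection; "flickr30k/<checkpoint file>" is a placeholder path.
import torch

state = torch.load("flickr30k/<checkpoint file>", map_location="cpu")
# Checkpoints are typically either a raw state_dict or a dict wrapping one;
# listing the top-level keys shows which.
print(list(state.keys())[:10])
```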
All credits go to https://github.com/LuoweiZhou/VLP
Check out their AAAI 2020 paper, Unified Vision-Language Pre-training (VLP).