This repo hosts an implementation of Zhou et al. (2020)'s work on Unified Vision-Language Pre-training (VLP).
- `VLP_Inference`: To test the released model
- `VLP_TrainTest.ipynb`: To train the model on Flickr30k
- `VLP_Hyperparameter`: To train the model with new hyperparameters on Flickr30k
- `run_img2txt_dist.py`: Runs the training loop
- `decode_img2txt.py`: Runs the testing loop
- `seq2seq_loader.py`: Defines the seq2seq training objective (a conceptual sketch of this objective follows the file list)
- `scst_utils.py`: Defines Self-Critical Sequence Training (SCST) for the COCO dataset (not used here)
- The rest of the files deal with either pretraining on Conceptual Captions or the VQA task and have not been used here
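For orientation, here is a minimal, self-contained sketch of the kind of masked seq2seq captioning objective that `seq2seq_loader.py` sets up: caption tokens are randomly masked and the model predicts them conditioned on image region features. The function name, `mask_token_id` default, and the `model(region_feats, masked_ids)` call are illustrative assumptions, not this repo's actual API.

```python
import torch
import torch.nn.functional as F

def seq2seq_caption_loss(model, region_feats, caption_ids,
                         mask_prob=0.15, mask_token_id=103, pad_token_id=0):
    """Conceptual sketch of a masked seq2seq captioning loss.

    Assumes `model(region_feats, masked_ids)` returns per-token vocabulary
    logits of shape (batch, seq_len, vocab); this is NOT the repo's real API.
    """
    # Randomly pick non-padding caption positions to mask.
    is_token = caption_ids != pad_token_id
    mask = (torch.rand(caption_ids.shape, device=caption_ids.device) < mask_prob) & is_token

    # Replace the selected positions with the [MASK] token id.
    masked_ids = caption_ids.clone()
    masked_ids[mask] = mask_token_id

    # Predict the original tokens at the masked positions, conditioned on the image.
    logits = model(region_feats, masked_ids)
    return F.cross_entropy(logits[mask], caption_ids[mask])
```

In the paper's seq2seq objective this masking is combined with a seq2seq self-attention mask so that each caption token attends only to the image regions and earlier caption tokens, but the loss above captures the core idea.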
- Recursively clone the repo to include the `coco` and `pythia` submodules:
  git clone --recursive https://github.com/120205690/597-Course-Project.git
- Install CUDA (e.g., 10.0), cuDNN (e.g., v7.5), and Miniconda (either Miniconda2 or 3, version 4.6+).
- Run the following commands to set up the conda environment and install the Python packages:
  MINICONDA_ROOT=[to your Miniconda root directory]  # e.g., /home/[usrname]/miniconda3
  cd VLP
  conda env create -f misc/vlp.yml --prefix $MINICONDA_ROOT/envs/vlp
  conda activate vlp
- Finally, `cd` to the repo root directory and install the other dependencies by running:
  ./setup.sh
- To support language evaluation (SPICE), run the commands below (a toy scoring example follows):
  cd coco-caption
  ./get_stanford_models.sh
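Once the Stanford models are downloaded, the scorers from `coco-caption` can be called directly. The snippet below is a toy, hedged example of scoring one generated caption against one reference; it assumes `pycocoevalcap` (from the `coco-caption` submodule) is on the `PYTHONPATH` and that Java is installed for SPICE, and the captions are invented for illustration.

```python
# Toy example of caption scoring with the coco-caption scorers.
# Assumes the coco-caption submodule (pycocoevalcap) is importable
# and Java is available for SPICE; the captions are made up.
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

gts = {"img1": ["a man riding a horse on the beach"]}      # reference captions per image id
res = {"img1": ["a person rides a horse near the ocean"]}  # generated caption per image id

cider, _ = Cider().compute_score(gts, res)
spice, _ = Spice().compute_score(gts, res)
print("CIDEr:", cider, "SPICE:", spice)
```

In the full evaluation pipeline the captions are first run through the PTB tokenizer, but this shows the scorer interface used for the reported metrics.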
- To run the code locally, install NVIDIA Apex from https://github.com/NVIDIA/apex
- To run on Colab, try installing NVIDIA Apex using the provided shell script.
- If Apex does not install correctly, run `VLP_TrainTest.ipynb`; otherwise, use the original `run_img2txt_dist.py` from Vision-Language Pre-training (VLP). (A quick environment check appears after this list.)
- Trained model checkpoints are contained in the `flickr30k` folder (a checkpoint-inspection sketch follows below).
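Because the Apex install is the most fragile step, a quick sanity check like the one below (an illustrative snippet, not part of the repo) confirms that PyTorch sees the GPU and that Apex's `amp` module imports cleanly before launching training.

```python
# Illustrative environment check, not part of the repo:
# verifies CUDA visibility and the Apex mixed-precision import.
import torch

print("CUDA available:", torch.cuda.is_available())
try:
    from apex import amp  # noqa: F401
    print("Apex (amp) imported OK")
except ImportError:
    print("Apex missing; fall back to VLP_TrainTest.ipynb")
```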
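To peek inside one of the trained checkpoints, a generic PyTorch load works; the path below is a placeholder, so substitute an actual file from the `flickr30k` folder.

```python
# Generic checkpoint inspection; "flickr30k/<checkpoint file>" is a placeholder path.
import torch

state = torch.load("flickr30k/<checkpoint file>", map_location="cpu")
# Checkpoints are typically either a raw state_dict or a dict wrapping one;
# listing the top-level keys shows which.
print(list(state.keys())[:10])
```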
All credits go to https://github.com/LuoweiZhou/VLP
Check out their AAAI 2020 paper, Unified Vision-Language Pre-training (VLP).