unified-vlp's Introduction

Unified Vision-Language Pre-training (VLP).

This repo hosts an implementation of Unified Vision-Language Pre-training (VLP) by Zhou et al. (2020).

File Description

VLP_Inference : To test the released model
VLP_TrainTest.ipynb : To train the model on Flickr30k
VLP_Hyperparameter : To train the model with new hyperparameters on Flickr30k
run_img2txt_dist.py : Runs the training loop
decode_img2txt.py : Runs the testing loop
seq2seq_loader.py : Defines the seq2seq training objective
scst_utils.py : Defines Self-Critical Sequence Training (SCST) for the COCO dataset (not used here)
The remaining files deal with either pretraining on Conceptual Captions or the VQA task and are not used here.

Installation

Conda Environment (Option I, Recommended)

  1. Clone the repo recursively so that the coco and pythia submodules are included:
git clone --recursive https://github.com/120205690/597-Course-Project.git
  2. Install CUDA (e.g., 10.0), cuDNN (e.g., v7.5), and Miniconda (either Miniconda2 or 3, version 4.6+).

  3. Run the following commands to set up the conda env and install Python packages:

MINICONDA_ROOT=[to your Miniconda root directory] # e.g., /home/[usrname]/miniconda3
cd VLP
conda env create -f misc/vlp.yml --prefix $MINICONDA_ROOT/envs/vlp
conda activate vlp
  4. Finally, cd to the repo root directory and install other dependencies by running:
./setup.sh
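
A recursive clone that was interrupted, or a plain clone without --recursive, typically leaves submodule directories missing or empty. The following is a minimal sketch for checking this before running the setup script. The helper name missing_submodules and the default directory names are assumptions for illustration (coco-caption appears in the SPICE step of this README; the pythia directory name is a guess), so adjust them to match the actual checkout.

```python
from pathlib import Path


def missing_submodules(repo_root, names=("coco-caption", "pythia")):
    """Return the names under repo_root that are absent or empty directories.

    The default names are assumptions, not confirmed by the repo layout.
    """
    root = Path(repo_root)
    missing = []
    for name in names:
        path = root / name
        # A submodule that was never fetched is either missing or an empty directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

Calling missing_submodules(".") from the repo root returns an empty list when every listed directory has content. If it returns any names, running git submodule update --init --recursive is the usual way to fetch them.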

To support language evaluation (SPICE), run

cd coco-caption
./get_stanford_models.sh

Inference and Testing

  • To run the code locally, install NVIDIA Apex from https://github.com/NVIDIA/apex
  • To run on Colab, try installing NVIDIA Apex using the provided shell script.
  • If Apex does not install correctly, run VLP_TrainTest.ipynb. Otherwise, use the original run_img2txt_dist.py from Vision-Language Pre-training (VLP).
  • Trained model checkpoints are contained in the flickr30k folder
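
The choice between the notebook and the original script depends on whether Apex installed correctly. A rough sketch of that decision is shown below. It only checks that a top-level module named apex can be found, which is an assumption about what "installed correctly" means here; a build that lacks the CUDA extensions would still pass this check. The function names apex_available and choose_entry_point are hypothetical.

```python
import importlib.util


def apex_available():
    """Return True if a top-level module named 'apex' can be located."""
    return importlib.util.find_spec("apex") is not None


def choose_entry_point(has_apex):
    """Mirror the rule above: original script with Apex, notebook without it."""
    return "run_img2txt_dist.py" if has_apex else "VLP_TrainTest.ipynb"


if __name__ == "__main__":
    print(choose_entry_point(apex_available()))
```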

Acknowledgement

All credits go to https://github.com/LuoweiZhou/VLP
Check out their AAAI 2020 paper, Unified Vision-Language Pre-training (VLP).
