Large-scale Visual Relationship Understanding

Example results from the VG80K dataset.

This is the Caffe2 implementation of Large-scale Visual Relationship Understanding (AAAI 2019).

This code is for the VG80K dataset only. For results on VG200 and VRD, please refer to the PyTorch implementation.

Note: This repo uses ground-truth boxes during testing, so it does not include an object detection module.

Caffe2

To install Caffe2 with CUDA support, follow the installation instructions from the Caffe2 website. If you already have Caffe2 installed, make sure to update your Caffe2 to a version that includes the Detectron module.

Please ensure that your Caffe2 installation was successful before proceeding by running the following commands and checking their output as directed in the comments.

# To check if Caffe2 build was successful
python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"

# To check if Caffe2 GPU build was successful
# This must print a number > 0 in order to use Detectron
python2 -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'

If the caffe2 Python package is not found, you likely need to adjust your PYTHONPATH environment variable to include its location (/path/to/caffe2/build, where build is the Caffe2 CMake build directory).
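The two checks above and the PYTHONPATH adjustment can also be combined into one small helper. This is a minimal sketch, not part of the repo: the function name caffe2_status and its build_dir argument are hypothetical, and prepending to sys.path only affects the current Python process, unlike exporting PYTHONPATH in your shell.

```python
import os
import sys


def caffe2_status(build_dir=None):
    """Return 'gpu', 'cpu-only', or 'missing' for the local Caffe2 install.

    build_dir is an optional path to the Caffe2 CMake build directory
    (hypothetical argument). When it exists, it is prepended to sys.path,
    which mimics adding it to PYTHONPATH for this process only.
    """
    if build_dir and os.path.isdir(build_dir) and build_dir not in sys.path:
        sys.path.insert(0, build_dir)
    try:
        from caffe2.python import workspace
    except (ImportError, OSError):
        # The package, or a shared library it needs, could not be loaded.
        return "missing"
    # Same call as in the GPU check above: a count > 0 means a CUDA build.
    return "gpu" if workspace.NumCudaDevices() > 0 else "cpu-only"
```

For example, caffe2_status("/path/to/caffe2/build") should return "gpu" on a machine that is ready for training; "missing" suggests the build directory is still not on the import path.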

Other Dependencies

Install Python dependencies:

# Quote the version specifiers so the shell does not treat ">" as a redirect
pip install 'numpy>=1.13' 'pyyaml>=3.12' matplotlib 'opencv-python>=3.2' setuptools Cython mock scipy

Large-scale-VRD

Clone the Large-scale-VRD repository:

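# Large-scale-VRD=/path/to/clone/Large-scale-VRD
# ($Large-scale-VRD is a placeholder for the clone path, not a valid shell variable name; substitute the real path)
git clone https://github.com/fairinternal/VRD $Large-scale-VRD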

Set up Python modules:

cd $Large-scale-VRD/lib && make

Annotations

Download VG annotation files from here. Put the zip file under $Large-scale-VRD and unzip it. You should see a datasets folder unzipped there.

Datasets

Download VG80K images from here. Unzip all images into $Large-scale-VRD/datasets/large_scale_VRD/Visual_Genome/images.

Pretrained Embedding Models

Download pretrained embeddings from here. Put the zip file under $Large-scale-VRD/datasets/large_scale_VRD and unzip it. You should see a "label_embeddings" folder and a "models" folder there.

Our Trained Models

You can download our trained models from here. Put the zip file under $Large-scale-VRD and unzip it. You should see a checkpoints folder unzipped there.
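After the downloads above, a quick check of the folder layout can catch a zip file that was unpacked in the wrong place. The following is a minimal sketch under the assumption that the archives unpack exactly as described in the steps above; the names EXPECTED_DIRS and missing_dirs are hypothetical and not part of the repo.

```python
import os

# Folder names are taken from the setup steps above; paths are relative to
# the clone directory.
EXPECTED_DIRS = [
    "datasets/large_scale_VRD/Visual_Genome/images",
    "datasets/large_scale_VRD/label_embeddings",
    "datasets/large_scale_VRD/models",
    "checkpoints",
]


def missing_dirs(repo_root):
    """Return the expected folders that do not exist under repo_root."""
    return [
        rel for rel in EXPECTED_DIRS
        if not os.path.isdir(os.path.join(repo_root, *rel.split("/")))
    ]
```

Calling missing_dirs("/path/to/clone/Large-scale-VRD") returns an empty list when every folder is in place, and otherwise lists the ones still missing.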

Training

To train on VG80K with 8 GPUs, run:

python tools/train_net_rel.py --cfg configs/vg/VG_wiki_and_relco_VGG16_softmaxed_triplet_2_lan_layers_8gpu.yaml

Testing

To test on VG80K with 8 GPUs, run:

python tools/test_net_rel.py --cfg configs/vg/VG_wiki_and_relco_VGG16_softmaxed_triplet_2_lan_layers_8gpu.yaml
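Training and testing use the same config file and differ only in the script name, so both commands can be built by one helper if you want to drive them from a script. This is a minimal sketch: build_command and the CFG constant are hypothetical names, and the relative script paths assume the command is launched from the root of the clone.

```python
# Config path copied from the training and testing commands above.
CFG = "configs/vg/VG_wiki_and_relco_VGG16_softmaxed_triplet_2_lan_layers_8gpu.yaml"


def build_command(stage, cfg=CFG, python="python"):
    """Return the argument list for the 'train' or 'test' stage."""
    scripts = {
        "train": "tools/train_net_rel.py",
        "test": "tools/test_net_rel.py",
    }
    if stage not in scripts:
        raise ValueError("stage must be 'train' or 'test', got %r" % (stage,))
    return [python, scripts[stage], "--cfg", cfg]
```

The resulting list can be passed to subprocess.check_call together with cwd set to the clone directory; pass python="python2" if that is the interpreter your Caffe2 build uses.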

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree.

Our revised annotations, linked above, are based on Visual Genome, which is licensed under the Creative Commons Attribution 4.0 International Public License. The revised annotations are released under the Attribution-NonCommercial 4.0 International License, which can be found in the LICENSE file in the root directory of this source tree.

Citing Large-Scale-VRD

If you use this code in your research, please use the following BibTeX entry.

@conference{zhang2018large,
  title={Large-Scale Visual Relationship Understanding},
  author={Zhang, Ji and Kalantidis, Yannis and Rohrbach, Marcus and Paluri, Manohar and Elgammal, Ahmed and Elhoseiny, Mohamed},
  booktitle={AAAI},
  year={2019}
}
