Generating scene graphs using Transformers with Knowledge infusion and Residual Connections

Scene Graph Generation is an active research topic that involves representing a visual scene in terms of nodes and edges. Given an image, the objective is to determine the actors or objects present in the image and to identify the relationships between them. The nodes in a scene graph are the proposed objects, and the edges correspond to the relationships between the nodes. For instance, given an image containing a car and a person, the model needs to determine whether an action connects the car and the person and, if so, which one.

In this project, we extend the Relation Transformer (RelTR) research work. We introduce residual connections between modules and infuse prior knowledge about the objects into the system. We hypothesize that, in doing so, the model will be able to identify the objects and their relationships faster and more accurately. We compare the performance of the customized architecture against the baseline RelTR model on the Visual Genome dataset with 5,000 and 7,500 training samples, and we discuss in detail the pros and cons of the proposed model.
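
To make the node and edge terminology concrete, here is a minimal sketch of a scene graph as a plain data structure. The class names `SceneGraph` and `Relation` and the predicate "drives" are illustrative assumptions; they are not taken from the RelTR or RelTResidual code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Relation:
    """A directed edge: subject --predicate--> object (hypothetical type)."""
    subject: int    # index into SceneGraph.objects
    predicate: str  # relationship label, e.g. "drives"
    obj: int        # index into SceneGraph.objects


@dataclass
class SceneGraph:
    """Nodes are object labels; edges are relations between node indices."""
    objects: List[str]
    relations: List[Relation]

    def triplets(self) -> List[Tuple[str, str, str]]:
        # Resolve node indices into readable (subject, predicate, object) triplets.
        return [
            (self.objects[r.subject], r.predicate, self.objects[r.obj])
            for r in self.relations
        ]


# The car-and-person example from the text, with an assumed predicate.
graph = SceneGraph(objects=["person", "car"], relations=[Relation(0, "drives", 1)])
print(graph.triplets())  # [('person', 'drives', 'car')]
```

If no action connects the two objects, `relations` simply stays empty while both nodes remain in `objects`.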

Dependency Installation

  1. Clone the repo
    git clone https://github.com/rewanth22/RelTResidual.git
    For accounts configured with SSH:
     git clone git@github.com:rewanth22/RelTResidual.git
  2. Upgrade pip
    python -m pip install --upgrade pip
  3. Create and Activate Virtual Environment (Linux)
    python3 -m venv [environment-name]
    source [environment-name]/bin/activate
  4. Install dependencies
    pip install -r requirements.txt

Training/Evaluation on Visual Genome

a) Follow the [README](https://github.com/yrcong/RelTR/blob/main/data/README.md) in the data directory to prepare the datasets.

# compile the code that computes box intersections
cd lib/fpn
sh make.sh

Inference

a) Download our RelTR model pretrained on the Visual Genome dataset and save it as

ckpt/checkpoint0149.pth

b) Infer the relationships in an image with the command:

python inference.py --img_path $IMAGE_PATH --resume $MODEL_PATH

Training

python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --batch_size 2 --output_dir ckpt

Evaluation

python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --eval --batch_size 1 --resume ckpt/checkpoint0149.pth
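
Before launching a long run, it can help to confirm that the paths used in the commands above exist. The following is a minimal sketch, not part of the repository: the helper name `missing_paths` is hypothetical, and the path list is copied from the training and evaluation commands. The checkpoint is only needed for evaluation and inference, not for training from scratch.

```python
from pathlib import Path
from typing import List, Sequence

# Paths copied from the commands above; adjust if your layout differs.
REQUIRED = ("data/vg/images", "data/vg", "ckpt/checkpoint0149.pth")


def missing_paths(root, required: Sequence[str] = REQUIRED) -> List[str]:
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when everything is in place.
    for path in missing_paths("."):
        print(f"missing: {path}")
```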

NOTE:

To work with the baseline model, you need to swap three pairs of files: main.py with main_src.py, transformer.py with transformer_src.py, and reltr.py with reltr_src.py. To work with the custom model again, swap them back. This is done to prevent import errors across the different Python scripts.
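
The swap described in this note can be scripted. The sketch below is an assumption about how one might automate it, not a tool shipped with the repository: the helper names `swap_files` and `swap_all` are hypothetical, and the file pairs are given relative to a single directory, so adjust the paths to wherever these files live in your checkout.

```python
import tempfile
from pathlib import Path

# File pairs named in the note above; their directories are an assumption.
PAIRS = [
    ("main.py", "main_src.py"),
    ("transformer.py", "transformer_src.py"),
    ("reltr.py", "reltr_src.py"),
]


def swap_files(first, second) -> None:
    """Exchange two files on disk by renaming through a temporary name."""
    first, second = Path(first), Path(second)
    if not first.is_file() or not second.is_file():
        raise FileNotFoundError(f"both files must exist: {first}, {second}")
    tmp = first.with_name(first.name + ".swap_tmp")
    if tmp.exists():
        raise FileExistsError(f"leftover temporary file: {tmp}")
    first.rename(tmp)
    second.rename(first)
    tmp.rename(second)


def swap_all(root, pairs=PAIRS) -> None:
    """Swap every pair under `root`; running it twice restores the original state."""
    for a, b in pairs:
        swap_files(Path(root) / a, Path(root) / b)


if __name__ == "__main__":
    # Demonstration in a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as d:
        for a, b in PAIRS:
            (Path(d) / a).write_text("custom")
            (Path(d) / b).write_text("baseline")
        swap_all(d)
        print((Path(d) / "main.py").read_text())  # baseline
```

Because the operation is its own inverse, the same call switches from the custom model to the baseline and back.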

Authors

Krish Rewanth Sevuga Perumal, Prasannakumaran Dhanasekaran
