Adaptive Attention in PyTorch

PyTorch implementation of the paper "Knowing When to Look: Adaptive Attention via a Visual Sentinel for Image Captioning".
The original Torch implementation by Lu et al. can be found here.

Instructions

  1. Download the COCO 2014 dataset from here. In particular, you'll need the 2014 Training, Validation and Testing images, as well as the 2014 Train/Val annotations.

  2. Download Karpathy's Train/Val/Test Split. You may download it from here.

  3. If you want to evaluate on COCO, download the COCO API from here if you're on Linux, or from here if you're on Windows. Then download the COCO caption toolkit from here and rename the folder to cococaption. (This also requires Java. Simply download it from here if you don't have it.)
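The COCO caption toolkit scores a results file that is a JSON list of `{"image_id": ..., "caption": ...}` entries. The snippet below is a minimal sketch of producing such a file from generated captions. The helper names `to_coco_results` and `save_results`, the output filename, and the sample predictions are all hypothetical, and whether this repository's evaluation code writes its results exactly this way is an assumption.

```python
import json
import tempfile
from pathlib import Path


def to_coco_results(predictions):
    """Convert {image_id: caption} into the list-of-dicts layout used for COCO caption results."""
    return [
        {"image_id": int(image_id), "caption": caption.strip()}
        for image_id, caption in sorted(predictions.items())
    ]


def save_results(predictions, path):
    """Write the results list as JSON and return the path that was written."""
    path = Path(path)
    path.write_text(json.dumps(to_coco_results(predictions)))
    return path


# Hypothetical predictions keyed by COCO image id (illustrative values only).
predictions = {
    391895: "a man riding a motorcycle on a dirt road ",
    522418: "a woman cutting a cake",
}

# The filename is an assumption; the toolkit only needs a path to a JSON file.
results_path = save_results(
    predictions, Path(tempfile.gettempdir()) / "captions_val2014_results.json"
)
print(json.loads(results_path.read_text()))
```

A file in this layout can then be passed to the toolkit together with the 2014 Train/Val annotations downloaded in step 1.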

Files

preprocess.py Creates the WORDMAP.json file and the .h5 files
dataset.py Creates the custom dataset
util.py Functions used throughout the code
models.py Defines the architectures
train_eval For Training and Evaluation
run.ipynb For Testing and Visualization
The folder caption data includes data used along with images, mainly for evaluation purposes.
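To give a feel for what the preprocessing step involves, here is a rough sketch of building a word map from Karpathy's split file. It assumes the usual layout of that JSON (an `images` list whose entries carry a `split` field and tokenized `sentences`). The function names, the `min_word_freq` threshold, the special tokens, and the choice to merge `restval` into training are assumptions for illustration, not necessarily what preprocess.py does.

```python
from collections import Counter


def split_images(karpathy, merge_restval=True):
    """Group image entries by split; 'restval' is optionally folded into 'train'."""
    splits = {"train": [], "val": [], "test": []}
    for img in karpathy["images"]:
        split = img["split"]
        if split == "restval":
            if not merge_restval:
                continue
            split = "train"
        if split in splits:
            splits[split].append(img)
    return splits


def build_word_map(train_images, min_word_freq=5):
    """Map each sufficiently frequent training token to an integer id."""
    freq = Counter()
    for img in train_images:
        for sentence in img["sentences"]:
            freq.update(sentence["tokens"])
    words = sorted(w for w, count in freq.items() if count >= min_word_freq)
    word_map = {w: i + 1 for i, w in enumerate(words)}  # id 0 is reserved for padding
    word_map["<unk>"] = len(word_map) + 1
    word_map["<start>"] = len(word_map) + 1
    word_map["<end>"] = len(word_map) + 1
    word_map["<pad>"] = 0
    return word_map


# Tiny stand-in for the real split file (hypothetical filenames and captions).
karpathy = {
    "images": [
        {"split": "train", "filename": "a.jpg",
         "sentences": [{"tokens": ["a", "dog", "runs"]}, {"tokens": ["a", "dog"]}]},
        {"split": "restval", "filename": "b.jpg",
         "sentences": [{"tokens": ["a", "cat"]}]},
        {"split": "val", "filename": "c.jpg",
         "sentences": [{"tokens": ["a", "bird"]}]},
    ]
}

splits = split_images(karpathy)
word_map = build_word_map(splits["train"], min_word_freq=2)
print(word_map)
```

Only training captions contribute to the vocabulary here, so words that appear solely in the validation or test captions map to `<unk>`.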

Testing

Place the test image in the folder 'test_imgs', name it test.jpg, and then run the run.ipynb Jupyter notebook to get the results.

Results

The file here contains the evaluation scores obtained on the validation split. The pretrained model is provided here as well (trained for 12 epochs). Some results from Karpathy's split are shown below. The visual grounding probability (1-beta) is shown in green.

[Image: demo]
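For readers unfamiliar with where the visual grounding probability (1-beta) comes from, the sketch below walks through the sentinel gate described in the paper. The attention scores over the k image regions are extended with one extra score for the visual sentinel, a softmax is taken over all k+1 values, and beta is the weight that falls on the sentinel. The scores are taken as given inputs here. The function names are hypothetical, and the code is written in plain Python rather than PyTorch so the arithmetic is easy to follow. It is an illustration of the idea, not the code in models.py.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_scores, sentinel_score, regions, sentinel):
    """Blend region features and the sentinel vector using one joint softmax.

    region_scores: k attention scores, one per image region
    sentinel_score: a single score for the visual sentinel
    regions: k feature vectors (lists of equal length)
    sentinel: the sentinel vector, same length as each region vector
    Returns (context, beta, grounding) where grounding = 1 - beta.
    """
    alpha_hat = softmax(list(region_scores) + [sentinel_score])
    region_weights = alpha_hat[:-1]  # these already sum to 1 - beta
    beta = alpha_hat[-1]             # weight placed on the sentinel
    context = [
        beta * sentinel[d] + sum(w * v[d] for w, v in zip(region_weights, regions))
        for d in range(len(sentinel))
    ]
    return context, beta, 1.0 - beta


# Two regions and a sentinel with equal scores: each receives weight 1/3.
context, beta, grounding = adaptive_context(
    region_scores=[0.0, 0.0],
    sentinel_score=0.0,
    regions=[[1.0, 0.0], [0.0, 1.0]],
    sentinel=[3.0, 3.0],
)
print(context, beta, grounding)
```

A value of 1-beta close to 1 means the word was generated mostly from the image regions, while a value close to 0 means the model relied mainly on the sentinel, that is, on its language context.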

References

Code adapted from sgrvinod's implementation of "Show, Attend and Tell".
