Diverse Beam Search

This code implements Diverse Beam Search (DBS), a replacement for beam search that generates diverse sequences from sequence models such as LSTMs. This repository lets you generate diverse image captions for models trained with the popular neuraltalk2 repository. A demo of our implementation on captioning is available at dbs.cloudcv.org.

Requirements

You will need to install Torch and the following packages:

  • nn
  • nngraph
  • image
  • loadcaffe
  • hdf5 (optional, depending on how you want to input data)

You might want to install Torch using this repository, which installs many of the requirements. If you are using a GPU, you will also need to install cutorch and cunn. If the image-captioning checkpoint was trained using cuDNN, you will need cuDNN as well: download it from NVIDIA's website and add it to your LD_LIBRARY_PATH.

Any of the checkpoints that Andrej Karpathy distributes with the neuraltalk2 repository can be used with this code. You can also train your own model with neuraltalk2 and use this code to sample diverse sentences from it.

Generating Diverse Sequences

After installing the dependencies, you should be able to obtain diverse captions by running:

$ th eval.lua -model /path/to/model.t7 -num_images 1 -image_folder eval_images -gpuid -1

To run a beam search with a beam size of 10 (-B), 5 diverse groups (-M), and a diversity strength of 0.5 (-lambda) on the same image, run:

$ th eval.lua -model /path/to/model.t7 -B 10 -M 5 -lambda 0.5 -num_images 1 -image_folder eval_images -gpuid -1
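
The following sketch makes these flags concrete. It assumes, following the DBS paper, that the total beam budget (-B) is split evenly across the groups (-M) and that -lambda scales a Hamming diversity penalty. Under those assumptions, the command above keeps 2 beams per group. A token that earlier groups have already chosen k times at the current time step has 0.5 x k subtracted from its log-probability. The helpers `group_sizes` and `hamming_penalty` are hypothetical illustrations, not functions from this repository.

```python
from typing import List


def group_sizes(B: int, M: int) -> List[int]:
    """Beams kept per group when a budget of B is split into M groups.

    Hypothetical helper; assumes B is a positive multiple of M.
    """
    if M <= 0 or B <= 0 or B % M != 0:
        raise ValueError("B must be a positive multiple of M")
    return [B // M] * M


def hamming_penalty(token: str, earlier_tokens: List[str], lam: float) -> float:
    """Amount subtracted from `token`'s log-probability at one time step.

    `earlier_tokens` holds the tokens chosen at this same time step by the
    beams of all previously expanded groups; `lam` is the diversity strength.
    Subtracting `lam` once per earlier occurrence equals `lam * count`.
    """
    return lam * earlier_tokens.count(token)


print(group_sizes(10, 5))                                # [2, 2, 2, 2, 2]
print(hamming_penalty("a", ["a", "dog", "a"], lam=0.5))  # 1.0
```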

The code writes its output to a JSON file containing all generated captions and their scores for each image.

Using DBS for other tasks

The core of our method is in dbs/beam_utils.lua. It contains two functions that you will need to replicate:

  • beam_step - Performs one expansion of the beams held at any given time.
  • beam_search - Modifies the log-probabilities of the sequences and calls beam_step at every time step. It handles both dividing the beam budget into groups and augmenting the scores with the diversity term.
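
If you are replicating these two functions for another task, the following minimal Python sketch may help as a reference point. It is written from the algorithm described in the DBS paper rather than ported from dbs/beam_utils.lua, so the names and signatures here are illustrative and not the repository's Lua API. The sketch makes four assumptions:

  • a hypothetical `logprob_fn` maps a token prefix to next-token log-probabilities
  • B is a multiple of M
  • there is no end-of-sequence handling
  • the diversity term is a Hamming penalty

Groups are expanded in order at each time step. Group g is penalized for tokens that groups 0..g-1 already picked at that step, so the first group runs plain beam search.

```python
import math
from typing import Callable, Dict, List, Tuple

# A beam is (token sequence, cumulative log-probability).
Beam = Tuple[List[str], float]
LogProbFn = Callable[[List[str]], Dict[str, float]]


def beam_step(beams: List[Beam], logprob_fn: LogProbFn, width: int,
              penalty: Dict[str, float]) -> List[Beam]:
    """Expand each beam by one token and keep the `width` best candidates.

    Candidates are ranked by log-probability minus `penalty[token]`, but the
    stored score stays the unpenalized cumulative log-probability.
    """
    candidates = []
    for seq, score in beams:
        for tok, lp in logprob_fn(seq).items():
            ranked = score + lp - penalty.get(tok, 0.0)
            candidates.append((ranked, seq + [tok], score + lp))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(seq, logp) for _, seq, logp in candidates[:width]]


def diverse_beam_search(logprob_fn: LogProbFn, B: int, M: int, lam: float,
                        steps: int) -> List[List[Beam]]:
    """Run `steps` time steps of DBS with budget B split into M groups."""
    width = B // M  # assumes B is a multiple of M
    groups: List[List[Beam]] = [[([], 0.0)] for _ in range(M)]
    for _ in range(steps):
        for g in range(M):
            # Hamming diversity: count how many beams in groups 0..g-1 chose
            # each token at this time step. Those groups were already
            # expanded in this iteration, so seq[-1] is the current token.
            counts: Dict[str, int] = {}
            for prev in groups[:g]:
                for seq, _ in prev:
                    counts[seq[-1]] = counts.get(seq[-1], 0) + 1
            # Subtracting lam once per occurrence equals lam * count.
            penalty = {tok: lam * n for tok, n in counts.items()}
            groups[g] = beam_step(groups[g], logprob_fn, width, penalty)
    return groups


# Toy "model" (hypothetical): the same next-token distribution at every step.
TOY = {"a": math.log(0.5), "b": math.log(0.3), "c": math.log(0.2)}

if __name__ == "__main__":
    for group in diverse_beam_search(lambda prefix: TOY, B=4, M=2, lam=0.5, steps=1):
        print([seq for seq, _ in group])
```

With B=4, M=2 and lambda=0.5, the first group keeps ['a'] and ['b']. In the second group, "a" and "b" each lose 0.5 from their scores, so "c" now outranks "b", and the group keeps ['a'] and ['c']. With lambda=0, both groups collapse to the same beams, which is the redundancy DBS is meant to avoid.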

3rd party

dbs's People

Contributors

ashwinkalyan

dbs's Issues

Where is the hamming diversity?

Hi Ashwin,

I don't quite understand the part of your code that adds diversity to the probabilities. In your paper, you multiply lambda by the Hamming diversity, but in your code you only subtract lambda from the probability.

Could you explain this in more detail?

Thanks
