MPRC

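Implementation of the models described in the papers "Reinforced Ranker-Reader for Open-Domain Question Answering" and "Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering".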

The Reinforced Ranker-Reader is tested on several open-domain QA datasets: Quasar-T, SearchQA and TriviaQA (unfiltered). The Single Reader is tested on the benchmark dataset SQuAD.

Requirements

  • Torch7 (with cutorch, cunn, cudnn)
  • Python 2.7
  • NLTK (with corpus punkt and stopwords)

Datasets

Reinforced Ranker-Reader for Open-Domain Question Answering

Usage

sh preprocess.sh quasart (searchqa/unftriviaqa/squad)
cd main
th main.lua -task quasart (searchqa/unftriviaqa) -model rankerReader -reward_epoch 12
th main.lua -task squad -model mlstmReader 

sh preprocess.sh quasart (or searchqa/unftriviaqa) downloads the dataset and preprocesses it into the files train.tsv, dev.tsv and test.tsv under the path "data/quasart/sequence", with the following format:

question *split_sign* question_id *split_sign* answer_1 *split_answer* answer_2 ... \n
passage_1 *split_sign* IR_score \n
passage_2 *split_sign* IR_score \n
...
passage_n *split_sign* IR_score \n
*new_instance* \n
question_2 *split_sign* question_id *split_sign* answer_1 *split_answer* answer_2 ... \n
...
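For readers who want to inspect these files outside Torch, the block layout above can be read with a small Python 3 sketch. It assumes that "*split_sign*", "*split_answer*" and "*new_instance*" are written literally in the files and that the passage score is the last field on each passage line; the names parse_instances and Instance are hypothetical helpers, not part of the repository.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple

# ASSUMPTION: these markers appear literally in train/dev/test.tsv.
SPLIT_SIGN = "*split_sign*"
SPLIT_ANSWER = "*split_answer*"
NEW_INSTANCE = "*new_instance*"


@dataclass
class Instance:
    question: str
    question_id: str
    answers: List[str]
    # Each entry is (passage text, IR score).
    passages: List[Tuple[str, float]] = field(default_factory=list)


def parse_instances(lines: Iterable[str]) -> Iterator[Instance]:
    """Yield one Instance per question block (hypothetical helper)."""
    current: Optional[Instance] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        if line.strip() == NEW_INSTANCE:
            # End of a question block.
            if current is not None:
                yield current
            current = None
            continue
        if current is None:
            # First line of a block: question, question_id, answers.
            question, qid, answers = [f.strip() for f in line.split(SPLIT_SIGN, 2)]
            current = Instance(question, qid,
                               [a.strip() for a in answers.split(SPLIT_ANSWER)])
        else:
            # Passage line: text, then its IR score as the last field.
            passage, score = line.rsplit(SPLIT_SIGN, 1)
            current.passages.append((passage.strip(), float(score)))
    if current is not None:
        # Tolerate a final block without a trailing separator.
        yield current
```

Usage: `for inst in parse_instances(open("data/quasart/sequence/dev.tsv")): ...` iterates over questions with their retrieved passages and IR scores.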

main.lua first converts the preprocessed data and word embeddings into Torch format and then runs the algorithm. The parameter "-reward_epoch" specifies the epoch at which the REINFORCE method starts to be applied.
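To illustrate what "starting REINFORCE at -reward_epoch" can mean, here is a Python 3 sketch of the standard REINFORCE surrogate loss for a ranker that picks one passage from a softmax over passage scores. The way it is combined with a supervised loss, the reward (for example, a reader score on the chosen passage) and all function names are assumptions for illustration; they are not taken from main.lua.

```python
import math
from typing import List


def log_softmax(scores: List[float]) -> List[float]:
    """Numerically stable log-softmax over the ranker's passage scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def reinforce_loss(ranker_scores: List[float], chosen: int,
                   reward: float, baseline: float = 0.0) -> float:
    """REINFORCE surrogate loss: -(reward - baseline) * log pi(chosen).

    pi is a softmax policy over passages. Only the scalar is computed here;
    an autodiff framework (Torch in this repository) would differentiate it
    to obtain the policy-gradient update for the ranker.
    """
    return -(reward - baseline) * log_softmax(ranker_scores)[chosen]


def loss_for_epoch(epoch: int, reward_epoch: int, supervised_loss: float,
                   ranker_scores: List[float], chosen: int,
                   reward: float, baseline: float = 0.0) -> float:
    # HYPOTHETICAL schedule: supervised loss only before reward_epoch,
    # supervised loss plus the REINFORCE term from reward_epoch onwards.
    if epoch < reward_epoch:
        return supervised_loss
    return supervised_loss + reinforce_loss(ranker_scores, chosen, reward, baseline)
```

Subtracting a baseline from the reward leaves the gradient estimate unbiased while reducing its variance, which is why it is exposed as a parameter here.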

sh preprocess.sh squad downloads the SQuAD dataset and preprocesses it into the files train.txt and dev.txt under the path "data/squad/sequence", with the following format:

Passage \t Question \t sequence of the positions where the answer appears in Passage (e.g. 3 4 5 6) \n
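A line in this format can be split and the answer recovered from its positions with a short Python 3 sketch. It assumes the positions index whitespace-separated tokens of the passage; whether they are 0- or 1-based is not stated here, so the caller must pass index_base explicitly. parse_squad_line and answer_text are hypothetical names.

```python
from typing import List, NamedTuple


class SquadExample(NamedTuple):
    passage: str
    question: str
    answer_positions: List[int]


def parse_squad_line(line: str) -> SquadExample:
    """Split one 'Passage \\t Question \\t positions' line."""
    passage, question, positions = line.rstrip("\n").split("\t")
    return SquadExample(passage, question, [int(p) for p in positions.split()])


def answer_text(ex: SquadExample, index_base: int) -> str:
    # ASSUMPTION: positions refer to whitespace tokens of the passage;
    # index_base is 1 for Lua-style positions, 0 for Python-style ones.
    tokens = ex.passage.split()
    return " ".join(tokens[p - index_base] for p in ex.answer_positions)
```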

Docker

You may also try running the code with Docker.

After installing Docker, run the following commands (note: replace the repository path "/PATH/mprc" with your local path, and specify one task name: quasart, searchqa, unftriviaqa or squad):

nvidia-docker run -it -v /PATH/mprc:/opt --rm -w /opt      shuohang/mprc:1.0 /bin/bash -c "sh preprocess.sh quasart (searchqa/unftriviaqa/squad)"
nvidia-docker run -it -v /PATH/mprc:/opt --rm -w /opt/main shuohang/mprc:1.0 /bin/bash -c "th main.lua -task quasart (searchqa/unftriviaqa) -model rankerReader -reward_epoch 12"
nvidia-docker run -it -v /PATH/mprc:/opt --rm -w /opt/main shuohang/mprc:1.0 /bin/bash -c "th main.lua -task squad -model mlstmReader"

Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

Usage

sh preprocess.sh quasart (or searchqa)
cd main
sh bash_ans.sh quasart (or searchqa) 0

bash_ans.sh first trains the R^3 model. The top-K candidate answers for the train, dev and test sets are then stored in files such as "trainedmodel/evaluation/quasart/dev_output_top.txt". All passages containing the candidate answers are aggregated into files such as "data/quasartans/sequence/dev.tsv", with the following format:

question *split_sign* question_id *split_sign* answer\n
candidate_answer1 *split_sign* normalized_counting_score *split_sign* related_passage1 *split_sign* related_passage2...\n
candidate_answer2 *split_sign* normalized_counting_score *split_sign* related_passage1 *split_sign* related_passage2...\n
...
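One question block of the aggregated file can be read with the following Python 3 sketch. It assumes "*split_sign*" is written literally and that the caller has already isolated the lines of a single question, since the separator between questions is not shown above. parse_candidate_block and Candidate are hypothetical names.

```python
from typing import List, NamedTuple, Tuple

# ASSUMPTION: the separator appears literally in the aggregated files.
SPLIT_SIGN = "*split_sign*"


class Candidate(NamedTuple):
    answer: str
    score: float          # normalized_counting_score
    passages: List[str]   # related passages containing this candidate


def parse_candidate_block(lines: List[str]) -> Tuple[str, str, str, List[Candidate]]:
    """Parse a header line followed by one line per candidate answer."""
    header, *rest = [l.rstrip("\n") for l in lines if l.strip()]
    question, qid, answer = [f.strip() for f in header.split(SPLIT_SIGN, 2)]
    candidates = []
    for line in rest:
        fields = [f.strip() for f in line.split(SPLIT_SIGN)]
        candidates.append(Candidate(fields[0], float(fields[1]), fields[2:]))
    return question, qid, answer, candidates
```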

After aggregating the passages, bash_ans.sh finally runs the reranker.
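As a point of comparison for the trained reranker, a simple count-based baseline ranks candidates by how often they occur among the top-K predictions. The Python 3 sketch below uses one plausible definition of a normalized count (occurrences divided by the number of predictions, after lowercasing and collapsing whitespace); this normalization is an assumption and may differ from how normalized_counting_score is computed in the repository, and it is not the neural reranker that bash_ans.sh trains.

```python
from collections import Counter
from typing import Dict, List, Tuple


def normalized_counts(predictions: List[str]) -> Dict[str, float]:
    """Fraction of top-K predictions matching each candidate (ASSUMED normalization)."""
    normalized = [" ".join(p.lower().split()) for p in predictions]
    counts = Counter(normalized)
    total = sum(counts.values())
    return {answer: c / total for answer, c in counts.items()}


def rerank_by_count(predictions: List[str]) -> List[Tuple[str, float]]:
    """Sort candidates by descending count score, ties broken alphabetically."""
    scores = normalized_counts(predictions)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```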

Docker

You may also try running the code with Docker.

After installing Docker, run the following commands (note: replace the repository path "/PATH/mprc" with your local path, and specify the task name: quasart or searchqa):

nvidia-docker run -it -v /PATH/mprc:/opt --rm -w /opt      shuohang/mprc:1.0 /bin/bash -c "sh preprocess.sh quasart"
nvidia-docker run -it -v /PATH/mprc:/opt --rm -w /opt/main shuohang/mprc:1.0 /bin/bash -c "sh bash_ans.sh quasart 0"
