mcafp's Introduction

MC-AFP is a machine comprehension dataset generated from the publicly
available Gigaword dataset (AFP portion). The technique used to create the dataset
is reported in the paper:

"Building Large Machine Reading-Comprehension Datasets using Paragraph Vectors",
Radu Soricut, Nan Ding.

We generate a dataset of around 2 million examples,
on which we estimate that the human-level accuracy is in the 90% range
(in a 5-way multi-choice setup; for comparison, a random-guess approach has 20%
accuracy).
A novel neural-network architecture that combines the representation power
of recursive neural networks with the discriminative power of fully-connected
multi-layered networks achieves the best results we could obtain on our dataset:
83.2% accuracy.

This package contains an encrypted MC-AFP dataset and the code
that decodes it.

Datasets needed:
D1. English Gigaword Fifth Edition (LDC2011T07) from the Linguistic
    Data Consortium (LDC).
    [We cannot provide this dataset; please contact the LDC
    at https://www.ldc.upenn.edu/].
D2. The MC-AFP dataset that comes with this package, see data/

Decoding procedure:
1. Set ${LDCDIR} in generate_text.sh to the path of dataset D1
2. Set ${OUTDIR} in generate_text.sh to the output directory
3. sh generate_text.sh
4. When finished, the final dataset should be in ${OUTDIR}
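
Before running the decoding procedure, it can help to confirm that both
directories are usable. The following is a minimal sketch and is not part of
this package: check_paths is a hypothetical helper, and the paths in the usage
comment are placeholders. It assumes only what the steps above state, namely
that dataset D1 lives in a directory and that the output goes to a directory.

```shell
#!/bin/sh
# Pre-flight check before running generate_text.sh.
# check_paths is a hypothetical helper, not shipped with MC-AFP.
# The actual LDCDIR and OUTDIR values are still set inside generate_text.sh.

check_paths() {
    ldcdir=$1
    outdir=$2

    # Dataset D1 (Gigaword) must already exist on disk.
    if [ ! -d "$ldcdir" ]; then
        echo "missing Gigaword directory: $ldcdir" >&2
        return 1
    fi

    # Create the output directory if it does not exist yet.
    mkdir -p "$outdir" || return 1

    echo "ok"
}

# Example usage (placeholder paths, replace with your own):
#   check_paths /path/to/LDC2011T07 /path/to/output && sh generate_text.sh
```

The helper only validates the two directories; it does not pass them to
generate_text.sh, so the values inside the script still need to match.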

mcafp's People

Contributors

dingnan-google
