
This repo has the code for the paper "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework" accepted at EMNLP 2021 Findings.


Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based QA Framework

The blog post on this paper can be found here, the poster here, and a corresponding presentation here.

Required dependencies

Run pip install -r requirements.txt (Python 3 is required).

E-Manual pre-training corpus

The corpus is available at this link. A RoBERTa BASE model pre-trained on the corpus can be found here, and a BERT BASE UNCASED model pre-trained on the same corpus here.

Code

Baselines

  1. Dense Passage Retrieval (DPR) - uses the HuggingFace implementation (https://huggingface.co/transformers/model_doc/dpr.html)
  2. Technical Answer Prediction (TAP) - based on the code in https://github.com/IBM/techqa
  3. MultiSpan - based on the code in https://github.com/eladsegal/tag-based-multi-span-extraction
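
For readers unfamiliar with the DPR baseline, the retrieval step can be sketched as follows. DPR encodes the question and each passage into dense vectors and ranks passages by their inner product with the question. This is only an illustrative sketch and not the code used in this repo: the helper names (dot, rank_passages) and the toy vectors are assumptions made for the example, and in practice the vectors would come from the HuggingFace DPR question and context encoders.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(
    question_vec: Sequence[float],
    passage_vecs: Sequence[Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs for the top_k passages,
    ordered by decreasing inner product with the question vector.

    Hypothetical helper for illustration; not part of this repo.
    """
    scored = [(i, dot(question_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumed values). Real embeddings would be produced
# by the DPR question encoder and context encoder respectively.
question = [1.0, 0.0, 1.0]
passages = [
    [0.9, 0.1, 0.8],  # e.g. an E-manual section close to the question
    [0.0, 1.0, 0.0],  # an unrelated section
    [0.5, 0.5, 0.5],  # a partially related section
]

top = rank_passages(question, passages, top_k=2)
for index, score in top:
    print(f"passage {index}: score {score:.2f}")
```

With these toy vectors, passage 0 is ranked first and passage 2 second, since they have the largest inner products with the question vector.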

Citation

Please cite the work if you would like to use it.

@inproceedings{nandy-etal-2021-question-answering,
    title = "Question Answering over Electronic Devices: A New Benchmark Dataset and a Multi-Task Learning based {QA} Framework",
    author = "Nandy, Abhilash  and
      Sharma, Soumya  and
      Maddhashiya, Shubham  and
      Sachdeva, Kapil  and
      Goyal, Pawan  and
      Ganguly, Niloy",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.392",
    doi = "10.18653/v1/2021.findings-emnlp.392",
    pages = "4600--4609",
    abstract = "Answering questions asked from instructional corpora such as E-manuals, recipe books, etc., has been far less studied than open-domain factoid context-based question answering. This can be primarily attributed to the absence of standard benchmark datasets. In this paper, we meticulously create a large amount of data connected with E-manuals and develop a suitable algorithm to exploit it. We collect E-Manual Corpus, a huge corpus of 307,957 E-manuals, and pretrain RoBERTa on this large corpus. We create various benchmark QA datasets which include question answer pairs curated by experts based upon two E-manuals, real user questions from Community Question Answering Forum pertaining to E-manuals etc. We introduce EMQAP (E-Manual Question Answering Pipeline) that answers questions pertaining to electronics devices. Built upon the pretrained RoBERTa, it harbors a supervised multi-task learning framework which efficiently performs the dual tasks of identifying the section in the E-manual where the answer can be found and the exact answer span within that section. For E-Manual annotated question-answer pairs, we show an improvement of about 40{\%} in ROUGE-L F1 scores over most competitive baseline. We perform a detailed ablation study and establish the versatility of EMQAP across different circumstances. The code and datasets are shared at https://github.com/abhi1nandy2/EMNLP-2021-Findings, and the corresponding project website is https://sites.google.com/view/emanualqa/home.",
}
