LC-QuAD

Large-scale Complex Question Answering Dataset

📢 Announcement: LC-QuAD 2.0 is now released. Check out our website at http://lc-quad.sda.tech.

Download

๐Ÿฃ Train, Test Data

Links

๐ŸŒ Webpage | ๐Ÿ“„ Paper | ๐Ÿข Lab

Introduction

We release and maintain a gold-standard KBQA (Knowledge Base Question Answering) dataset containing 5000 questions and their corresponding SPARQL queries. LC-QuAD uses DBpedia version 04-2016 as the target KB.

Usage

License: The dataset is released under the GPL 3.0 license. You can download it directly, or read below to learn more.

Versioning: We use DBpedia version 04-2016 as our target KB. The public DBpedia endpoint (http://dbpedia.org/sparql) no longer serves this version, so many SPARQL queries may return no answers there. We strongly recommend hosting this version locally. To do so, see this guide.
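Once a local copy is running, it can be useful to check that a dataset query actually returns answers. The snippet below is a minimal sketch using only the Python standard library and the standard SPARQL protocol (a GET request with a `query` parameter). The endpoint URL `http://localhost:8890/sparql` and the helper names are assumptions for illustration; adjust them to your own setup.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local SPARQL endpoint serving DBpedia 04-2016.
# The host and port are placeholders; change them to match your installation.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def build_request(query, endpoint=LOCAL_ENDPOINT):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, endpoint=LOCAL_ENDPOINT, timeout=30):
    """Send the query and return the decoded SPARQL JSON results."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=timeout) as resp:
        return json.load(resp)


def has_answer(results):
    """True if a SELECT returned at least one binding, or an ASK returned true."""
    if "boolean" in results:
        return bool(results["boolean"])
    return bool(results.get("results", {}).get("bindings"))
```

Calling `has_answer(run_query(record["query"]))` for each record gives a quick estimate of how many queries still resolve against the endpoint you are using.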

Splits: We release the dataset split into training and test sets in an 80:20 ratio.

Format: The dataset is released in JSON dumps, where the key corrected_question contains the question, and query contains the corresponding SPARQL query.

The dataset generated has the following JSON structure, kept intact for .

{
    "_id": "Unique ID of this datapoint",
    "corrected_question": "Corrected, Final Question",
    "id": "Template ID",
    "query": "SPARQL Query",
    "template": "Template used to create SPARQL Query",
    "intermediary_question": "Automatically generated, grammatically incorrect question"
}
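As a rough illustration of reading this structure, the sketch below loads a dump and extracts (question, query) pairs. It assumes the file holds a JSON array of records shaped like the one above; the function name `load_pairs` and any file path you pass to it are hypothetical.

```python
import json
from pathlib import Path


def load_pairs(path):
    """Return (question, SPARQL query) tuples from an LC-QuAD-style JSON dump.

    Assumption: the file contains a JSON array of objects, each with the
    'corrected_question' and 'query' keys described above.
    """
    with Path(path).open(encoding="utf-8") as fh:
        records = json.load(fh)
    return [(r["corrected_question"], r["query"]) for r in records]
```

For example, `load_pairs("path/to/train.json")` would return a list that can be fed directly to a training or evaluation loop; substitute the actual name of the downloaded file.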

Cite

@inproceedings{trivedi2017lc,
  title={Lc-quad: A corpus for complex question answering over knowledge graphs},
  author={Trivedi, Priyansh and Maheshwari, Gaurav and Dubey, Mohnish and Lehmann, Jens},
  booktitle={International Semantic Web Conference},
  pages={210--218},
  year={2017},
  organization={Springer}
}

Benchmarking/Leaderboard

We're in the process of automating the benchmarking process (and updating results on our webpage). In the meantime, please get in touch with us at [email protected], and we'll do it manually. Apologies for the inconvenience.

Methodology

Overview

  • Automatically create SPARQL queries.
  • Convert SPARQL queries to intermediary natural language questions (NLQs).
  • Manually correct intermediary NLQs to create the final questions.

We start with a set of seed entities and a predicate whitelist. Using the whitelist, we generate 2-hop subgraphs around the seed entities. Taking a seed entity as the supposed answer, we juxtapose SPARQL templates onto the subgraph and generate SPARQL queries.

For each SPARQL template, and based on certain conditions, we assign hand-made natural language question templates to the generated SPARQL queries. Refer to this diagram to understand the nomenclature used in templates.
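The template step can be pictured with a small sketch. The two templates, the `label` helper, and the example entity and predicate below are hypothetical and chosen only for illustration; they are not the templates shipped with LC-QuAD. The idea is that the same entity and predicate fill a SPARQL template and a matching question template, which yields a query and a rough intermediary question.

```python
from string import Template

# Hypothetical templates for illustration only; not the ones used by LC-QuAD.
SPARQL_TEMPLATE = Template("SELECT DISTINCT ?uri WHERE { ?uri <$p> <$e> }")
NLQ_TEMPLATE = Template("What is the thing whose $p_label is $e_label ?")


def label(uri):
    """Derive a rough, human-readable label from the last segment of a URI."""
    return uri.rstrip("/").rsplit("/", 1)[-1].replace("_", " ")


def instantiate(entity, predicate):
    """Fill both templates with the same entity and predicate."""
    query = SPARQL_TEMPLATE.substitute(e=entity, p=predicate)
    question = NLQ_TEMPLATE.substitute(
        e_label=label(entity), p_label=label(predicate)
    )
    return {"query": query, "intermediary_question": question}


example = instantiate(
    "http://dbpedia.org/resource/Douglas_Adams",
    "http://dbpedia.org/ontology/author",
)
print(example["intermediary_question"])
# prints: What is the thing whose author is Douglas Adams ?
```

The awkward wording of the generated question is intentional: it plays the role of the intermediary question that is later corrected by hand.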

Finally, we follow a two-step (Correct, Review) system to generate a grammatically correct question for every template-generated one.

Changelog

0.1.3 - 19-06-2018

  • Published train-test splits
  • Website Updated

0.1.2 - 28-01-2018

  • Updated public website
  • Dataset now available in QALD format
  • Leaderboard underway

0.1.1 - 27-10-2017

  • Fixed a bug with rdf:type filter in SPARQL
  • data_set.json updated
  • updated templates.py

0.1.0 - 01-05-2017
