The Chinese version of LogiQA2.0
LogiQA2.0 dataset - logical reasoning in MRC and NLI tasks
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This repository contains the Chinese version of the LogiQA2.0 datasets for our paper LogiQA2.0 - An Improved Dataset for Logic Reasoning in Question Answering and Textual Inference. For the full version of LogiQA2.0 and the baseline code, see the main LogiQA2.0 repository.
This is version 2 of the LogiQA dataset, which was first released as a multi-choice reading comprehension dataset in our previous paper LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning.
The dataset is collected from the Chinese Civil Service Entrance Examination. It is available in both Chinese and English (by translation). You can download version 1 of the LogiQA dataset from here.
To construct the LogiQA2.0 dataset, we:
- collect more newly released exam questions and practice questions. There are about 20 provinces in China that hold the exam annually. The exam materials are publicly available on the Internet after the exams. Besides, practice questions are provided by various sources.
- hire professional translators to re-translate the dataset from Chinese to English, and verify the labels and annotations with human experts. This work was carried out by Speechocean, a data annotation service provider, with the help of Microsoft Research Asia.
- introduce a new NLI task to the dataset. The NLI version of the dataset is converted from the MRC version of the dataset, following previous work such as Transforming Question Answering Datasets into Natural Language Inference Datasets.
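
The MRC-to-NLI idea in the last step can be illustrated with a minimal Python sketch. This is not the conversion pipeline used for LogiQA2.0: the field names (`text`, `question`, `options`, `answer`), the label strings, and the sample item are all hypothetical, and the hypothesis here is a naive concatenation of the question and an option rather than a rewritten declarative sentence.

```python
from typing import Dict, List


def mrc_to_nli(example: Dict) -> List[Dict]:
    """Turn one multiple-choice MRC item into one NLI pair per option.

    Assumed (hypothetical) input fields:
      text     - the passage, used as the premise
      question - the question stem
      options  - list of answer options
      answer   - index of the correct option
    """
    pairs = []
    for idx, option in enumerate(example["options"]):
        pairs.append({
            "premise": example["text"],
            # Naive hypothesis: question and option joined as-is.
            "hypothesis": f"{example['question']} {option}",
            # Only the correct option is treated as entailed by the passage.
            "label": "entailment" if idx == example["answer"] else "not_entailment",
        })
    return pairs


# Made-up item for illustration only; it is not taken from the dataset.
sample = {
    "text": "All birds in the park are sparrows. Tom saw a bird in the park.",
    "question": "Which of the following can be inferred?",
    "options": ["Tom saw a sparrow.", "Tom saw a crow."],
    "answer": 0,
}

for pair in mrc_to_nli(sample):
    print(pair["label"], "|", pair["hypothesis"])
```

Each MRC item yields as many NLI pairs as it has options, so a converted dataset built this way would be skewed toward the non-entailed label unless it is rebalanced.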