PeKo: A Large Scale Precondition Knowledge Dataset

Overview

PeKo (Precondition Knowledge) is a large scale crowdsourced event precondition knowledge dataset introduced in our paper "Modeling Preconditions in Text with a Crowd-sourced Dataset" at EMNLP Findings 2020

Preprint is available from here

Crowdsourcing Precondition Knowledge

Data Preparation

We extract events and their temporal relations from news articles using CAEVO (Chambers et al., 2014), a temporal relation extraction system. We used CAEVO on a random sample of 6,837 articles inthe New York Times Annotated Corpus (Sandhaus, 2008). On average CAEVO extracted around 63 events per article, which yielded a total of 3,906 possible relation candidates per document. We filtered these to retain only pairs of events that have a BEFORE or AFTER temporal relation between them. We call the temporally preceding event the candidate precondition, and the temporally subsequent event in the pair the target event.

Crowdsourcing Task

The annotators were presented with a text snippet and two event mentions highlighted as shown below. To prune out event extraction errors from CAEVO, the annotators were first asked if the highlighted text denoted valid events. If both triggers were deemed valid, then the annotators evaluated whether or not the candidate precondition event was an actual precondition for the target event. Specifically they check if the candidate event is necessary for the target event to happen.

As the result of crowdsouring, we have 10,806 preconditions out of 28,948 instances in total.

Tasks

We now propose two tasks that test for the ability to recognize and generate preconditions in textual contexts. Here we describe evaluations to benchmark the performance of current models on these tasks and to better understand the challenges involved.

PeKo Task 1: Precondition Identification

Given a text snippet with a target and candidate event pair, the task is to classify if the candidate event is a precondition for the target in the context described by the text snippet. This is a standard sentence-level classification task.

PeKo Task 2: Precondition Generation Task

Here we introduce Precondition Generation as a more general challenge that a dataset like PeKo now enables. Given a target event t, generate an event p that is a precondition for t. We benchmark performance on evaluation instances drawn from both PeKo and an out-of-domain dataset ATOMIC.

Download

The dataset can be downloaded from here

Citation

Please use the following bibtex entry:

@article{kwon2020modeling,
title={Modeling Preconditions in Text with a Crowd-sourced Dataset},
author={Kwon, Heeyoung and Koupaee, Mahnaz and Singh, Pratyush and Sawhney, Gargi and Shukla, Anmol and Kallur, Keerthi Kumar and Chambers, Nathanael and Balasubramanian, Niranjan},
journal={arXiv preprint arXiv:2010.02429},
year={2020}
}

Dataset Information

data
 ├── peko_all.jsonl             # PeKo dataset
 ├── peko_gen_train.txt         # PeKo generation instances
 ├── peko_gen_dev.txt
 ├── peko_gen_test.txt
 ├── temp_gen_train.txt         # Generation instances for temporal model
 ├── temp_gen_dev.txt
 ├── LM_gen_train.txt           # Generation instances for plain language model
 ├── LM_gen_dev.txt
 └── atomic_samples.txt         # ATOMIC samples for generation task

peko_all.jsonl: PeKo dataset, each line contains a single json document.
- sent_id: sentence ID
- source: a candidate precondition event
- target: a target event
- label: 1 for precondition, 0 for non-precondition
- n_yes: the number of votes for precondition
- n_vote: the number of annotator
- sent: sentence(s), tokens are separated by space
{peko/temp/LM}_gen_*.txt

Tab separated text files. The first column contains full text, which is used for the generation target and the second column contains a precondition-masked-out instance.
atomic_samples.txt

The file contains generation seeds from ATOMIC dataset

Contributors

Heeyoung Kwon (Stony Brook University)
Mahnaz Koupaee (Stony Brook University)
Pratyush Singh (Stony Brook University)
Gargi Sawhney (Stony Brook University)
Anmol Shukla (Stony Brook University)
Keerthi Kumar Kallur (Stony Brook University)
Nate Chambers (US Naval Academy)
Niranjan Balasubramanian (Stony Brook University)

ykl7 / peko Goto Github PK

peko's Introduction

PeKo: A Large Scale Precondition Knowledge Dataset

Overview

Crowdsourcing Precondition Knowledge

Data Preparation

Crowdsourcing Task

Tasks

PeKo Task 1: Precondition Identification

PeKo Task 2: Precondition Generation Task

Download

Citation

Dataset Information

Contributors

peko's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent