PeKo: A Large Scale Precondition Knowledge Dataset

Overview

PeKo (Precondition Knowledge) is a large-scale crowdsourced dataset of event precondition knowledge, introduced in our paper "Modeling Preconditions in Text with a Crowd-sourced Dataset" at EMNLP Findings 2020.

A preprint is available on arXiv (arXiv:2010.02429).

Crowdsourcing Precondition Knowledge

[Figure: Crowdsourcing Task]

Data Preparation

We extracted events and their temporal relations from news articles using CAEVO (Chambers et al., 2014), a temporal relation extraction system. We ran CAEVO on a random sample of 6,837 articles in the New York Times Annotated Corpus (Sandhaus, 2008). On average, CAEVO extracted around 63 events per article, which yielded a total of 3,906 possible relation candidates per document. We filtered these to retain only pairs of events that have a BEFORE or AFTER temporal relation between them. We call the temporally preceding event in a pair the candidate precondition, and the temporally subsequent event the target event.
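
The filtering and orientation step can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the `(first_event, second_event, relation)` tuple shape and both function names are assumptions made for the example, and CAEVO's real output is richer. The figure of 3,906 candidates is consistent with counting ordered pairs among 63 events (63 × 62).

```python
from typing import Iterable, List, Tuple

# Hypothetical record shape: (first_event, second_event, relation).
# This is an assumption for illustration, not CAEVO's actual output format.
Relation = Tuple[str, str, str]


def ordered_pair_count(n_events: int) -> int:
    """Number of ordered event pairs that can be formed from n_events events."""
    return n_events * (n_events - 1)


def candidate_pairs(relations: Iterable[Relation]) -> List[Tuple[str, str]]:
    """Keep BEFORE/AFTER relations, oriented as (candidate precondition, target)."""
    pairs = []
    for first, second, rel in relations:
        if rel == "BEFORE":
            # first precedes second: first is the candidate precondition
            pairs.append((first, second))
        elif rel == "AFTER":
            # first follows second: swap so the earlier event comes first
            pairs.append((second, first))
        # every other relation label is dropped
    return pairs
```

With this orientation, the first element of each pair is always the temporally preceding event, which matches the candidate precondition / target event naming used above.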

Crowdsourcing Task

The annotators were presented with a text snippet and two highlighted event mentions, as shown below. To prune out event extraction errors from CAEVO, the annotators were first asked whether the highlighted text denoted valid events. If both triggers were deemed valid, the annotators then evaluated whether the candidate precondition event was an actual precondition for the target event. Specifically, they checked whether the candidate event was necessary for the target event to happen.

[Figure: HIT example]

As a result of crowdsourcing, we obtained 10,806 preconditions out of 28,948 instances in total (about 37%).

Tasks

We now propose two tasks that test for the ability to recognize and generate preconditions in textual contexts. Here we describe evaluations to benchmark the performance of current models on these tasks and to better understand the challenges involved.

PeKo Task 1: Precondition Identification

Given a text snippet with a target and candidate event pair, the task is to classify whether the candidate event is a precondition for the target in the context described by the text snippet. This is a standard sentence-level classification task.
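
One common way to feed such an instance to a sentence-level classifier is to mark the two event triggers in the text. The sketch below is only an illustration under stated assumptions: the marker strings `<src>` and `<tgt>`, the function name `mark_events`, and the idea that each event is identified by the index of its trigger token are all hypothetical and not taken from the paper or the released data. The example sentence is made up.

```python
from typing import List


def mark_events(tokens: List[str], source_idx: int, target_idx: int) -> str:
    """Wrap the candidate (source) and target trigger tokens in marker tokens.

    source_idx and target_idx are token positions; this indexing scheme is an
    assumption for illustration, not the documented PeKo format.
    """
    out = []
    for i, tok in enumerate(tokens):
        if i == source_idx:
            out.append(f"<src> {tok} </src>")
        elif i == target_idx:
            out.append(f"<tgt> {tok} </tgt>")
        else:
            out.append(tok)
    return " ".join(out)


# Made-up sentence, not drawn from the dataset.
tokens = "He opened the door and entered the room .".split()
marked = mark_events(tokens, source_idx=1, target_idx=5)
# marked == "He <src> opened </src> the door and <tgt> entered </tgt> the room ."
```

The marked string, paired with the binary label, would then form one training example for whichever classifier is used.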

[Figure: Result Table]

PeKo Task 2: Precondition Generation Task

Here we introduce Precondition Generation as a more general challenge that a dataset like PeKo now enables. Given a target event t, generate an event p that is a precondition for t. We benchmark performance on evaluation instances drawn from both PeKo and an out-of-domain dataset, ATOMIC.
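
To make the setup concrete, a generation instance can be thought of as a pair of the full text and a copy of it with the precondition span masked out. The sketch below assumes a hypothetical `[BLANK]` placeholder and a token-span interface; the released files may use a different mask token and construction, and the example sentence is made up.

```python
from typing import List, Tuple

MASK = "[BLANK]"  # hypothetical placeholder; the released data may use another token


def make_generation_pair(tokens: List[str], start: int, end: int) -> Tuple[str, str]:
    """Return (full_text, masked_text), with tokens[start:end] replaced by MASK."""
    full_text = " ".join(tokens)
    masked_tokens = tokens[:start] + [MASK] + tokens[end:]
    return full_text, " ".join(masked_tokens)


# Made-up sentence, not drawn from the dataset.
tokens = "He opened the door and entered the room .".split()
full_text, masked_text = make_generation_pair(tokens, start=1, end=4)
# full_text   == "He opened the door and entered the room ."
# masked_text == "He [BLANK] and entered the room ."
```

A generation model would take the masked text as input and be trained to produce the full text.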

[Figure: Generation Result Table]

Download

The dataset can be downloaded from here

Citation

Please use the following bibtex entry:

@article{kwon2020modeling,
title={Modeling Preconditions in Text with a Crowd-sourced Dataset},
author={Kwon, Heeyoung and Koupaee, Mahnaz and Singh, Pratyush and Sawhney, Gargi and Shukla, Anmol and Kallur, Keerthi Kumar and Chambers, Nathanael and Balasubramanian, Niranjan},
journal={arXiv preprint arXiv:2010.02429},
year={2020}
}

Dataset Information

data
 ├── peko_all.jsonl             # PeKo dataset
 ├── peko_gen_train.txt         # PeKo generation instances
 ├── peko_gen_dev.txt
 ├── peko_gen_test.txt
 ├── temp_gen_train.txt         # Generation instances for temporal model
 ├── temp_gen_dev.txt
 ├── LM_gen_train.txt           # Generation instances for plain language model
 ├── LM_gen_dev.txt
 └── atomic_samples.txt         # ATOMIC samples for generation task
  • peko_all.jsonl: PeKo dataset; each line contains a single JSON document with the following fields.

    • sent_id: sentence ID
    • source: a candidate precondition event
    • target: a target event
    • label: 1 for precondition, 0 for non-precondition
    • n_yes: the number of votes for precondition
    • n_vote: the number of annotators
    • sent: sentence(s), tokens are separated by space
  • {peko/temp/LM}_gen_*.txt

    Tab-separated text files. The first column contains the full text, which serves as the generation target; the second column contains the same instance with the precondition masked out.

  • atomic_samples.txt

    The file contains generation seeds from the ATOMIC dataset.
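
The files described above can be read with the standard library alone. The following is a minimal sketch: the function names are illustrative, and it assumes UTF-8 encoding, exactly two tab-separated columns per line in the generation files, and that n_yes and n_vote are stored as numbers with n_vote greater than zero.

```python
import json
from typing import Dict, Iterator, List, Tuple


def read_peko(path: str) -> Iterator[Dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def read_generation_pairs(path: str) -> List[Tuple[str, str]]:
    """Read (full_text, masked_text) pairs from a tab-separated file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue
            # Split on the first tab only: column 1 is the full text,
            # column 2 is the precondition-masked instance.
            full_text, masked_text = line.split("\t", 1)
            pairs.append((full_text, masked_text))
    return pairs


def agreement(record: Dict) -> float:
    """Fraction of annotators who voted 'precondition' for this instance."""
    # Assumes numeric n_yes and n_vote, with n_vote > 0.
    return record["n_yes"] / record["n_vote"]
```

For example, `sum(r["label"] for r in read_peko("data/peko_all.jsonl"))` would count the instances labeled as preconditions, and `agreement(r)` gives a rough per-instance measure of annotator agreement.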

Contributors

  • Heeyoung Kwon (Stony Brook University)
  • Mahnaz Koupaee (Stony Brook University)
  • Pratyush Singh (Stony Brook University)
  • Gargi Sawhney (Stony Brook University)
  • Anmol Shukla (Stony Brook University)
  • Keerthi Kumar Kallur (Stony Brook University)
  • Nate Chambers (US Naval Academy)
  • Niranjan Balasubramanian (Stony Brook University)
