Git Product home page Git Product logo

casrel's Introduction

A Novel Cascade Binary Tagging Framework for Relational Triple Extraction

This repository contains the source code and dataset for the paper: A Novel Cascade Binary Tagging Framework for Relational Triple Extraction. Zhepei Wei, Jianlin Su, Yue Wang, Yuan Tian and Yi Chang. ACL 2020. [pdf]

Overview

At the core of the proposed CasRel framework is the fresh perspective that instead of treating relations as discrete labels on entity pairs, we actually model the relations as functions that map subjects to objects. More precisely, instead of learning relation classifiers f(s,o) -> r, we learn relation-specific taggers f_{r}(s) -> o, each of which recognizes the possible object(s) of a given subject under a specific relation. Under this framework, relational triple extraction is a two-step process: first we identify all possible subjects in a sentence; then for each subject, we apply relation-specific taggers to simultaneously identify all possible relations and the corresponding objects.

overview

Requirements

This repo was tested on Python 3.7 and Keras 2.2.4. The main requirements are:

  • tqdm
  • codecs
  • keras-bert = 0.80.0
  • tensorflow-gpu = 1.13.1

Datasets

Usage

  1. Get pre-trained BERT model for Keras

    Download Google's pre-trained BERT model (BERT-Base, Cased). Then decompress it under pretrained_bert_models/. More pre-trained models are available here.

  2. Build dataset in the form of triples

    Take the NYT dataset for example:

    a) Switch to the corresponding directory and download the dataset

    cd CasRel/data/NYT/raw_NYT

    b) Follow the instructions at the same directory, and just run

    python generate.py

    c) Finally, build dataset in the form of triples

    cd CasRel/data/NYT
    python build_data.py

    This will convert the raw numerical dataset into a proper format for our model and generate train.json, test.json and val.json(if not provided in the raw dataset, it will randomly sample 5% or 10% data from the train.json or test.json to create val.json as in line with previous works). Then split the test dataset by type and num for in-depth analysis on different scenarios of overlapping triples.

  3. Specify the experimental settings

    By default, we use the following settings in run.py:

    {
        "bert_model": "cased_L-12_H-768_A-12",
        "max_len": 100,
        "learning_rate": 1e-5,
        "batch_size": 6,
        "epoch_num": 100,
    }
  4. Train and select the model

    Specify the running mode and dataset at the command line

    python run.py ---train=True --dataset=NYT

    The model weights that lead to the best performance on validation set will be stored in saved_weights/DATASET/.

  5. Evaluate on the test set

    Specify the test dataset at the command line

    python run.py --dataset=NYT

    The extracted result will be saved in results/DATASET/ with the following format:

    {
        "text": "Tim Brooke-Taylor was the star of Bananaman , an STV series first aired on 10/03/1983 and created by Steve Bright .",
        "triple_list_gold": [
            {
                "subject": "Bananaman",
                "relation": "starring",
                "object": "Tim Brooke-Taylor"
            },
            {
                "subject": "Bananaman",
                "relation": "creator",
                "object": "Steve Bright"
            }
        ],
        "triple_list_pred": [
            {
                "subject": "Bananaman",
                "relation": "starring",
                "object": "Tim Brooke-Taylor"
            },
            {
                "subject": "Bananaman",
                "relation": "creator",
                "object": "Steve Bright"
            }
        ],
        "new": [],
        "lack": []
    }

Citation

@inproceedings{wei2020CasRel,
  title={A Novel Cascade Binary Tagging Framework for Relational Triple Extraction},
  author={Wei, Zhepei and Su, Jianlin and Wang, Yue and Tian, Yuan and Chang, Yi},
  booktitle={Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  pages={1476--1488},
  year={2020}
}

casrel's People

Contributors

weizhepei avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.