Git Product home page Git Product logo

albert_final_infer12's Introduction

BERT(S) for Relation Extraction

Overview

A PyTorch implementation of the models for the paper "Matching the Blanks: Distributional Similarity for Relation Learning" published in ACL 2019.
Note: This is not an official repo for the paper.
Additional models for relation extraction, implemented here based on the paper's methodology:

Requirements

Requirements: Python (3.6+), PyTorch (1.2.0), Spacy (2.1.8)
Pre-trained BERT(S) model courtesy of HuggingFace.co (https://huggingface.co)

Training by matching the blanks (MTB)

Run main_pretraining.py with arguments below. Pre-training data can be any .txt continuous text file.
We use Spacy NLP to grab pairwise entities (within a window size of 40 tokens length) from the text to form relation statements for pre-training. Entities recognition are based on NER and dependency tree parsing of objects/subjects.
The pre-training data (cnn.txt) that I've used can be downloaded here.

Note: Pre-training can take a long time, depending on available GPU. It is possible to directly fine-tune on the relation-extraction task and still get reasonable results, following the section below.

main_pretraining.py [-h] 
	[--pretrain_data TRAIN_PATH] 
	[--batch_size BATCH_SIZE]
	[--freeze FREEZE]  
	[--gradient_acc_steps GRADIENT_ACC_STEPS]
	[--max_norm MAX_NORM]
	[--fp16 FP_16]  
	[--num_epochs NUM_EPOCHS]
	[--lr LR]
	[--model_no MODEL_NO (0: BERT ; 1: ALBERT)]

Fine-tuning on SemEval2010 Task 8

Run main_task.py with arguments below. Requires SemEval2010 Task 8 dataset, available here. Download & unzip to ./data/ folder.

main_task.py [-h] 
	[--train_data TRAIN_DATA]
	[--test_data TEST_DATA]
	[--use_pretrained_blanks USE_PRETRAINED_BLANKS]
	[--num_classes NUM_CLASSES] 
	[--batch_size BATCH_SIZE]
	[--gradient_acc_steps GRADIENT_ACC_STEPS]
	[--max_norm MAX_NORM]
	[--fp16 FP_16]  
	[--num_epochs NUM_EPOCHS]
	[--lr LR]
	[--model_no MODEL_NO (0: BERT ; 1: ALBERT)]
	[--train TRAIN]
	[--infer INFER]

Inference (--infer=1)

To infer a sentence, you can annotate entity1 & entity2 of interest within the sentence with their respective entities tags [E1], [E2]. Example:

Type input sentence ('quit' or 'exit' to terminate):
The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.

Sentence:  The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.
Predicted:  Cause-Effect(e1,e2) 
from src.tasks.infer import infer_from_trained

inferer = infer_from_trained(args, detect_entities=False)
test = "The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor."
inferer.infer_sentence(test, detect_entities=False)
Sentence:  The surprise [E1]visit[/E1] caused a [E2]frenzy[/E2] on the already chaotic trading floor.
Predicted:  Cause-Effect(e1,e2) 

The script can also automatically detect potential entities in an input sentence, in which case all possible relation combinations are inferred:

inferer = infer_from_trained(args, detect_entities=True)
test2 = "After eating the chicken, he developed a sore throat the next morning."
inferer.infer_sentence(test2, detect_entities=True)
Sentence:  [E2]After eating the chicken[/E2] , [E1]he[/E1] developed a sore throat the next morning .
Predicted:  Other 

Sentence:  After eating the chicken , [E1]he[/E1] developed [E2]a sore throat[/E2] the next morning .
Predicted:  Other 

Sentence:  [E1]After eating the chicken[/E1] , [E2]he[/E2] developed a sore throat the next morning .
Predicted:  Other 

Sentence:  [E1]After eating the chicken[/E1] , he developed [E2]a sore throat[/E2] the next morning .
Predicted:  Other 

Sentence:  After eating the chicken , [E2]he[/E2] developed [E1]a sore throat[/E1] the next morning .
Predicted:  Other 

Sentence:  [E2]After eating the chicken[/E2] , he developed [E1]a sore throat[/E1] the next morning .
Predicted:  Cause-Effect(e2,e1) 

Benchmark Results

SemEval2010 Task 8

  1. Base architecture: BERT base uncased (12-layer, 768-hidden, 12-heads, 110M parameters)

Without MTB pre-training: F1 results when trained on 100 % training data:

  1. Base architecture: ALBERT base uncased (12 repeating layers, 128 embedding, 768-hidden, 12-heads, 11M parameters)

Without MTB pre-training: F1 results when trained on 100 % training data:

To add

  • inference & results on benchmarks (SemEval2010 Task 8) with MTB pre-training
  • felrel task

albert_final_infer12's People

Contributors

soikonomou avatar

Stargazers

zyser avatar Nima Afshar avatar  avatar  avatar  avatar  avatar

Watchers

 avatar paper2code - bot avatar

Forkers

tubbz-alt shaun95

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.