Unsupervised Relation Extraction with Sentence level Distributional Semantics

SURE is an unsupervised relation extraction system that relies on sentence-level distributional semantics (i.e., sentence encoding). For more details, please refer to the paper.

Architecture: system description

Dependencies

You need to have Python 3.6 or above and the following libraries installed:

NLTK: http://www.nltk.org/

scikit-learn: https://scikit-learn.org/stable/

NumPy: https://numpy.org/

Sentence-Transformers: https://www.sbert.net/

Pandas: https://pandas.pydata.org/

PyTorch: https://pytorch.org/

You can install them with the following command:

pip install -r requirements.txt
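To confirm that the packages above can be imported after installation, a small check like the following may help. It is a sketch and not part of SURE. Note that some import names differ from the package names: scikit-learn is imported as `sklearn`, PyTorch as `torch`, and Sentence-Transformers as `sentence_transformers`.

```python
import importlib.util

# Import names of the dependencies listed above.
REQUIRED = ["nltk", "sklearn", "numpy", "sentence_transformers", "pandas", "torch"]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found on the current Python path."""
    # find_spec locates a top-level module without importing it; None means not found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else "Missing: " + ", ".join(missing))
```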

Usage:

To run the relation extraction system, use the following command:

python main.py corpus entity_type1 entity_type2

Config

between_length: 6             # Maximum number of tokens between the two entities
before_after_window: 3        # Maximum number of tokens before the first entity and after the second entity
similiraty: 0.25              # Cosine similarity threshold used in the first iteration
top_similar: 15               # Maximum number of sentences most similar to the query term (by cosine similarity)
query_term: born in           # Natural-language representation of the relation birthPlace
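The `similiraty` and `top_similar` settings describe a selection step in which sentence embeddings are compared with the embedding of `query_term`. The sketch below shows one way such a step could look. It assumes the embeddings have already been computed (for example with Sentence-Transformers' `SentenceTransformer.encode`). The function name `select_similar` and the choice to apply the top-k cut before the threshold are assumptions, not SURE's actual code.

```python
import numpy as np


def select_similar(query_vec, sentence_vecs, threshold=0.25, top_k=15):
    """Hypothetical helper: return (index, similarity) for up to top_k sentences
    whose cosine similarity to the query is at least threshold, best first.

    threshold plays the role of `similiraty`, top_k that of `top_similar`.
    """
    q = np.asarray(query_vec, dtype=float)
    s = np.asarray(sentence_vecs, dtype=float)
    # Normalize to unit length so that a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    sims = s @ q
    # Sort by descending similarity, keep the top_k, then apply the threshold.
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order if sims[i] >= threshold]
```

With toy 2-D vectors, `select_similar([1, 0], [[1, 0], [0, 1], [1, 1]])` keeps sentences 0 and 2 (similarities 1.0 and about 0.707). It drops sentence 1, whose similarity of 0.0 falls below the 0.25 threshold.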

corpus

The corpus contains one sentence per line, with tags marking the type of each named entity, e.g.:

<ORG> Consolidated Edison </ORG>, based in <LOC> New York </LOC>, generated more than $7 billion in annual revenue.
The social media platform <ORG> Facebook, Inc. </ORG> announced it was acquiring <ORG> WhatsApp </ORG>, its largest acquisition to date.
<LOC> Herzogenaurach </LOC> is the home of goods company <ORG> Adidas </ORG>.
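Entity mentions in this format can be extracted with a regular expression. The following minimal sketch assumes uppercase type names and matching opening and closing tags. `parse_entities` is a hypothetical helper, not a function from the SURE code base.

```python
import re

# Matches "<TYPE> mention </TYPE>"; whitespace just inside the tags is optional,
# and the closing tag must repeat the opening type.
ENTITY_RE = re.compile(r"<(?P<type>[A-Z]+)>\s*(?P<text>.*?)\s*</(?P=type)>")


def parse_entities(line):
    """Return (type, mention, start, end) for each tagged entity in a line,
    where start/end are the character offsets of the whole tagged span."""
    return [
        (m.group("type"), m.group("text"), m.start(), m.end())
        for m in ENTITY_RE.finditer(line)
    ]
```

For the first sample sentence, this yields `("ORG", "Consolidated Edison", 0, 32)` and a `LOC` entry for `New York`.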

entity types

Two named-entity types are initially provided for a given relation. For example, for the relation headquarterIn we provide ORG (Organization) and LOC (Location).
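Given the two entity types together with the `between_length` and `before_after_window` settings from the configuration, each candidate sentence can be reduced to the context around an entity pair. The sketch below is one possible reading of those settings, not SURE's implementation. It assumes each sentence has already been tokenized into a hypothetical list of `(text, type)` pairs, with every entity mention as a single token, and it keeps only the first qualifying pair.

```python
from typing import List, Optional, Tuple

# Hypothetical token representation: (text, entity type or None for ordinary tokens).
Token = Tuple[str, Optional[str]]


def context_windows(tokens: List[Token], type1: str, type2: str,
                    between_length: int = 6, before_after_window: int = 3):
    """Return (before, e1, between, e2, after) for the first ordered pair of
    entities typed (type1, type2) separated by at most between_length tokens,
    or None if no pair qualifies."""
    words = [text for text, _ in tokens]
    for i, (_, t1) in enumerate(tokens):
        if t1 != type1:
            continue
        for j in range(i + 1, len(tokens)):
            if j - i - 1 > between_length:
                break  # the gap only grows from here, so stop scanning
            if tokens[j][1] == type2:
                before = words[max(0, i - before_after_window):i]
                between = words[i + 1:j]
                after = words[j + 1:j + 1 + before_after_window]
                return before, words[i], between, words[j], after
    return None
```

For the Consolidated Edison sentence tokenized as `Consolidated Edison`(ORG) `,` `based` `in` `New York`(LOC) `,` `generated` `more` `than`, the call `context_windows(tokens, "ORG", "LOC")` returns an empty "before" window, the between tokens `[",", "based", "in"]`, and the after window `[",", "generated", "more"]`.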

Dataset used

Dataset	Download
NYT-FB dataset	Download
Wikipedia_Wikidata dataset	Download
English Gigaword	Download

Run with basic configuration

python main.py "text_corpus" PER LOC 2

Authors

Manzoor Ali (DICE, Paderborn University)
Muhammad Saleem (AKSW, University of Leipzig)
Axel-Cyrille Ngonga Ngomo (DICE, Paderborn University)
