patent_ner_linking's Introduction

Named entity recognition (NER) and detection of hyponym/hypernym relationships on a dataset of patents

The main goals of this project are:

  • Train an NER model on a dataset of patents in a specific domain
  • Fine-tune it with Prodigy
  • Implement automatic detection of hyponyms/hypernyms with Hearst patterns
  • Validate the detection results with several methods, including Wikidata

Structure

Setup

  1. Install the dependencies from requirements.txt
  2. Unpack the data:
    tar -xvf G06K.txt.gz
  3. Open project.ipynb and run the first cell to check that all imports work properly

Notebook structure

Here is a brief overview of the parts of project.ipynb.

Data processing

In this section, the patent text is read and processed to extract potential named entities using a curated list of terms, manyterms.lower.txt.
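
As a rough illustration of this step, the sketch below matches a term list against raw text and emits character-offset annotations in the `(text, {"entities": [...]})` shape that spaCy-style NER training commonly consumes. It is not the notebook's code: the function names, the `TECH` label, and the three sample terms are assumptions made for the example, and the real manyterms.lower.txt may need different normalization.

```python
import re

def load_terms(lines):
    """Lowercase and de-duplicate terms, longest first (hypothetical helper)."""
    unique = {line.strip().lower() for line in lines if line.strip()}
    return sorted(unique, key=len, reverse=True)

def annotate(text, terms, label="TECH"):
    """Return (text, {"entities": [(start, end, label), ...]}).

    The label name "TECH" is an assumption for this sketch.
    """
    if not terms:
        return text, {"entities": []}
    # Longest terms come first, so "neural network" wins over "network".
    alternatives = "|".join(re.escape(t) for t in terms)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    # finditer yields non-overlapping matches from left to right.
    entities = [(m.start(), m.end(), label) for m in pattern.finditer(text)]
    return text, {"entities": entities}

# Stand-in for the contents of manyterms.lower.txt (made-up terms).
terms = load_terms(["neural network", "network", "image sensor"])
sample = "The image sensor feeds a neural network."
_, annotations = annotate(sample, terms)
for start, end, label in annotations["entities"]:
    print(sample[start:end], label)
```

Exact string matching like this misses plurals and inflected forms, so a tokenizer-based matcher may be a better fit for real patent text.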

Training NER model

Next, we train the model on the created dataset.
Additionally, if you have access to Prodigy, you can apply active learning to tune the model.

Hearst patterns for hyponym detection

This section is dedicated to extracting potential entity links (such as hypernyms) using Hearst patterns.
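
Hearst patterns are lexical templates such as "X such as A, B and C" or "A, B and other X", where X is read as the hypernym of A, B and C. The sketch below applies three such templates between terms that are already known (for example, entities produced by the NER step). It is a simplified illustration, not the notebook's implementation: the function name, the pattern subset, and the sample terms and text are assumptions.

```python
import re

def hearst_pairs(text, terms):
    """Return sorted (hyponym, hypernym) pairs found between known terms."""
    if not terms:
        return []
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    np = rf"\b(?:{alternatives})\b"                    # one known term
    np_list = rf"{np}(?:, {np}|,? (?:and|or) {np})*"  # "A, B and C"
    patterns = [
        rf"(?P<hyper>{np}),? such as (?P<hypos>{np_list})",
        rf"(?P<hyper>{np}),? (?:including|especially) (?P<hypos>{np_list})",
        rf"(?P<hypos>{np_list}),? (?:and|or) other (?P<hyper>{np})",
    ]
    np_re = re.compile(np, re.IGNORECASE)
    pairs = set()
    for pattern in patterns:
        for match in re.finditer(pattern, text, re.IGNORECASE):
            hypernym = match.group("hyper").lower()
            # Pull each individual term back out of the matched list.
            for hyponym in np_re.findall(match.group("hypos")):
                pairs.add((hyponym.lower(), hypernym))
    return sorted(pairs)

# Made-up terms and text, for illustration only.
terms = ["biometric features", "fingerprints", "iris patterns",
         "optical sensors", "cameras", "scanners"]
text = ("The system captures biometric features such as fingerprints "
        "and iris patterns. Cameras, scanners and other optical sensors "
        "may be used.")
for hyponym, hypernym in hearst_pairs(text, terms):
    print(f"{hyponym} -> {hypernym}")
```

Restricting the patterns to known terms avoids having to guess noun-phrase boundaries with a regular expression, at the cost of missing any phrase that is not in the term list.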

Automatic validation of the results

After extraction, we validate the results automatically, using the Wiki API, WordNet or spaCy embeddings. Here is an example of the validation table after processing:

[Image: example validation table]
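
To make the embedding-based check concrete, here is a minimal sketch that scores each extracted pair by cosine similarity and builds table-like rows. It assumes term vectors are available as plain lists of floats (for example, exported from a spaCy model). The function names, the 0.5 threshold, and the toy two-dimensional vectors are made up for illustration and are not taken from the notebook.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def validate_pairs(pairs, vectors, threshold=0.5):
    """Build one row per (hyponym, hypernym) pair.

    `valid` is None when a term has no vector, so missing coverage is
    not confused with a rejected pair. The threshold is an assumption.
    """
    rows = []
    for hyponym, hypernym in pairs:
        if hyponym in vectors and hypernym in vectors:
            score = cosine(vectors[hyponym], vectors[hypernym])
            valid = score >= threshold
        else:
            score, valid = None, None
        rows.append({"hyponym": hyponym, "hypernym": hypernym,
                     "similarity": score, "valid": valid})
    return rows

# Toy vectors, invented for this example; real embeddings have many more dimensions.
toy_vectors = {
    "fingerprints": [1.0, 0.0],
    "biometric features": [1.0, 1.0],
    "optical sensors": [0.0, 1.0],
}
candidate_pairs = [
    ("fingerprints", "biometric features"),
    ("fingerprints", "optical sensors"),
    ("iris patterns", "biometric features"),
]
for row in validate_pairs(candidate_pairs, toy_vectors):
    print(row)
```

Cosine similarity is symmetric, so it can only filter out implausible pairs; it cannot confirm which term is the hypernym. A directional resource such as WordNet or Wikidata is needed for that.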

patent_ner_linking's People

Contributors

kinivi, gaetanserre, marwanmashra, nkise-nlab
