
sotaner's Introduction

This repo contains the code and result files for the paper "What do we really know about State of the art NER?", published at LREC 2022 and co-authored by Sowmya Vajjala (National Research Council, Canada) and Ramya Balasubramaniam (Novisto, Canada).

To cite the paper, use one of the citation formats listed on its ACL Anthology page.

The core NLP libraries and the versions used are:

-- spaCy 3.0, with the en_core_web_trf model (installation instructions: https://spacy.io/usage)

-- Stanza: see https://github.com/stanfordnlp/stanza-train

-- Spark NLP 3.1.2: https://nlp.johnsnowlabs.com/docs/en/install

The rest of the repo is organized as follows.

The spacy-stanza-scripts/ directory contains:

  • scripts to convert BIO files to the spaCy/Stanza formats (tool-specific JSON)
  • scripts to train and save NER models with spaCy/Stanza
  • scripts to evaluate an existing NER model trained with the training scripts above
  • the config.cfg file for spaCy
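
As an illustration of what a BIO-to-span conversion involves, here is a minimal sketch in plain Python. It is not the repo's conversion script: the function names (read_bio, bio_to_spans), the whitespace-separated "token tag" input layout, and the output dictionary shape are all assumptions made for this example, and the tool-specific JSON written by the actual scripts may differ.

```python
from typing import List, Tuple

Sentence = List[Tuple[str, str]]  # (token, BIO tag) pairs


def read_bio(text: str) -> List[Sentence]:
    """Parse BIO text: one 'token ... tag' line per token, blank line between sentences.

    Assumption: the token is the first column and the tag is the last column.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


def bio_to_spans(sentence: Sentence) -> dict:
    """Join tokens with single spaces and return (start, end, label) character spans."""
    text_parts = []
    entities = []
    offset = 0          # character offset of the current token in the joined text
    start = None        # start offset of the entity currently open, if any
    end = 0             # end offset of the entity currently open
    label = None        # label of the entity currently open
    for token, tag in sentence:
        continues = tag.startswith("I-") and tag[2:] == label
        # Close the open entity when the current tag does not continue it.
        if start is not None and not continues:
            entities.append((start, end, label))
            start = label = None
        # Open a new entity on B-, or on a stray I- with no entity open.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = offset, tag[2:]
        if start is not None:
            end = offset + len(token)
        text_parts.append(token)
        offset += len(token) + 1  # +1 for the joining space
    if start is not None:
        entities.append((start, end, label))
    return {"text": " ".join(text_parts), "entities": entities}


sample = "John B-PERSON\nSmith I-PERSON\nvisited O\nOttawa B-GPE\n. O\n"
converted = [bio_to_spans(s) for s in read_bio(sample)]
```

Character-offset spans of this kind are what span-based training formats generally need. Joining tokens with single spaces is a simplification that discards the original spacing of the text.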

Python files:

  • conll-to-bio.py: converts OntoNotes, in the format downloaded from cemantix and LDC, to CoNLL BIO format, using a pre-existing script.
  • evaluation_spacy_stanza.py: evaluates the NER models that ship with spaCy and Stanza on any test set in BIO format.
  • evaluation_sparknlp.py: evaluates Spark NLP's NER models.
  • generate_splits.py: generates 10 random train/dev/test splits for the OntoNotes dataset.
  • input_perturbances_faker.py: generates new test sets from a given test set, based on the chosen faker transformation.
  • train_sparknlp.py: trains an NER model using Spark NLP's architecture.
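
To make the idea of repeated random splits concrete, here is a minimal sketch of generating 10 seeded train/dev/test splits over a list of sentences. It is not the code in generate_splits.py: the function name make_splits, the 80/10/10 proportions, and the seed are assumptions chosen for this example, and the repo's script may split at a different granularity or with different ratios.

```python
import random
from typing import Dict, List, Sequence


def make_splits(items: Sequence, n_splits: int = 10, train_frac: float = 0.8,
                dev_frac: float = 0.1, seed: int = 0) -> List[Dict[str, list]]:
    """Return n_splits independent random train/dev/test partitions of items.

    Assumed proportions: train_frac for train, dev_frac for dev, remainder for test.
    """
    rng = random.Random(seed)  # local generator, so results are reproducible
    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)
    splits = []
    for _ in range(n_splits):
        shuffled = list(items)
        rng.shuffle(shuffled)  # a fresh shuffle for every split
        splits.append({
            "train": shuffled[:n_train],
            "dev": shuffled[n_train:n_train + n_dev],
            "test": shuffled[n_train + n_dev:],
        })
    return splits


sentences = list(range(100))  # stand-in for a list of parsed sentences
splits = make_splits(sentences)
```

Each split is a full partition of the input, so every sentence lands in exactly one of train, dev, or test within that split. Reusing the same seed reproduces the same 10 splits.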

Results-Details.xlsx contains all the detailed result tables, along with the figures generated from them.

