Comments (4)

chrisjbryant commented on May 28, 2024

Yes, that's using en_core_web_sm in both cases.

From what I've read, spaCy 2 was never supposed to be faster, just slightly more accurate and more memory-efficient. Check the "Model Comparison" table on this page. It shows a large speed drop for a relatively small accuracy gain. It's true that the parser and NER components are >5% more accurate, but ERRANT mainly relies on the POS tagger, so the ~0.5% POS improvement isn't really significant.

There's also a long issue thread about it here, from when spaCy went from v1 to v2, and my reading of it was that v2 will never be as fast as v1.

from errant.

chrisjbryant commented on May 28, 2024

Hey Sam,

Yes, spaCy 2 support is definitely on the to-do list. I mainly wanted the first pip version to be compatible with the BEA shared task, whereas newer spaCy versions will change the results slightly.

Some good news: Spacy finally updated their English tag map to the same one that I use, so as long as you use spacy >= 2.2.2, rule compatibility shouldn't be a problem. I'm in the process of testing ERRANT with this version of spacy too, so hopefully ERRANT 2.1 will come out soon!
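For anyone who wants to guard against an older install, here is a minimal sketch of a version check for the spacy >= 2.2.2 requirement mentioned above. The helper name `meets_minimum` is hypothetical and not part of ERRANT or spaCy; it only parses a dotted version string, and it treats a pre-release such as "2.2.2.dev0" the same as the release.

```python
import re


def meets_minimum(version: str, minimum=(2, 2, 2)) -> bool:
    """Return True if a dotted version string is at least `minimum`.

    Hypothetical helper for illustration; not part of ERRANT or spaCy.
    """
    parts = []
    for piece in version.split("."):
        # Keep only leading digits, so "0rc1" -> 0 and "dev0" stops parsing.
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as "2.3" before comparing component-wise.
    while len(parts) < len(minimum):
        parts.append(0)
    return tuple(parts) >= tuple(minimum)
```

In practice the string would come from `spacy.__version__`, e.g. `meets_minimum(spacy.__version__)`.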

chrisjbryant commented on May 28, 2024

Quick update:

I tried using ERRANT with the latest version of spacy (2.2), and the only thing that broke is a call to an old lemmatiser in the classifier. For a quick fix, you can change the same_lemma function to:

    if o_tok.lemma == c_tok.lemma:
        return True
    return False
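To make the quick fix concrete, here is a minimal self-contained sketch of what the patched function might look like. The thread only gives the function body, so the two-argument signature is an assumption, and `Tok` is a hypothetical stand-in for a spaCy token so the example runs without a model installed. In spaCy itself, `Token.lemma` is an integer ID and `Token.lemma_` is the corresponding string.

```python
from dataclasses import dataclass


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token; only `lemma` is used here."""
    text: str
    lemma: int  # spaCy exposes Token.lemma as an integer ID


def same_lemma(o_tok, c_tok) -> bool:
    """Return True if the original and corrected tokens share a lemma.

    The signature is assumed; the thread only shows the function body.
    """
    return o_tok.lemma == c_tok.lemma


# Toy lemma IDs, chosen by hand for illustration only.
orig = Tok("cats", lemma=101)
cor = Tok("cat", lemma=101)
other = Tok("dog", lemma=202)
print(same_lemma(orig, cor), same_lemma(orig, other))
```

With real spaCy tokens the same comparison applies unchanged, since it only reads the `lemma` attribute of each token.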

Otherwise, it looked as if annotation performance decreased by about 1% and processing took about 3 times longer. I'll need to debug the accuracy loss (and have some ideas already), but there's not really anything I can do about the speed loss...

sam-writer commented on May 28, 2024

Thanks for the update.

The speed drop is interesting, since spaCy 2.0 was supposed to be faster... Is this using en_core_web_sm in both cases?
