Comments (4)
Yes, that's using en_core_web_sm in both cases.
From what I've read, spaCy 2 was never supposed to be faster, just slightly more accurate and more memory efficient. Check the "Model Comparison" table on this page. It shows a large speed drop for a relatively small accuracy gain. It's true that the parser and NER components are >5% more accurate, but ERRANT mainly relies on the POS tagger, so the ~0.5% POS improvement isn't really significant.
There's also a long issue thread about it here from when it went from v1 to v2, and it seemed to me that the conclusion was that it'll never be as fast as v1.
from errant.
Hey Sam,
Yes, spaCy 2 support is definitely on the to-do list. I mainly wanted the first pip version to be compatible with the BEA shared task, but newer versions of spaCy will change the results slightly.
Some good news: Spacy finally updated their English tag map to the same one that I use, so as long as you use spacy >= 2.2.2, rule compatibility shouldn't be a problem. I'm in the process of testing ERRANT with this version of spacy too, so hopefully ERRANT 2.1 will come out soon!
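A minimal sketch of how the "spacy >= 2.2.2" condition above could be checked before running ERRANT. The helper names parse_version and spacy_is_compatible are hypothetical and are not part of ERRANT or spaCy; the only threshold used is the 2.2.2 mentioned in the comment.

```python
import re

# Minimum spaCy version named in the comment above.
MIN_SPACY = (2, 2, 2)


def parse_version(text):
    """Turn '2.2.4' into (2, 2, 4), ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad short versions such as '2.2' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def spacy_is_compatible(version_text, minimum=MIN_SPACY):
    """Return True if the given version string is at least the minimum."""
    return parse_version(version_text) >= minimum
```

With spaCy installed, the installed version string is available as spacy.__version__, so the check would be spacy_is_compatible(spacy.__version__).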
Quick update:
I tried using ERRANT with the latest version of spacy (2.2), and the only thing that broke is a call to an old lemmatiser in the classifier. For a quick fix, you can change the body of the same_lemma function to:
    if o_tok.lemma == c_tok.lemma: return True
    return False
Otherwise, it looked as if annotation performance decreased by about 1% and processing took about 3 times longer. I'll need to debug the accuracy loss (and have some ideas already), but there's not really anything I can do about the speed loss...
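For reference, the same_lemma quick fix described in this comment can be sketched as a complete function. This is only an illustration: the signature same_lemma(o_tok, c_tok) is an assumption inferred from the variable names in the fix, and the SimpleNamespace objects are hypothetical stand-ins for spaCy tokens so the snippet runs without spaCy installed. On real spaCy tokens, lemma is the integer hash of the lemma and lemma_ is the string form.

```python
from types import SimpleNamespace


def same_lemma(o_tok, c_tok):
    """Return True if the original and corrected tokens share a lemma.

    Assumed signature; compares the lemma that spaCy predicted for each
    token instead of calling the old lemmatiser.
    """
    return o_tok.lemma == c_tok.lemma


# Hypothetical stand-ins for spaCy tokens; the lemma values are made-up IDs.
orig_tok = SimpleNamespace(text="walks", lemma=101)
cor_tok = SimpleNamespace(text="walked", lemma=101)
other_tok = SimpleNamespace(text="runs", lemma=202)

print(same_lemma(orig_tok, cor_tok))
print(same_lemma(orig_tok, other_tok))
```

Returning the comparison directly is equivalent to the two-line if/return form in the quick fix.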
Thanks for the update.
The performance thing is interesting, since spaCy 2.0 was supposed to be faster... In both cases, is this using en_core_web_sm?