liebeck / spacy-iwnlp
German lemmatization with IWNLP as extension for spaCy
License: MIT License
The current code assigns the lemmas to every token in the __call__ method of spaCyIWNLP, which is fine.
However, when the extension is registered, a getter is defined as well. The getter ignores what has been stored in the "_" dict and instead recalculates the lemmas and returns those.
Unless I misunderstand something important here, I think this is a problem, because the wrapper is invoked every time the lemmas for a token are retrieved.
It also makes it impossible for a client pipeline to update or set the iwnlp_lemmas field when the lemmatizer returns nothing or returns something wrong. For example, the IWNLP lemmatizer does not create a lemma for punctuation while spaCy does, so in order to always have some lemma, one could set the iwnlp_lemmas attribute to [token.lemma_].
Setting the attribute to this value is possible, but retrieving it is not, because the getter recalculates the value and returns that instead.
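The behaviour described above can be reproduced without IWNLP. The following is a minimal sketch, assuming spaCy is installed; the extension names demo_getter and demo_stored and the function fake_lookup are hypothetical stand-ins and are not part of spacy-iwnlp.

```python
import spacy
from spacy.tokens import Token


def fake_lookup(token):
    # Hypothetical stand-in for the IWNLP lookup: no lemma for punctuation.
    return None if token.is_punct else [token.text.lower()]


# Registered with a getter: every read calls fake_lookup again.
Token.set_extension("demo_getter", getter=fake_lookup, force=True)
# Registered with a default: reads return whatever was stored last.
Token.set_extension("demo_stored", default=None, force=True)

nlp = spacy.blank("de")  # blank pipeline, no model download needed
doc = nlp("Hallo Welt .")
punct = doc[2]

# Try to supply a fallback lemma for the punctuation token.
punct._.demo_getter = ["."]
punct._.demo_stored = ["."]

getter_value = punct._.demo_getter  # recomputed by the getter, fallback is lost
stored_value = punct._.demo_stored  # the stored fallback
print(getter_value, stored_value)
```

Under these assumptions, registering the extension with a default instead of a getter would let a client pipeline overwrite the value, at the cost of computing the lemmas only once in __call__.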
I cannot run this line of code:
nlp = spacy.load('de')
This is the error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/spacy/util.py", line 175, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'de'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
Also, where am I supposed to put the JSON file? The instructions say: "Download the latest processed IWNLP dump from http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip and unzip it."
I tried putting it in the same directory that I run the code from. Is that correct?
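On the location question: a relative lemmatizer_path is resolved against the directory the script is started from, so the same directory works as long as the path passed in matches. The sketch below is one way to check this before building the component; resolve_lemmatizer_path is a hypothetical helper, not part of spacy-iwnlp. The E050 error is separate: it means the model is not installed under that name, and a model package such as de_core_news_sm has to be installed first (for example with python -m spacy download de_core_news_sm).

```python
from pathlib import Path


def resolve_lemmatizer_path(filename, base_dir=None):
    """Return the absolute path of the unzipped IWNLP JSON dump.

    Hypothetical helper: looks in base_dir, or in the current working
    directory when base_dir is None, and fails early if the file is missing.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = (base / filename).resolve()
    if not candidate.is_file():
        raise FileNotFoundError(f"IWNLP dump not found at {candidate}")
    return candidate


# Intended use (not executed here, it needs the unzipped dump on disk):
#   path = resolve_lemmatizer_path("IWNLP.Lemmatizer_20181001.json")
#   iwnlp = spaCyIWNLP(lemmatizer_path=str(path))
```

Failing early with the absolute path in the message makes it obvious which directory was searched.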
pip install spacy-iwnlp
fails with the error:
Failed to build spacy
ERROR: Could not build wheels for spacy, which is required to install pyproject.toml-based projects
I was able to install spaCy with pip install spacy,
but I'm not able to install iwnlp. What could be the problem?
I get error messages when I try to run the code as explained. I tried some things out, but I couldn't make it work.
Error Message:
[E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy_iwnlp.spaCyIWNLP object at 0x7f921cddcdf0> (name: 'None').
If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.
If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').
If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.
My code:
```
import spacy
from spacy_iwnlp import spaCyIWNLP

nlp = spacy.load('de_core_news_sm')
iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
nlp.add_pipe(iwnlp)
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))
```
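The error message asks for a registered factory instead of a component instance. The following is a minimal sketch of that registration pattern, assuming spaCy 3; DemoLemmaComponent, the factory name demo_lemmas, and the lowercase setting are hypothetical stand-ins for a stateful component such as spaCyIWNLP and its lemmatizer_path argument.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token


class DemoLemmaComponent:
    """Hypothetical stand-in for a class-based component like spaCyIWNLP."""

    def __init__(self, lowercase: bool):
        self.lowercase = lowercase
        Token.set_extension("demo_lemmas", default=None, force=True)

    def __call__(self, doc):
        # Store one placeholder "lemma" per token and return the doc.
        for token in doc:
            text = token.text.lower() if self.lowercase else token.text
            token._.demo_lemmas = [text]
        return doc


# Register the class under a string name; spaCy passes nlp and name,
# the remaining arguments come from the config.
@Language.factory("demo_lemmas", default_config={"lowercase": True})
def create_demo_lemmas(nlp, name, lowercase: bool):
    return DemoLemmaComponent(lowercase)


nlp = spacy.blank("de")
nlp.add_pipe("demo_lemmas", config={"lowercase": True})  # string name, not an object
doc = nlp("Wir mögen Fußballspiele")
lemmas = [token._.demo_lemmas for token in doc]
print(lemmas)
```

A release of spacy-iwnlp built for spaCy 3 may already register its own factory name; in that case the project's README is the place to check for the name and config keys to pass to nlp.add_pipe.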