spacy-iwnlp's People

Contributors

johann-petrak, liebeck

spacy-iwnlp's Issues

Why assign the lemma and re-calculate it using the getter?

The current code assigns the lemmas to every token in the call method of spaCyIWNLP, which is fine.

However, when the extension is registered, a getter is defined as well. That getter ignores what has been stored in the "_" dict, re-calculates the lemmas, and returns those instead.

Unless I misunderstand something important here, I think this is a problem, because the wrapper is invoked every time the lemmas for a token are retrieved.

It also makes it impossible for a client pipeline to update or set the iwnlp_lemmas field when the lemmatizer returns nothing or returns something wrong. For example, the IWNLP lemmatizer does not create a lemma for punctuation while spaCy does, so in order to always have some lemma, one could set the iwnlp_lemmas attribute to [token.lemma_]. Setting the attribute does work, but the stored value cannot be retrieved, because the getter recalculates the value and returns that.
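
The behaviour described above can be reproduced without IWNLP. The following is a minimal sketch, assuming spaCy is installed. The names fake_lookup, lemmas_getter and lemmas_stored are made up for illustration, and fake_lookup only stands in for the wrapper's lookup. The sketch contrasts an extension registered with a getter against one registered with a default value.

```python
import spacy
from spacy.tokens import Token


def fake_lookup(token):
    """Hypothetical stand-in for the IWNLP lookup: no lemma for punctuation."""
    return None if token.is_punct else [token.text.lower()]


# Variant A: registered with a getter, so every read calls fake_lookup again.
Token.set_extension("lemmas_getter", getter=fake_lookup, force=True)
# Variant B: registered with a default, so reads return whatever was stored.
Token.set_extension("lemmas_stored", default=None, force=True)

nlp = spacy.blank("de")  # tokenizer only, no model download needed
doc = nlp("Hallo Welt!")
punct = doc[2]  # the "!" token

# Try to supply a fallback lemma for the punctuation token in both variants.
punct._.lemmas_getter = [punct.text]
punct._.lemmas_stored = [punct.text]

getter_value = punct._.lemmas_getter  # recomputed by fake_lookup, fallback is lost
stored_value = punct._.lemmas_stored  # the fallback that was assigned

print("getter:", getter_value)
print("stored:", stored_value)
```

If the wrapper behaves like variant A, registering iwnlp_lemmas with a default instead of a getter would presumably let client pipelines overwrite the value and would avoid repeated lookups.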

OSError: [E050]

I cannot run this line of code:

nlp = spacy.load('de')

This is the error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/spacy/__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "/Users/kylefoley/codes/venv/lib/python3.8/site-packages/spacy/util.py", line 175, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'de'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Also, where am I supposed to put the JSON file mentioned in this instruction: "Download the latest processed IWNLP dump from http://lager.cs.uni-duesseldorf.de/NLP/IWNLP/IWNLP.Lemmatizer_20181001.zip and unzip it."

I tried putting it in the same directory that I run the code from. Is that correct?
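
Judging by the snippet in the spaCy v3 issue on this page, the model is loaded under its full package name (de_core_news_sm) rather than the 'de' shortcut, and the location of the JSON file is passed to the component through the lemmatizer_path argument. If so, the file probably does not need to sit in any particular directory, as long as the path points at it. Below is a small sketch for checking the path before loading anything. find_lemmatizer_json is a hypothetical helper, not part of spacy-iwnlp, and the default filename is the one used in that snippet.

```python
from pathlib import Path


def find_lemmatizer_json(directory, filename="IWNLP.Lemmatizer_20181001.json"):
    """Return the absolute path of the unzipped IWNLP dump inside `directory`.

    Hypothetical helper: raises FileNotFoundError with the resolved path so a
    wrong working directory is easy to spot.
    """
    candidate = Path(directory).expanduser().resolve() / filename
    if not candidate.is_file():
        raise FileNotFoundError(f"IWNLP dump not found at {candidate}")
    return candidate
```

The returned path could then be converted with str() and passed as lemmatizer_path, for example str(find_lemmatizer_json('.')) if the file sits in the directory the code is run from.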

Installing spacy_iwnlp fails

pip install spacy-iwnlp fails with the error:

Failed to build spacy
ERROR: Could not build wheels for spacy, which is required to install pyproject.toml-based projects

I was able to install spacy with pip install spacy, but I'm not able to install iwnlp. What could be the problem?

No longer compatible with spacy v3.0?

I get error messages when I try to run the code as explained. I tried some things out, but I couldn't make it work.

Error Message:
[E966] nlp.add_pipe now takes the string name of the registered component factory, not a callable component. Expected string, but got <spacy_iwnlp.spaCyIWNLP object at 0x7f921cddcdf0> (name: 'None').

  • If you created your component with nlp.create_pipe('name'): remove nlp.create_pipe and call nlp.add_pipe('name') instead.

  • If you passed in a component like TextCategorizer(): call nlp.add_pipe with the string name instead, e.g. nlp.add_pipe('textcat').

  • If you're using a custom component: Add the decorator @Language.component (for function components) or @Language.factory (for class components / factories) to your custom component and assign it a name, e.g. @Language.component('your_name'). You can then run nlp.add_pipe('your_name') to add it to the pipeline.

My code:

import spacy
from spacy_iwnlp import spaCyIWNLP
nlp = spacy.load('de_core_news_sm')
iwnlp = spaCyIWNLP(lemmatizer_path='data/IWNLP.Lemmatizer_20181001.json')
# This call raises E966 under spaCy v3, which expects a string name here.
nlp.add_pipe(iwnlp)
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))
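
The third hint in the error message (register the component with @Language.factory, then add it by name) can be sketched as follows. This is only an illustration of the spaCy v3 pattern, assuming spaCy v3 is installed. DemoLemmas, the factory name demo_lemmatizer, the suffix setting and the demo_lemmas attribute are all made up, with DemoLemmas standing in for a class-based component such as spaCyIWNLP. It does not show how spacy-iwnlp itself is or should be registered.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Token


class DemoLemmas:
    """Hypothetical stand-in for a class-based component such as spaCyIWNLP."""

    def __init__(self, suffix):
        self.suffix = suffix
        Token.set_extension("demo_lemmas", default=None, force=True)

    def __call__(self, doc):
        # Store a fake "lemma" list on every token and hand the doc back.
        for token in doc:
            token._.demo_lemmas = [token.text.lower() + self.suffix]
        return doc


# spaCy v3 calls the factory with nlp and name, plus any config values.
@Language.factory("demo_lemmatizer", default_config={"suffix": ""})
def create_demo_lemmas(nlp, name, suffix: str):
    return DemoLemmas(suffix)


nlp = spacy.blank("de")  # tokenizer only, no model download needed
# v3 style: pass the registered string name, not the component object.
nlp.add_pipe("demo_lemmatizer", config={"suffix": "_x"})

doc = nlp("Wir mögen Fußballspiele")
lemmas = [token._.demo_lemmas for token in doc]
print(lemmas)
```

A wrapper around spaCyIWNLP could presumably follow the same shape, with lemmatizer_path taking the place of suffix in the config.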
