
Comments (6)

ccoreilly commented on June 2, 2024

Hi Jose!

What model version are you using? I have no problems with spaCy 2.3 and the latest model (0.1.0, which was trained on spaCy 2.2, but there are still no issues).

>>> for token in nlp('Bon dia tinga vosté'): print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop)
...
Bon Bon PROPN PROPN nsubj Xxx True False
dia dia NOUN NOUN flat xxx True False
tinga tenir VERB VERB ROOT xxxx True False
vosté vosté PRON PRON xcomp xxxx True False
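
The one-liner above can be unpacked into a small helper, which is easier to reuse when comparing the output of different model versions. This is only a sketch: `token_row` and `ATTRS` are hypothetical names, and the stand-in object mimics the attributes of a spaCy token so that the snippet runs without a model installed. With a real pipeline you would pass the tokens of `nlp('Bon dia tinga vosté')` instead.

```python
from types import SimpleNamespace

# Attributes printed by the one-liner above, in the same order.
ATTRS = ("text", "lemma_", "pos_", "tag_", "dep_", "shape_", "is_alpha", "is_stop")


def token_row(token, attrs=ATTRS):
    """Return the chosen attributes of a token-like object as one line."""
    return " ".join(str(getattr(token, name)) for name in attrs)


# Stand-in for the first token of the output above (not a real spaCy Token).
bon = SimpleNamespace(text="Bon", lemma_="Bon", pos_="PROPN", tag_="PROPN",
                      dep_="nsubj", shape_="Xxx", is_alpha=True, is_stop=False)

print(token_row(bon))
```

Appending `"ent_type_"` to the attribute tuple would give the extra column used in the longer example.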

Entity recognition is not very good, as I had little training data when I created the model, but the tagger and parser are good enough for my use case.

>>> for token in nlp('La Generalitat de Catalunya té els seus orígens en les Corts Catalanes, les quals, durant el regnat de Jaume I el Conqueridor (1208-1276), es reunien convocades pel rei com a representatives dels estaments socials de l\'època'): print(token.text, token.lemma_, token.pos_, token.tag_, token.dep_, token.shape_, token.is_alpha, token.is_stop, token.ent_type_)
...
La La DET DET det Xx True True
Generalitat Generalitat PROPN PROPN nsubj Xxxxx True False ORG
de de ADP ADP case xx True True ORG
Catalunya Catalunya PROPN PROPN flat Xxxxx True False ORG
té tenir VERB VERB ROOT xx True False
els ell DET DET det xxx True True
seus seure DET DET det xxxx True True
orígens origen NOUN NOUN obj xxxx True False
en en ADP ADP case xx True True
les ell DET DET det xxx True True
Corts Corts PROPN PROPN obl Xxxxx True False ORG
Catalanes Catalanes PROPN PROPN flat Xxxxx True False ORG
, , PUNCT PUNCT punct , False False
les ell DET DET det xxx True True
quals qual PRON PRON nsubj xxxx True True
, , PUNCT PUNCT punct , False False
durant durar ADP ADP case xxxx True True
el ell DET DET det xx True True
regnat regnar NOUN NOUN obl xxxx True False
de de ADP ADP case xx True True
Jaume Jaume PROPN PROPN nmod Xxxxx True False PER
I I CCONJ CCONJ cc X True True PER
el ell DET DET det xx True True PER
Conqueridor Conqueridor PROPN PROPN conj Xxxxx True False PER
( ( PUNCT PUNCT punct ( False False
1208 1208 NOUN NOUN appos dddd False False
- - PUNCT PUNCT punct - False False
1276 1276 NOUN NOUN appos dddd False False G
) ) PUNCT PUNCT punct ) False False
, , PUNCT PUNCT punct , False False
es ell PRON PRON obj xx True True
reunien reunir VERB VERB acl xxxx True False
convocades convocar ADJ ADJ obj xxxx True False
pel pel ADP ADP case xxx True True
rei rei NOUN NOUN obj xxx True False
com com SCONJ SCONJ case xxx True True
a a ADP ADP fixed x True True
representatives representatiu NOUN NOUN obj xxxx True False
dels dels ADP ADP case xxxx True True
estaments estament NOUN NOUN nmod xxxx True False ORG
socials social ADJ ADJ amod xxxx True False ORG
de de ADP ADP case xx True True ORG
l' ell DET DET det x' False False ORG
època època NOUN NOUN nmod xxxx True False ORG

You can see it detects Generalitat de Catalunya and Corts Catalanes as ORG (Organization) and Jaume I el Conqueridor as PER (Person), but it also detects estaments socials de l'època as ORG (which is disputable). There are also other NER tags in the model, which are due to an unclean dataset (this is a pending task...).
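
To show how the per-token `ent_type_` column maps to the entities mentioned here, the following sketch merges runs of adjacent tokens that share a label. The token list is an excerpt copied from the output above, and `group_entities` is a hypothetical helper, not part of spaCy. It is a simplification: two back-to-back entities of the same type would be merged into one.

```python
from itertools import groupby

# (text, ent_type_) pairs excerpted from the output above; "" means no entity.
tokens = [
    ("La", ""), ("Generalitat", "ORG"), ("de", "ORG"), ("Catalunya", "ORG"),
    ("té", ""), ("les", ""), ("Corts", "ORG"), ("Catalanes", "ORG"),
    ("de", ""), ("Jaume", "PER"), ("I", "PER"), ("el", "PER"),
    ("Conqueridor", "PER"), ("(", ""),
]


def group_entities(pairs):
    """Merge runs of adjacent tokens that share a non-empty entity label."""
    spans = []
    for label, run in groupby(pairs, key=lambda pair: pair[1]):
        if label:  # skip runs of tokens that carry no entity label
            spans.append((" ".join(text for text, _ in run), label))
    return spans


for text, label in group_entities(tokens):
    print(label, text)
```

With a loaded model, iterating over `doc.ents` and reading `ent.text` and `ent.label_` gives the entity spans directly, without this manual grouping.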

from spacy-catala.

josejuanmartinez commented on June 2, 2024

Hi Ciaran!

I was just installing as per the Readme.md in master (0.0.2). I then got the results I sent in this issue, plus a warning saying that the model is not compatible with spaCy 2.3.0. So I created a venv with spaCy 2.1.0 and tried again: same results (only the lemma is OK), but without the warning.
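
For reference, a rough way to spot this kind of mismatch before loading a model is to compare the installed spaCy version with the version the model was trained on. The sketch below is a hypothetical helper (`same_minor` is not a spaCy function) and it assumes that a model is only safe on the same major.minor line it was trained on, which is an illustration and not spaCy's actual compatibility check.

```python
def same_minor(installed, trained):
    """True when two plain version strings share the same major.minor prefix."""
    return installed.split(".")[:2] == trained.split(".")[:2]


# Versions mentioned in this thread, checked against a model trained on 2.2.0.
for installed in ("2.3.0", "2.2.0"):
    status = "ok" if same_minor(installed, "2.2.0") else "possible mismatch"
    print(f"spaCy {installed} with a model trained on 2.2.0: {status}")

# With spaCy installed, the real values would come from:
#   import spacy
#   spacy.__version__          # installed library version
#   nlp.meta["version"]        # version of the loaded model package
#   nlp.meta["spacy_version"]  # spaCy requirement declared by the model,
#                              # usually a specifier such as ">=2.2.0"
```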

I will try with the 0.0.3 in releases and let you know.

Thank you for your time!


josejuanmartinez commented on June 2, 2024

Ciaran,

the 3.0.0 version already works, except for the lemma. I've tried it in spaCy 2.3.0 (it again throws a warning saying that the model was trained for 2.2.0, but it works anyway) and also in 2.2.0, with the same results.

Any idea?


ccoreilly commented on June 2, 2024

Hi Jose,

sorry for the late reply. You're right that version 0.3.0 does not include the lemma. Version 0.1.0 of the model (the latest release) does, but it has some issues as well.

I am planning on cleaning up the dataset and releasing a new version so I'll look into fixing it.


josejuanmartinez commented on June 2, 2024


ccoreilly commented on June 2, 2024

Hi Jose,

sorry for my late reply, but I had no time until now. I just published a new medium-sized model trained on spaCy 2.3.2. Would you mind trying it out?

https://github.com/ccoreilly/spacy-catala/releases/tag/ca_fasttext_wiki_md-1.0.0

I will release the large model later today.

