Comments (11)

glample commented on May 27, 2024

Hi,

I only trained the model on the CoNLL datasets, which were already tokenized, so I did not have to tokenize anything. The Moses tokenizer should probably work well:
https://github.com/moses-smt/mosesdecoder/tree/master/scripts/tokenizer
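To make "tokenized" concrete, here is a minimal sketch of splitting raw text into word and punctuation tokens. It is not the Moses tokenizer, only a rough regex stand-in; the helper name `simple_tokenize` and the pattern are hypothetical, and a real tokenizer handles many more cases (abbreviations, URLs, language-specific rules).

```python
import re

# Hypothetical stand-in for a real tokenizer such as Moses.
# Matches words (keeping internal hyphens/apostrophes) or single punctuation marks.
_TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def simple_tokenize(text):
    """Return a list of word and punctuation tokens found in text."""
    return _TOKEN_RE.findall(text)


if __name__ == "__main__":
    # Tokens joined by single spaces, the usual shape of pre-tokenized text.
    print(" ".join(simple_tokenize("Hello, world! Don't panic.")))
```

For anything beyond a quick experiment, one of the tokenizers mentioned in this thread is likely the better choice.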


janwendt commented on May 27, 2024

@glample can you give one example line showing what input.txt should look like?


glample commented on May 27, 2024

Yes, you can check the data here:
https://github.com/glample/tagger/tree/master/dataset


mrmotallebi commented on May 27, 2024

@janwendt What did you eventually do?


janwendt commented on May 27, 2024

@mrmotallebi I am using the StanfordCoreNLP API, which does a very good job, but there are similar Python libraries as well (NLTK is pretty good).
The post that got me to the API: https://www.ibm.com/developerworks/community/blogs/nlp/entry/tokenization?lang=en


glample commented on May 27, 2024

I would personally recommend the Moses one; it's pretty standard and very fast.


bjtu-lucas-nlp commented on May 27, 2024

@janwendt Do you have a demo input.txt for tagger.py?


bjtu-lucas-nlp commented on May 27, 2024

@janwendt Never mind, it is no longer needed. I got it working.


gui-li commented on May 27, 2024

@bjtu-lucas-nlp Could you please share an example of input.txt?
I have tried all kinds of combinations but still get O tags for everything in the output.


janwendt commented on May 27, 2024

@gui-li if you have column-based data like:
example1;example2;example3;

then the tokenized output, which is also the input.txt for your tagger, should be:

example1
example2
example3
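
The conversion described above could be sketched roughly as follows. This is only an illustration, assuming the fields are separated by `;` and that empty fields (such as the one after a trailing `;`) should be dropped; the helper names `row_to_lines` and `write_input_file` are hypothetical and not part of the tagger.

```python
def row_to_lines(row, sep=";"):
    """Split one separator-delimited row into a list of non-empty tokens."""
    return [field.strip() for field in row.split(sep) if field.strip()]


def write_input_file(rows, path="input.txt"):
    """Write every token from every row on its own line (path is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            for token in row_to_lines(row):
                handle.write(token + "\n")


if __name__ == "__main__":
    for token in row_to_lines("example1;example2;example3;"):
        print(token)
```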


gui-li commented on May 27, 2024

@janwendt Thanks for replying.

