hunpos's Issues

merészé, ..lábé

What steps will reproduce the problem?
merészé NOUN
egéré   NOUN<ANP>
regényé NOUN<CAS<TRA>>
rémé    NOUN<CAS<TRA>>
kézé    NOUN
lábé    NOUN

What is the expected output? What do you see instead?
merészé NOUN<ANP>
regényé NOUN<ANP>
rémé    NOUN<ANP>
kézé    NOUN<ANP>
lábé    NOUN<ANP>

What version of the product are you using? On what operating system?
current, 2009 July linux, debian

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 13 Jul 2009 at 7:18

windows syntax for training

Hi,

Could you please give the correct training syntax for Windows? I've tried
type frenchcorpus.txt > hunpos-train french.model (type being the Windows
equivalent of cat) and it doesn't work.
The Linux syntax you recommend is {{cat uzbek.corpus | ./hunpos-train
uzbek.model}}}. What are the curly brackets used for?
TIA

Original issue reported on code.google.com by [email protected] on 25 Nov 2010 at 12:23
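
A note on the question above: in the Linux command the corpus reaches hunpos-train through a pipe (`|`), that is, on standard input, whereas `>` in the attempted Windows command redirects the output of `type` into a file instead of piping it. The curly brackets are most likely wiki code-markup residue rather than part of the command. As a platform-neutral illustration, here is a minimal Python sketch that connects a corpus file to a command's standard input. The helper name `run_with_stdin_file` is hypothetical, and the sketch assumes, as the Linux example implies, that hunpos-train reads the corpus from standard input.

```python
import subprocess


def run_with_stdin_file(command, input_path):
    """Run a command with the given file connected to its standard input.

    This has the same effect as `cat corpus | command` on Linux: the
    corpus is streamed to the process rather than named as an argument.
    """
    with open(input_path, "rb") as corpus:
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(command, stdin=corpus, check=True)
```

Assuming the hunpos-train binary is on PATH, a call might look like `run_with_stdin_file(["hunpos-train", "french.model"], "frenchcorpus.txt")`, using the file names from the question.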

Hunpos does not tag properly when input/output is piped to/from file

I am running Hunpos 1.0 on Windows XP, using the provided english-wsj 
model. Hunpos works great when I use it interactively, but when I pipe 
input/output to/from file, the tags are all obviously wrong.

Here is my input file (in.txt):

The
quick
red
fox
jumped
over
the
lazy
dogs.

>hunpos-tag english.model <in.txt >out.txt

This is what I get in out.txt:

The
    NNP 
quick
    VBD 
red
    CD  
fox
    CD  
jumped
    CD  
over
    CD  
the
    CD  
lazy
    CD  
dogs.
    JJ  

    NNS 

If I type the same text in manually at the console, the resulting tags are 
completely different.

What gives?

Thanks in advance... Alex Dowad

Original issue reported on code.google.com by [email protected] on 18 Feb 2010 at 12:06
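
One possible explanation, not confirmed anywhere in this report, is Windows CRLF line endings in in.txt: out.txt shows each tag on the line after its word, which is what a carriage return left attached to every token could produce. Under that assumption, a minimal Python sketch for converting a file to LF-only line endings before tagging could look like this. Both function names are hypothetical.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def normalize_file(src_path, dst_path):
    """Write a copy of src_path to dst_path with LF-only line endings."""
    # Binary mode keeps Python from translating newlines on its own.
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(normalize_line_endings(data))
```

Working on bytes keeps the sketch independent of the corpus encoding, since only the line-ending bytes are touched.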

Unable to checkout the code (error 403)

What steps will reproduce the problem?
1. Try to checkout the project with the command
svn checkout http://hunpos.googlecode.com/svn/trunk/ hunpos-read-only

What is the expected output? What do you see instead?
The project should be checked out. Instead I see:
svn: PROPFIND request failed on '/svn/trunk'
svn: PROPFIND of '/svn/trunk': 403 Forbidden (http://hunpos.googlecode.com)

What version of the product are you using? On what operating system?
svn version 1.4.4 and 1.5

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 11 Jun 2009 at 12:29

keréké

What steps will reproduce the problem?
keréké  NOUN<PLUR><ANP>
egéré   NOUN<ANP>

What is the expected output? What do you see instead?
keréké  expected: non-plural

What version of the product are you using? On what operating system?
current (2009. July), linux debian

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 13 Jul 2009 at 7:12

hurrá is UTT-INT, not a noun

What steps will reproduce the problem?
hurrá   NOUN<CAS<TRA>>
jaj     UTT-INT
húrrá   NOUN<CAS<TRA>>


What is the expected output? What do you see instead?
hurrá  expected: UTT-INT

What version of the product are you using? On what operating system?
Most current 2009, april 24, linux debian


Please provide any additional information below.
hurrá is not an inflected noun, but an UTT-INT (interjection; Hungarian: indulatszó).

Original issue reported on code.google.com by [email protected] on 25 Apr 2009 at 8:13

personal pronouns error

What steps will reproduce the problem?
nekünk  VERB<INF>
nekik   NOUN<PERS><PLUR><CAS<DAT>>
tőletek VERB<MODAL><PERS<1>>
tőle    NOUN<PERS><CAS<ABL>>
hozzátok        NOUN<PLUR>
hozzájuk        NOUN<PERS><PLUR><CAS<ALL>>
értem   VERB<PERS<1>><DEF>
jöttek  VERB<PAST><PLUR>


What is the expected output? What do you see instead?
nekünk  VERB<INF> expected: NOUN<PERS <1>><PLUR><CAS<DAT>>
tőletek VERB<MODAL><PERS<1>> expected: NOUN<PERS <2>><CAS<ABL>>
hozzátok        NOUN<PLUR> expected: NOUN<PERS <2>><PLUR><CAS<ALL>>
értem   VERB<PERS<1>><DEF> expected: NOUN<PERS<1>><PLUR><CAS<CAU>> when
followed by a verb, as in the example

What version of the product are you using? On what operating system?
current, 2009. apr. 24, linux-debian

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 25 Apr 2009 at 8:29

Fatal error: exception Failure("empty context_trie")

Hi, 

I've tried to build my own POS model under Linux. I first tried with a corpus of
1 600 000 words (and tags) and got a stack overflow error, so I tried with a
much smaller corpus (100 000 words and tags). The program tells me it is reading
the training corpus, then compiling probabilities, and then it fails with Fatal
error: exception Failure("empty context_trie").

What am I doing wrong? My corpus is just a file with one word and one tag per
line, with LF line endings.

TIA for the answer

Original issue reported on code.google.com by [email protected] on 25 Nov 2010 at 4:09
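
The report does not establish the cause of the error. As a first diagnostic, a corpus file can at least be checked mechanically. The Python sketch below assumes the training format is one `word<TAB>tag` pair per line with a blank line between sentences; that layout is an assumption here and is not stated in the report. The function name `check_corpus` is hypothetical.

```python
def check_corpus(lines):
    """Summarise a word<TAB>tag corpus and list malformed lines.

    Returns (sentence_count, longest_sentence, bad_line_numbers).
    """
    sentences = 0
    longest = 0
    current = 0  # tokens seen so far in the current sentence
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences += 1
                longest = max(longest, current)
                current = 0
            continue
        fields = line.split("\t")
        # Assumed format: exactly two non-empty tab-separated fields.
        if len(fields) != 2 or not fields[0] or not fields[1]:
            bad.append(number)
        current += 1
    if current:
        sentences += 1
        longest = max(longest, current)
    return sentences, longest, bad
```

A corpus with no blank lines at all is reported as a single sentence whose length equals the token count, which is worth ruling out before looking elsewhere. Whether that relates to the errors reported here is not known.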

refactoring of special tokens

Special tokens (class-based emission probabilities) are important features of
hunpos and TnT.

For the following regular expressions, hunpos learns the tag distribution in
the training corpus separately, to give more reliable estimates for open-class
items, such as numbers, that were unseen during training:

^[0-9]+$ 
^[0-9]+\.$      
^[0-9.,:-]+[0-9]+$
^[0-9]+[a-zA-Z]{1,3}$ 

After this, at tagging time, if the word is not found in the lexicon
(numerals are added to the lexicon like all other items), hunpos checks
whether the unseen word matches one of the regexps and uses the
distribution learned for that regexp to guess the tag.

Currently these regexps are hardcoded in the special_tokens.ml file. We need
some very fast regexp matching, or something like transducers.

Original issue reported on code.google.com by [email protected] on 30 Jun 2007 at 11:54
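
The mechanism described in this issue can be illustrated with a short Python sketch. It is not the code in special_tokens.ml. The class names, the function names, and the word-to-tag lexicon are hypothetical, and for simplicity the sketch returns the most frequent tag of a class rather than using the full learned distribution.

```python
import re
from collections import Counter, defaultdict

# The four patterns quoted in the issue, in the order given there.
# The class names are hypothetical labels chosen for this sketch.
SPECIAL_TOKEN_PATTERNS = [
    ("digits", re.compile(r"^[0-9]+$")),
    ("digits_dot", re.compile(r"^[0-9]+\.$")),
    ("digit_series", re.compile(r"^[0-9.,:-]+[0-9]+$")),
    ("digits_suffix", re.compile(r"^[0-9]+[a-zA-Z]{1,3}$")),
]


def special_class(word):
    """Return the name of the first pattern the word matches, or None."""
    for name, pattern in SPECIAL_TOKEN_PATTERNS:
        if pattern.match(word):
            return name
    return None


def learn_class_tags(tagged_words):
    """Count tags per special-token class over (word, tag) training pairs."""
    counts = defaultdict(Counter)
    for word, tag in tagged_words:
        name = special_class(word)
        if name is not None:
            counts[name][tag] += 1
    return counts


def guess_tag(word, lexicon, class_counts):
    """Look the word up in the lexicon first, then fall back to its class."""
    if word in lexicon:
        return lexicon[word]
    name = special_class(word)
    tags = class_counts.get(name) if name is not None else None
    if tags:
        # Simplification: pick the most frequent tag seen for this class.
        return tags.most_common(1)[0][0]
    return None
```

Because the patterns are tried in order, a token such as `2007` is assigned to the first class it matches even though the third pattern would also accept it.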

Enhancement request, lemma display would be very useful.

What steps will reproduce the problem?
gyáraikat       NOUN<PLUR><POSS<PLUR>><CAS<ACC>>

What is the expected output? What do you see instead?
gyáraikat       NOUN<PLUR><POSS<PLUR>><CAS<ACC>>  gyár

What version of the product are you using? On what operating system?
24 April 2009, current. linux-debian

Please provide any additional information below.
This is an enhancement request. It would be very useful if the word stem
(lemma) were displayed for each word.

Original issue reported on code.google.com by [email protected] on 25 Apr 2009 at 8:34

human readable model files

Tomaz Erjavec sent this feature request:

"- I think it would be useful if the model were in fact plain text, as
with TnT. I have often found it useful to be able to edit the model by
hand, say for adding missed out morphological forms to the lexicon. Or,
right now we have the case where we have a tagged corpus with lots of words
tagged as unknown. I'd like to train the tagger on this corpus, then remove
the unknown words from the lexicon, and tag the same corpus, substituting
the unknown tag with the one assigned by the tagger. It is not obvious how
I could do this with HunPos"

Original issue reported on code.google.com by [email protected] on 29 Jun 2007 at 2:03

tagging erroneous in some cases

What steps will reproduce the problem?
Tag a sample corpus


What is the expected output? What do you see instead?
I entered, for example:
word    got     expected
I       PRP
do      VBP
work    NN      VB
.       SENT


What version of the product are you using? On what operating system?
1.2.8, linux mint

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 19 Dec 2010 at 8:59

hunpos-tag binary cannot be run, differs from self-compiled version

What steps will reproduce the problem?
1. download binaries from the hunpos repository
2. run hunpos-tag

What is the expected output? What do you see instead?
Result:
$ ./hunpos-tag 
bash: ./hunpos-tag: No such file or directory


What version of the product are you using? On what operating system?
1.0 on Ubuntu 12.04

Please provide any additional information below.
Compiling from the source in repository and running hunpos-tag works.

Original issue reported on code.google.com by [email protected] on 26 Sep 2012 at 8:25

import TnT model file

It would be a nice feature to import TnT model files directly, for example
the Bulgarian model file (http://www.bultreebank.org/taggers/taggers.html).


Original issue reported on code.google.com by [email protected] on 30 Jun 2007 at 9:59

Adjective comparative and superlative missing, super-superlative shown as noun

What steps will reproduce the problem?
jó      ADJ
jobb    ADJ
legjobb ADJ
legeslegjobb    NOUN

What is the expected output? What do you see instead?
jó      ADJ
jobb    ADJ  expected: ADJR (adjective, comparative)
legjobb ADJ  expected: ADJS (adjective, superlative)
legeslegjobb    NOUN  expected: ADJSS (adjective, super-superlative)


What version of the product are you using? On what operating system?
Most current (2009. apr. 24), linux debian

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 25 Apr 2009 at 8:11

Stack overflow for large corpora

I am trying to build a new model with hunpos, but it has issues with large
corpora. My corpus is 300 000 words and I get a stack overflow.
I am using it on Linux. Does this POS tagger support larger corpora, or is
this problem caused by my OS or the amount of RAM in my computer?

Original issue reported on code.google.com by [email protected] on 19 Feb 2014 at 9:50

drink NN instead of VB

Maybe   RB  
that    WDT 
made    VBD 
Mother  NNP 
drink   NN  
,   ,   
I   PRP 
do  VBP 
not RB  
know    VB  
.   .   

Expected:
Maybe   RB  
that    WDT 
made    VBD 
Mother  NNP 
drink   VB  <----------------- this is the one that is not right
,   ,   
I   PRP 
do  VBP 
not RB  
know    VB  
.   .   

Original issue reported on code.google.com by [email protected] on 24 Nov 2010 at 10:15
