williamockham / hunpos
Automatically exported from code.google.com/p/hunpos
What steps will reproduce the problem?
merészé NOUN
egéré NOUN<ANP>
regényé NOUN<CAS<TRA>>
rémé NOUN<CAS<TRA>>
kézé NOUN
lábé NOUN
What is the expected output? What do you see instead?
merészé NOUN<ANP>
regényé NOUN<ANP>
rémé NOUN<ANP>
kézé NOUN<ANP>
lábé NOUN<ANP>
What version of the product are you using? On what operating system?
current, 2009 July linux, debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 13 Jul 2009 at 7:18
Hi,
Could you please give the correct training syntax for Windows? I've tried: type
frenchcorpus.txt > hunpos-train french.model (type being the Windows equivalent
of cat) and it doesn't work.
The Linux syntax you recommend is: {{cat uzbek.corpus | ./hunpos-train
uzbek.model}}}. What are the curly brackets used for?
TIA
Original issue reported on code.google.com by [email protected]
on 25 Nov 2010 at 12:23
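The Linux command quoted in the issue above pipes the corpus into the trainer's standard input and passes the model name as an argument. In the Windows attempt, ">" redirects the output of type into a file called hunpos-train instead of starting the program. The following is a minimal Python sketch of the same invocation that does not depend on shell syntax. The function name train_model and the default command name "hunpos-train" are assumptions for illustration; adjust the command to wherever the binary actually lives.

```python
import subprocess
from pathlib import Path


def train_model(corpus_path, model_path, command=("hunpos-train",)):
    """Run the trainer with the corpus on standard input.

    `command` is the trainer invocation (an assumption: the binary is on
    PATH under this name). The model path is appended as its argument,
    mirroring `cat corpus | ./hunpos-train model`.
    """
    with Path(corpus_path).open("rb") as corpus:
        # The corpus is streamed to the trainer's stdin; nothing is
        # redirected into a file named after the program.
        return subprocess.run(
            [*command, str(model_path)],
            stdin=corpus,
            check=True,
        )
```

Usage would look like train_model("frenchcorpus.txt", "french.model"), and the same call works on Windows and Linux because no shell redirection is involved.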
I am running Hunpos 1.0 on Windows XP, using the provided english-wsj
model. Hunpos works great when I use it interactively, but when I pipe
input/output to/from file, the tags are all obviously wrong.
Here is my input file (in.txt):
The
quick
red
fox
jumped
over
the
lazy
dogs.
>hunpos-tag english.model <in.txt >out.txt
This is what I get in out.txt:
The
NNP
quick
VBD
red
CD
fox
CD
jumped
CD
over
CD
the
CD
lazy
CD
dogs.
JJ
NNS
If I type the same text in manually at the console, the resulting tags are
completely different.
What gives?
Thanks in advance... Alex Dowad
Original issue reported on code.google.com by [email protected]
on 18 Feb 2010 at 12:06
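One possible cause of the output in the issue above, which the thread does not confirm, is Windows line endings: if in.txt uses CRLF, each token may reach the tagger with a trailing carriage return, so "The" becomes an unknown word "The\r". That would also explain why each tag appears on its own line in out.txt. The sketch below, with hypothetical helper names, finds such tokens and converts the input to LF-only line endings before tagging.

```python
def tokens_with_carriage_return(raw: bytes) -> list[bytes]:
    """Return the lines of `raw` that still end in a carriage return."""
    return [line for line in raw.split(b"\n") if line.endswith(b"\r")]


def normalize_line_endings(raw: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    sample = b"The\r\nquick\r\nred\r\nfox\r\n"
    print(tokens_with_carriage_return(sample))
    print(normalize_line_endings(sample))
```

If the first function returns anything for the input file, writing the normalized bytes to a new file and tagging that file is a cheap way to test the hypothesis.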
What steps will reproduce the problem?
1. Try to checkout the project with the command
svn checkout http://hunpos.googlecode.com/svn/trunk/ hunpos-read-only
What is the expected output? What do you see instead?
The project should be checked-out
svn: PROPFIND request failed on '/svn/trunk'
svn: PROPFIND of '/svn/trunk': 403 Forbidden (http://hunpos.googlecode.com)
What version of the product are you using? On what operating system?
svn version 1.4.4 and 1.5
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 11 Jun 2009 at 12:29
What steps will reproduce the problem?
keréké NOUN<PLUR><ANP>
egéré NOUN<ANP>
What is the expected output? What do you see instead?
keréké - non plural
What version of the product are you using? On what operating system?
current (2009. July), linux debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 13 Jul 2009 at 7:12
What steps will reproduce the problem?
hurrá NOUN<CAS<TRA>>
jaj UTT-INT
húrrá NOUN<CAS<TRA>>
What is the expected output? What do you see instead?
hurrá expected: UTT-INT
What version of the product are you using? On what operating system?
Most current 2009, april 24, linux debian
Please provide any additional information below.
hurrá is not an inflected noun but an UTT-INT (interjection; Hungarian: indulatszó).
Original issue reported on code.google.com by [email protected]
on 25 Apr 2009 at 8:13
What steps will reproduce the problem?
nekünk VERB<INF>
nekik NOUN<PERS><PLUR><CAS<DAT>>
tõletek VERB<MODAL><PERS<1>>
tõle NOUN<PERS><CAS<ABL>>
hozzátok NOUN<PLUR>
hozzájuk NOUN<PERS><PLUR><CAS<ALL>>
értem VERB<PERS<1>><DEF>
jöttek VERB<PAST><PLUR>
What is the expected output? What do you see instead?
nekünk VERB<INF> expected: NOUN<PERS<1>><PLUR><CAS<DAT>>
tõletek VERB<MODAL><PERS<1>> expected: NOUN<PERS<2>><CAS<ABL>>
hozzátok NOUN<PLUR> expected: NOUN<PERS<2>><PLUR><CAS<ALL>>
értem VERB<PERS<1>><DEF> expected: NOUN<PERS<1>><PLUR><CAS<CAU>>, when
followed by a verb, as in the example
What version of the product are you using? On what operating system?
current, 2009. apr. 24, linux-debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Apr 2009 at 8:29
Hi,
I've tried to build my own POS model under Linux. I first tried with a corpus of
1 600 000 words (and tags) and got a Stack Overflow error, so I tried with a
much smaller corpus (100 000 words and tags). The program reports that it is reading
the training corpus, then compiling probabilities, and then fails with: Fatal
error: exception Failure("empty context_trie").
What am I doing wrong? My corpus is just a file with one word and one tag per line,
with LF line endings.
TIA for the answer
Original issue reported on code.google.com by [email protected]
on 25 Nov 2010 at 4:09
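A quick way to rule out a formatting problem in a corpus like the one described above is to check it mechanically. The sketch below assumes the trainer expects one token per line with word and tag separated by a tab, and sentences separated by blank lines. That format is an assumption here, since the thread does not state it, and the function name check_corpus is hypothetical. A corpus with no blank lines would be counted as one very long sentence, which is worth knowing when debugging.

```python
def check_corpus(lines):
    """Return (sentence_count, problems) for an iterable of corpus lines.

    `problems` lists (line_number, line) pairs that are not exactly
    "word<TAB>tag". Blank lines are treated as sentence separators.
    """
    problems = []
    sentences = 0
    in_sentence = False
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line closes the current sentence, if there is one.
            if in_sentence:
                sentences += 1
                in_sentence = False
            continue
        fields = line.split("\t")
        if len(fields) != 2 or not fields[0] or not fields[1]:
            problems.append((number, line))
        in_sentence = True
    if in_sentence:
        sentences += 1
    return sentences, problems
```

Calling it as check_corpus(open("corpus.txt", encoding="utf-8")) gives the sentence count and any malformed lines; a result of one sentence for a 100 000-word file would point at missing sentence boundaries.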
tnt-diff is a nifty little program. Maybe at some point you could add this?
Original issue reported on code.google.com by [email protected]
on 29 Jun 2007 at 11:06
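The request above does not say what tnt-diff does. Assuming it compares two tagged versions of the same text token by token, a minimal sketch of that comparison could look like this. The function names and the (word, tag) pair representation are illustrative choices, not part of hunpos or TnT.

```python
def diff_tags(got, expected):
    """Return (index, word, got_tag, expected_tag) for every tag mismatch.

    Both arguments are sequences of (word, tag) pairs over the same tokens.
    """
    if len(got) != len(expected):
        raise ValueError("token counts differ")
    mismatches = []
    for index, ((word, got_tag), (exp_word, exp_tag)) in enumerate(zip(got, expected)):
        if word != exp_word:
            raise ValueError(f"token {index} differs: {word!r} vs {exp_word!r}")
        if got_tag != exp_tag:
            mismatches.append((index, word, got_tag, exp_tag))
    return mismatches


def accuracy(got, expected):
    """Fraction of tokens whose tags agree (1.0 for empty input)."""
    if not got:
        return 1.0
    return (len(got) - len(diff_tags(got, expected))) / len(got)
```

A report of mismatches plus a single accuracy figure is usually enough to compare two models on the same test file.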
Special tokens (class-based emission probabilities) are important features of
hunpos and TnT.
For the following regular expressions hunpos learns the tag distribution of
the training corpus separately to give more reliable estimates for open
class items like numbers unseen during training:
^[0-9]+$
^[0-9]+\.$
^[0-9.,:-]+[0-9]+$
^[0-9]+[a-zA-Z]{1,3}$
After this, at tag time, if the word is not found in the lexicon
(numerals are added to the lexicon like all other items) hunpos checks
whether the unseen word matches some of the regexps, and uses the
distribution learned for this regexp to guess the tag.
These regexps are currently hardcoded in the special_tokens.ml file. We need some
very fast regexp matching, or something like transducers.
Original issue reported on code.google.com by [email protected]
on 30 Jun 2007 at 11:54
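The special-token mechanism described in the issue above can be sketched in a few lines. This is an illustration in Python, not the code in special_tokens.ml: the class names, the "first matching regexp wins" rule, and the use of raw tag counts instead of smoothed probabilities are all assumptions. Only the four regular expressions come from the issue text.

```python
import re
from collections import Counter, defaultdict

# The four patterns quoted in the issue; the labels are hypothetical.
SPECIAL_TOKEN_PATTERNS = [
    ("digits", re.compile(r"^[0-9]+$")),
    ("digits_period", re.compile(r"^[0-9]+\.$")),
    ("digits_punct", re.compile(r"^[0-9.,:-]+[0-9]+$")),
    ("digits_letters", re.compile(r"^[0-9]+[a-zA-Z]{1,3}$")),
]


def special_class(word):
    """Return the label of the first pattern matching `word`, else None."""
    for name, pattern in SPECIAL_TOKEN_PATTERNS:
        if pattern.match(word):
            return name
    return None


def train(tagged_words):
    """Collect tag counts per word and, separately, per special class."""
    lexicon = defaultdict(Counter)
    class_tags = defaultdict(Counter)
    for word, tag in tagged_words:
        lexicon[word][tag] += 1  # numerals enter the lexicon like any word
        name = special_class(word)
        if name is not None:
            class_tags[name][tag] += 1
    return lexicon, class_tags


def guess_tags(word, lexicon, class_tags):
    """Tag counts for `word`: lexicon first, then its special class."""
    if word in lexicon:
        return lexicon[word]
    name = special_class(word)
    if name is not None and name in class_tags:
        return class_tags[name]
    return None  # a real tagger would fall back to a suffix-based guesser
```

With this layout, a number never seen in training, such as "2024", still gets the tag distribution learned from other all-digit tokens.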
What steps will reproduce the problem?
gyáraikat NOUN<PLUR><POSS<PLUR>><CAS<ACC>>
What is the expected output? What do you see instead?
gyáraikat NOUN<PLUR><POSS<PLUR>><CAS<ACC>> gyár
What version of the product are you using? On what operating system?
24 April 2009, current. linux-debian
Please provide any additional information below.
This is an enhancement request. It would be very useful if the word stem
(lemma) were displayed for each word.
Original issue reported on code.google.com by [email protected]
on 25 Apr 2009 at 8:34
What steps will reproduce the problem?
1.
2.
3.
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 18 Feb 2014 at 10:46
Tomaz Erjavec sent this feature request
"- I think it would be useful if the model were in fact plain text, as
with TnT. I have often found it useful to be able to edit the model by
hand, say for adding missed out morphological forms to the lexicon. Or,
right now we have the case where we have a tagged corpus with lots of words
tagged as unknown. I'd like to train the tagger on this corpus, then remove
the unknown words from the lexicon, and tag the same corpus, substituting
the unknown tag with the one assigned by the tagger. It is not obvious how
I could do this with HunPos"
Original issue reported on code.google.com by [email protected]
on 29 Jun 2007 at 2:03
What steps will reproduce the problem?
Tag a sample corpus
What is the expected output? What do you see instead?
I wrote in for example:
got expected
I PRP
do VBP
work NN <-- VB
. SENT
What version of the product are you using? On what operating system?
1.2.8, linux mint
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 19 Dec 2010 at 8:59
Many thanks in advance :-)
Best regards,
Vlado B, 16:45
Original issue reported on code.google.com by [email protected]
on 4 Aug 2014 at 2:47
What steps will reproduce the problem?
1. download binaries from the hunpos repository
2. run hunpos-tag
3.
What is the expected output? What do you see instead?
Result:
$ ./hunpos-tag
bash: ./hunpos-tag: No such file or directory
What version of the product are you using? On what operating system?
1.0 on Ubuntu 12.04
Please provide any additional information below.
Compiling from the source in repository and running hunpos-tag works.
Original issue reported on code.google.com by [email protected]
on 26 Sep 2012 at 8:25
It would be a nice feature to import TnT model files directly, for example the
Bulgarian model file (http://www.bultreebank.org/taggers/taggers.html).
Original issue reported on code.google.com by [email protected]
on 30 Jun 2007 at 9:59
What steps will reproduce the problem?
jó ADJ
jobb ADJ
legjobb ADJ
legeslegjobb NOUN
What is the expected output? What do you see instead?
jó ADJ
jobb ADJ expected: ADJR (adjective, comparative)
legjobb ADJ expected: ADJS (adjective, superlative)
legeslegjobb NOUN expected: ADJSS (adjective, super-superlative)
What version of the product are you using? On what operating system?
Most current (2009. apr. 24), linux debian
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 25 Apr 2009 at 8:11
I am trying to build a new model with hunpos, but it has issues with large
corpora. My corpus is 300 000 words and I get a stack overflow error.
I am using it on Linux. Does this POS tagger support larger corpora? Or is this
problem caused by my OS or my computer's RAM?
Original issue reported on code.google.com by [email protected]
on 19 Feb 2014 at 9:50
Maybe RB
that WDT
made VBD
Mother NNP
drink NN
, ,
I PRP
do VBP
not RB
know VB
. .
Expected:
Maybe RB
that WDT
made VBD
Mother NNP
drink VB <----------------- this is not right
, ,
I PRP
do VBP
not RB
know VB
. .
Original issue reported on code.google.com by [email protected]
on 24 Nov 2010 at 10:15