bjut-hz / mate-tools
Automatically exported from code.google.com/p/mate-tools
What steps will reproduce the problem?
1. svn checkout of the mate-tools package
2. try to build with ANT
Please provide any additional information below.
In the build.xml file, the classpath points to ./libs, while the library directory in
the project tree is named ./lib.
Original issue reported on code.google.com by [email protected]
on 18 Apr 2013 at 1:43
What steps will reproduce the problem?
1. Execute sh scripts/parse_full.sh
What is the expected output? What do you see instead?
The output is:
Exception in thread "main" java.lang.NoClassDefFoundError: is2/util/OptionsSuper
at se.lth.cs.srl.languages.Language.getLemmatizer(Language.java:99)
at se.lth.cs.srl.languages.Language.getPreprocessor(Language.java:72)
at se.lth.cs.srl.CompletePipeline.getCompletePipeline(CompletePipeline.java:37)
at se.lth.cs.srl.CompletePipeline.main(CompletePipeline.java:93)
What version of the product are you using? On what operating system?
SRL pipeline with all required models.
srl-20130917
Please provide any additional information below.
Changing the classpath variable from
CP="srl.jar:lib/anna.jar:lib/liblinear-1.51-with-deps.jar:lib/opennlp-tools-1.4.3.jar:lib/maxent-2.5.2.jar:lib/trove.jar:lib/seg.jar"
to
CP="srl.jar:lib/anna-3.3.jar:lib/liblinear-1.51-with-deps.jar:lib/opennlp-tools-1.4.3.jar:lib/maxent-2.5.2.jar:lib/trove.jar:lib/seg.jar"
and running it with the required models (cf. the next command)
java -cp srl.jar:lib/anna-3.3.jar:lib/liblinear-1.51-with-deps.jar:lib/opennlp-tools-1.4.3.jar:lib/maxent-2.5.2.jar:lib/trove.jar:lib/seg.jar -Xmx3g \
  se.lth.cs.srl.CompletePipeline eng \
  -tagger models/CoNLL2009-ST-English-ALL.anna-3.3.postagger.model \
  -parser models/CoNLL2009-ST-English-ALL.anna-3.3.parser.model \
  -srl models/CoNLL2009-ST-English-ALL.anna-3.3.srl-4.1.srl.model \
  -lemma models/CoNLL2009-ST-English-ALL.anna-3.3.lemmatizer.model \
  -test input.txt -out output.txt
solves the problem.
Original issue reported on code.google.com by nikoschenk
on 8 Oct 2014 at 9:53
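The NoClassDefFoundError above follows from a stale jar name: the JVM silently ignores classpath entries that do not exist, so lib/anna.jar fails without any warning and the error only surfaces when is2.util.OptionsSuper is first needed. A minimal sketch, not part of mate-tools, that lists the missing entries of a classpath before the pipeline is launched:

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ClasspathCheck {
    // Splits a classpath on the platform separator (':' on Unix, ';' on Windows) and
    // returns the entries that do not exist relative to baseDir. Wildcard entries
    // such as lib/* are skipped rather than expanded.
    static List<String> missingEntries(String classpath, Path baseDir) {
        List<String> missing = new ArrayList<>();
        for (String entry : classpath.split(File.pathSeparator)) {
            if (entry.isEmpty() || entry.endsWith("*")) {
                continue;
            }
            if (!Files.exists(baseDir.resolve(entry))) {
                missing.add(entry);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Check the classpath given as the first argument, or this JVM's own classpath.
        String cp = args.length > 0 ? args[0] : System.getProperty("java.class.path");
        for (String entry : missingEntries(cp, Paths.get("."))) {
            System.err.println("missing classpath entry: " + entry);
        }
    }
}
```

Passing the old CP value from scripts/parse_full.sh as the first argument should then report lib/anna.jar when only lib/anna-3.3.jar is present.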
Can the SRL component run on Windows, either through the API or from the command line?
Original issue reported on code.google.com by [email protected]
on 27 Oct 2013 at 4:29
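One concrete portability point: the command lines quoted on this page separate classpath entries with ':', which is the Unix separator; the java launcher on Windows expects ';'. A small sketch that takes the separator from File.pathSeparator instead of hard-coding it (the class name is illustrative):

```java
import java.io.File;
import java.util.List;

final class PortableClasspath {
    // Jar names from the corrected CP variable quoted above for scripts/parse_full.sh.
    static final List<String> SRL_JARS = List.of(
            "srl.jar",
            "lib/anna-3.3.jar",
            "lib/liblinear-1.51-with-deps.jar",
            "lib/opennlp-tools-1.4.3.jar",
            "lib/maxent-2.5.2.jar",
            "lib/trove.jar",
            "lib/seg.jar");

    // Joins entries with File.pathSeparator: ';' on Windows, ':' on Unix-like systems.
    static String join(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Prints a value suitable for `java -cp` on the current platform.
        System.out.println(join(SRL_JARS));
    }
}
```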
We built an implementation of mate-tools version 3.5 that injects input in the
CollReader format (one token per line, with an additional line break separating
sentences).
The parser, however, shows some strange behaviour: it deletes the first token of
each sentence and starts with the second. This is a relatively new issue and might
have something to do with the input format or encoding. By contrast, the
lemmatizer and POS tagger work fine. All the data is encoded in UTF-8.
Example output (Der Buchstabe A hat eine durchschnittliche Häufigkeit von
6.51%.):
-------- TOKEN FORMS @AFTER PARSE
2 Buchstabe _ buchstabe _ NN _ case=nom|number=sg|gender=masc -1 3 _ SB _ _
3 A _ -- _ NE _ case=nom|number=sg|gender=* -1 1 _ NK _ _
4 hat _ haben _ VAFIN _ number=sg|person=3|tense=pres|mood=ind -1 0 _ -- _ _
5 in _ in _ APPR _ _ -1 3 _ MO _ _
6 deutschen _ deutsch _ ADJA _ case=dat|number=pl|gender=fem|degree=pos -1 6 _ NK _ _
7 Texten _ text _ NN _ case=dat|number=pl|gender=fem -1 4 _ NK _ _
8 eine _ ein _ ART _ case=acc|number=sg|gender=fem -1 9 _ NK _ _
9 durchschnittliche _ durchschnittlich _ ADJA _ case=acc|number=sg|gender=fem|degree=pos -1 9 _ NK _ _
10 Häufigkeit _ häufigkeit _ NN _ case=acc|number=sg|gender=fem -1 3 _ OA _ _
11 von _ von _ APPR _ _ -1 9 _ MNR _ _
12 6,51 _ 6,51 _ CARD _ _ -1 12 _ NK _ _
13 % _ % _ NN _ case=*|number=*|gender=neut -1 10 _ NK _ _
14 . _ -- _ $. _ _ -1 12 _ -- _ _
Thanks
Original issue reported on code.google.com by [email protected]
on 31 Oct 2013 at 11:15
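To tell whether tokens are already missing in the injected input or are only dropped by the parser, each stage's CoNLL output can be checked for gaps in the ID column. A minimal sketch, independent of mate-tools' own reader classes, that decodes a file explicitly as UTF-8, groups lines into sentences at empty lines, and flags sentences whose IDs do not run 1, 2, 3, ...:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class ConllSentences {
    // Groups token lines into sentences; an empty line ends the current sentence.
    // Columns are split on any whitespace: real CoNLL files use tabs, while the
    // listings on this page show spaces.
    static List<List<String[]>> parse(List<String> lines) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(trimmed.split("\\s+"));
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    // True if the ID column reads 1, 2, 3, ... without gaps. A sentence that starts
    // at 2, like the parser output above, has lost its first token.
    static boolean idsAreConsecutive(List<String[]> sentence) {
        for (int i = 0; i < sentence.size(); i++) {
            if (!sentence.get(i)[0].equals(Integer.toString(i + 1))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<List<String[]>> sentences = parse(lines);
        for (int s = 0; s < sentences.size(); s++) {
            if (!idsAreConsecutive(sentences.get(s))) {
                System.err.println("sentence " + (s + 1) + ": token IDs do not run 1..n");
            }
        }
    }
}
```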
What steps will reproduce the problem?
1. take a sentence, e.g. "Quick brown fox jumps over the lazy dog ."
2. split, lemmatize, tag and parse it
3. have a look at the intermediate results
Here's what I get:
1 Quick _ _ _ _ _ _ _ _ _ _ _ _ _
2 brown _ _ _ _ _ _ _ _ _ _ _ _ _
3 fox _ _ _ _ _ _ _ _ _ _ _ _ _
4 jumps _ _ _ _ _ _ _ _ _ _ _ _ _
5 over _ _ _ _ _ _ _ _ _ _ _ _ _
6 the _ _ _ _ _ _ _ _ _ _ _ _ _
7 lazy _ _ _ _ _ _ _ _ _ _ _ _ _
8 dog _ _ _ _ _ _ _ _ _ _ _ _ _
9 . _ _ _ _ _ _ _ _ _ _ _ _ _
1 Quick _ quick _ _ _ _ -1 _ _ _ _ _
2 brown _ brown _ _ _ _ -1 _ _ _ _ _
3 fox _ fox _ _ _ _ -1 _ _ _ _ _
4 jumps _ jump _ _ _ _ -1 _ _ _ _ _
5 over _ over _ _ _ _ -1 _ _ _ _ _
6 the _ the _ _ _ _ -1 _ _ _ _ _
7 lazy _ lazy _ _ _ _ -1 _ _ _ _ _
8 dog _ dog _ _ _ _ -1 _ _ _ _ _
9 . _ . _ _ _ _ -1 _ _ _ _ _
1 Quick quick _ _ JJ _ _ -1 _ _ _ _ _
2 brown brown _ _ JJ _ _ -1 _ _ _ _ _
3 fox fox _ _ NN _ _ -1 _ _ _ _ _
4 jumps jump _ _ VBZ _ _ -1 _ _ _ _ _
5 over over _ _ IN _ _ -1 _ _ _ _ _
6 the the _ _ DT _ _ -1 _ _ _ _ _
7 lazy lazy _ _ JJ _ _ -1 _ _ _ _ _
8 dog dog _ _ NN _ _ -1 _ _ _ _ _
9 . . _ _ . _ _ -1 _ _ _ _ _
1 Quick _ quick _ JJ _ _ 3 3 NMOD NMOD _ _
2 brown _ brown _ JJ _ _ 3 3 NMOD NMOD _ _
3 fox _ fox _ NN _ _ 4 4 SBJ SBJ _ _
4 jumps _ jump _ VBZ _ _ 0 0 ROOT ROOT _ _
5 over _ over _ IN _ _ 4 4 ADV ADV _ _
6 the _ the _ DT _ _ 8 8 NMOD NMOD _ _
7 lazy _ lazy _ JJ _ _ 8 8 NMOD NMOD _ _
8 dog _ dog _ NN _ _ 5 5 PMOD PMOD _ _
9 . _ . _ . _ _ 4 4 P P _ _
Note that the value in the PLEMMA column produced by the lemmatizer ends up in the
LEMMA column after tagging. I believe this is not supposed to happen.
The morphological tagger and the dependency parser also swap the predicted and
gold-standard lemmas. If one skips the morphological tagging step, the two swaps
cancel out and the end result is fine; otherwise, the role labeler reads the
lemma value from the third column and we end up with "_" in place of the lemma.
Original issue reported on code.google.com by [email protected]
on 18 Jul 2011 at 3:14
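The cancellation argument above comes down to parity. Assuming, as the report describes, that each affected stage exchanges the LEMMA and PLEMMA columns exactly once, an even number of such stages restores the lemmatizer's output and an odd number leaves "_" in PLEMMA. A small model of that reasoning (it simulates the columns, it does not call mate-tools):

```java
import java.util.Arrays;

final class LemmaSwap {
    // 0-based CoNLL-2009 column indices: ID=0, FORM=1, LEMMA=2, PLEMMA=3.
    static final int LEMMA = 2;
    static final int PLEMMA = 3;

    // Models one processing stage with the reported bug: LEMMA and PLEMMA are exchanged.
    static String[] swapLemmas(String[] row) {
        String[] out = Arrays.copyOf(row, row.length);
        out[LEMMA] = row[PLEMMA];
        out[PLEMMA] = row[LEMMA];
        return out;
    }

    // Applies the swap once per buggy stage in the pipeline.
    static String[] runStages(String[] row, int swappingStages) {
        String[] current = row;
        for (int i = 0; i < swappingStages; i++) {
            current = swapLemmas(current);
        }
        return current;
    }

    public static void main(String[] args) {
        // Row 4 of the lemmatizer output above: the predicted lemma sits in PLEMMA.
        String[] lemmatized = "4 jumps _ jump _ _ _ _ -1 _ _ _ _ _".split(" ");
        // Tagger + parser: two swaps cancel out.
        System.out.println(runStages(lemmatized, 2)[PLEMMA]);  // prints jump
        // Tagger + morphological tagger + parser: odd count.
        System.out.println(runStages(lemmatized, 3)[PLEMMA]);  // prints _
    }
}
```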
It says:
java -cp srl.jar:lib/liblinear-1.51-with-deps.jarse.lth.cs.srl.Parse [...]
I believe it should be
java -cp srl.jar:lib/liblinear-1.51-with-deps.jar se.lth.cs.srl.Parse [...] (with a space after .jar)
Original issue reported on code.google.com by [email protected]
on 9 Sep 2014 at 1:22
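When the parser is started from another program rather than typed into a shell, passing the command as an argument list avoids this kind of whitespace error altogether. A sketch using the JDK's ProcessBuilder; the wrapper class is hypothetical, and the [...] arguments of se.lth.cs.srl.Parse are left to the caller because the documentation elides them:

```java
import java.util.ArrayList;
import java.util.List;

final class LaunchCommand {
    // Builds the argument vector for `java -cp <classpath> <mainClass> [args...]`.
    // Every element reaches the child JVM as a separate argument, so the classpath
    // and the main class cannot run together the way they do in the typo above.
    static List<String> command(String classpath, String mainClass, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        cmd.addAll(args);
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        List<String> cmd = command(
                "srl.jar:lib/liblinear-1.51-with-deps.jar",  // Unix separator, as in the docs
                "se.lth.cs.srl.Parse",
                List.of(args));                               // the elided [...] arguments
        Process child = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(child.waitFor());
    }
}
```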
When building a wrapper for the MATE pipeline that processes several documents
without reloading each model for every document, one needs to call the out()
methods of the respective tools (after initialising each tool once with a
model).
However, the following methods are not public, which prevents direct use of
is2.parser.Parser:
- is2.parser.Parser.out()
- is2.parser.Pipe.nextInstance()
Likewise, the morphological tagger in is2.mtag.Tagger cannot be used this way
because these fields are not public:
- is2.mtag.Tagger.pipe
- is2.mtag.Tagger.params
I do not believe this is intended, as the respective fields/methods are public
in the other processor classes, e.g. the lemmatizer or POS tagger.
The problem can easily be solved by setting these fields/methods to public.
For more details and links to the respective source code, also check
http://korap.ids-mannheim.de/2013/07/issues-with-mate-pipeline/
Original issue reported on code.google.com by [email protected]
on 16 Jul 2013 at 12:56
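Until the members are made public, as the report proposes, standard Java reflection can reach non-public methods and fields of classes loaded from the classpath. A sketch of that stopgap with a hypothetical stand-in class; it is brittle, because a renamed member only fails at run time, so making the members public remains the cleaner fix:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

final class ReflectiveAccess {
    // Hypothetical stand-in for a class with non-public members, in the spirit of
    // is2.parser.Parser.out() or is2.mtag.Tagger.pipe; it is not mate-tools code.
    static final class Tool {
        private final String params = "loaded-once";

        private String out(String sentence) {
            return "processed: " + sentence;
        }
    }

    // Invokes a non-public instance method declared on the target's class.
    static Object invokeHidden(Object target, String name, Class<?>[] types, Object... args) {
        try {
            Method m = target.getClass().getDeclaredMethod(name, types);
            m.setAccessible(true);
            return m.invoke(target, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + name, e);
        }
    }

    // Reads a non-public instance field declared on the target's class.
    static Object readHidden(Object target, String name) {
        try {
            Field f = target.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot read " + name, e);
        }
    }

    public static void main(String[] args) {
        Tool tool = new Tool();                                   // initialise once ...
        for (String doc : new String[] {"doc 1", "doc 2"}) {      // ... reuse per document
            System.out.println(invokeHidden(tool, "out", new Class<?>[] {String.class}, doc));
        }
        System.out.println(readHidden(tool, "params"));
    }
}
```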
Hi all,
Does anyone know where or how to get a one-sentence-per-line corpus?
I need a dependency parser, so I want to use this tool, but it expects a
one-sentence-per-line corpus as input.
Please help me.
Thanks.
Kopro
Original issue reported on code.google.com by [email protected]
on 3 Oct 2014 at 7:00
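If raw running text is available, one way to produce one-sentence-per-line input is to split it yourself. A minimal sketch using the JDK's java.text.BreakIterator; it is rule-based, so abbreviations such as "e.g." may be split incorrectly, and a trained sentence detector (such as the one in the OpenNLP library that ships in lib/) would usually do better:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SentencePerLine {
    // Splits text into sentences with the JDK's rule-based sentence BreakIterator.
    // Whitespace runs (including newlines) are collapsed first so that each
    // sentence fits on a single output line.
    static List<String> split(String text, Locale locale) {
        String normalized = text.replaceAll("\\s+", " ").trim();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(normalized);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = normalized.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) throws IOException {
        // Reads raw UTF-8 text from stdin and writes one sentence per line to stdout.
        String text = new String(System.in.readAllBytes(), StandardCharsets.UTF_8);
        for (String sentence : split(text, Locale.ENGLISH)) {
            System.out.println(sentence);
        }
    }
}
```

With Java 11 or later it can be run directly as `java SentencePerLine.java < raw.txt > sentences.txt`.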
What steps will reproduce the problem?
1. Load a German pipeline with the following resources:
prs-ger-cs_1.model
tagger-ct.model
lemmatizer.model
2. Load an English SEPARATE(!) pipeline with the following resources:
prs-eng.model
tag-eng.model
lemma-eng.model
3. Now parse a German sentence with the parser from 1. and inspect the output
(everything is fine):
1 - Karin - Karin - SB - 2 - fliegt - NE -
2 - fliegt - fliegen - ROOT - 0 - ROOTnode - VVFIN -
3 - nach - nach - MO - 2 - fliegt - APPR -
4 - New - New - PNC - 5 - York - NE -
5 - York - York - NK - 3 - nach - NE -
6 - . - _ - PUNC - 2 - fliegt - $. -
4. Now parse an English sentence with the parser from 2. and inspect the output
(everything is fine):
1 - This - this - SBJ - 2 - is - DT -
2 - is - be - ROOT - 0 - ROOTnode - VBZ -
3 - nice - nice - PRD - 2 - is - JJ -
4 - and - and - COORD - 3 - nice - CC -
5 - pretty - pretty - CONJ - 4 - and - RB -
6 - . - . - P - 2 - is - . -
5. NEW: Again use the parser from 1. and parse the German sentence (OUTPUT
CONTAINS ERRORS NOW AND LOOKS STRANGE!!!):
1 - Karin - Ka - ROOT - 0 - ROOTnode - NNP -
2 - fliegt - flieg - P - 1 - Karin - POS -
3 - nach - nach - MNR - 1 - Karin - NNP -
4 - New - New - APPO - 1 - Karin - NNP -
5 - York - York - APPO - 1 - Karin - NNP -
6 - . - . - P - 1 - Karin - POS -
Any suggestions? Any help is appreciated.
Original issue reported on code.google.com by nikoschenk
on 6 Mar 2012 at 1:54
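The report does not identify a cause. The symptom, a German model emitting English-style tags such as NNP and POS after an English pipeline was loaded, would be consistent with state shared between instances through static fields; that is an assumption, not something confirmed in the mate-tools source. This sketch only illustrates the effect with two hypothetical classes:

```java
final class SharedStateDemo {
    // Hypothetical model holder that stores its labels in a static field:
    // every instance in the JVM sees whichever model was loaded last.
    static final class StaticModel {
        private static String[] labels;

        StaticModel(String[] modelLabels) {
            labels = modelLabels.clone();
        }

        String label(int i) {
            return labels[i];
        }
    }

    // The same holder with per-instance state: each pipeline keeps its own labels.
    static final class InstanceModel {
        private final String[] labels;

        InstanceModel(String[] modelLabels) {
            labels = modelLabels.clone();
        }

        String label(int i) {
            return labels[i];
        }
    }

    public static void main(String[] args) {
        String[] german = {"SB", "MO", "NK"};
        String[] english = {"SBJ", "ADV", "NMOD"};

        StaticModel gerStatic = new StaticModel(german);
        new StaticModel(english);                    // loading English overwrites shared state
        System.out.println(gerStatic.label(0));      // prints SBJ, not SB

        InstanceModel gerOwn = new InstanceModel(german);
        new InstanceModel(english);
        System.out.println(gerOwn.label(0));         // prints SB
    }
}
```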
What steps will reproduce the problem?
1. Run two instances of the mate parser in the same JVM
What do you see instead?
Running two instances of the mate parser in the same JVM leads to the following
exception:
java.lang.ArrayIndexOutOfBoundsException: 45
2013-11-14 14:37:49 STDIO [ERROR] at is2.parser.ParallelDecoder.call(ParallelDecoder.java:74)
2013-11-14 14:37:49 STDIO [ERROR] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
2013-11-14 14:37:49 STDIO [ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:138)
2013-11-14 14:37:49 STDIO [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
2013-11-14 14:37:49 STDIO [ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
2013-11-14 14:37:49 STDIO [ERROR] at java.lang.Thread.run(Thread.java:662)
What version of the product are you using? On what operating system?
mate-tools 3.5 checked out from http://mate-tools.googlecode.com/svn/trunk/
Please provide any additional information below.
I noticed this problem while trying to run mate-tools within a Storm topology
(see http://storm-project.net/). In Storm, you can parallelize an operation unit
called a bolt; in my case, the mate parser was the bolt that I wanted to
parallelize. The Storm manager then deployed two instances of the parser in the
same JVM, and this led to the exception described above.
Regards,
Abou Drame
Original issue reported on code.google.com by [email protected]
on 14 Nov 2013 at 2:40
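The report does not say which state the two instances share. Assuming the failure comes from parse calls of different instances running concurrently against shared decoder state, one stopgap is to serialize all parse calls in the JVM behind a single lock; this gives up cross-instance parallelism, so running the instances in separate JVMs is the alternative. A sketch with a hypothetical wrapper and a fake parse function that records how many calls overlap:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.UnaryOperator;

final class SerializedParser {
    // One lock per JVM (strictly, per class loader): every wrapper takes it before
    // parsing, so two instances, e.g. two Storm bolt executors, never decode at once.
    private static final ReentrantLock PARSE_LOCK = new ReentrantLock();

    // Hypothetical: stands in for the call into the wrapped parser instance.
    private final UnaryOperator<String> parseCall;

    SerializedParser(UnaryOperator<String> parseCall) {
        this.parseCall = parseCall;
    }

    String parse(String sentence) {
        PARSE_LOCK.lock();
        try {
            return parseCall.apply(sentence);
        } finally {
            PARSE_LOCK.unlock();
        }
    }

    // Runs `tasks` parses spread over two wrappers on a thread pool and returns the
    // highest number of parse calls that were ever in progress at the same time.
    static int maxConcurrentParses(int tasks) {
        AtomicInteger active = new AtomicInteger();
        AtomicInteger maxActive = new AtomicInteger();
        UnaryOperator<String> fakeParse = s -> {
            maxActive.accumulateAndGet(active.incrementAndGet(), Math::max);
            try {
                Thread.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            active.decrementAndGet();
            return s;
        };
        SerializedParser first = new SerializedParser(fakeParse);
        SerializedParser second = new SerializedParser(fakeParse);
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> results = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                SerializedParser p = (i % 2 == 0) ? first : second;
                String sentence = "sentence " + i;
                results.add(pool.submit(() -> p.parse(sentence)));
            }
            for (Future<String> f : results) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return maxActive.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent parses: " + maxConcurrentParses(20));  // 1 with the lock
    }
}
```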
What steps will reproduce the problem?
1. Run SRL 4.3.
2. It fails with:
Exception in thread "main" java.lang.NullPointerException: entry
at java.util.zip.ZipFile.getInputStream(ZipFile.java:342)
at se.lth.cs.srl.pipeline.Reranker.<init>(Reranker.java:79)
What is the expected output? What do you see instead?
I tried the SRL model given above, but the file named "global" seems to be
missing from the zipped model.
What version of the product are you using? On what operating system?
I'm trying SRL 4.3 on Debian 7.0
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 11 Dec 2013 at 1:59
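The trace is consistent with the reporter's guess: in the JDK, ZipFile.getInputStream throws a NullPointerException with the message "entry" when it is handed a null entry, and ZipFile.getEntry returns null for a name that is not in the archive. A small sketch for checking a model archive before loading it; the entry name "global" is taken from the report, not from the Reranker source:

```java
import java.io.IOException;
import java.util.zip.ZipFile;

final class ModelEntryCheck {
    // ZipFile.getEntry returns null when an entry is absent; passing that null to
    // ZipFile.getInputStream is what raises "NullPointerException: entry".
    static boolean hasEntry(String zipPath, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(zipPath)) {
            return zip.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ModelEntryCheck.java <model.zip> [entry]; entry defaults to "global".
        String model = args[0];
        String entry = args.length > 1 ? args[1] : "global";
        System.out.println(model + (hasEntry(model, entry) ? " contains " : " lacks ") + entry);
    }
}
```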
What steps will reproduce the problem?
1. svn checkout
2. ant compile
What is the expected output? What do you see instead?
I expect it to work; instead, it complains about a missing GNU Trove.
Running "ln -s lib libs" works around it.
What version of the product are you using? On what operating system?
The checked out source code, revision 168.
Original issue reported on code.google.com by [email protected]
on 10 Jul 2012 at 10:18
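This is the same ./libs versus ./lib mismatch reported for build.xml at the top of this page. Besides correcting the path in build.xml, the symbolic-link workaround can also be scripted in Java, e.g. in a setup step that should not depend on a Unix shell; note that creating symbolic links on Windows may require extra privileges. A sketch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LinkLibs {
    // Equivalent of `ln -s lib libs` in the checkout root: creates libs as a relative
    // symbolic link to lib, unless something named libs already exists.
    static boolean ensureLibsLink(Path checkoutRoot) throws IOException {
        Path libs = checkoutRoot.resolve("libs");
        if (Files.exists(libs, LinkOption.NOFOLLOW_LINKS)) {
            return false;
        }
        Files.createSymbolicLink(libs, Paths.get("lib"));
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(ensureLibsLink(root) ? "created libs -> lib" : "libs already exists");
    }
}
```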
What steps will reproduce the problem?
1. Input (see attached file):
1 Michael Michael Michael NE NE _ _ _ _ _ _ _ _ _
2 war sein sein VAFIN VAFIN _ _ _ _ _ _ _ _ _
3 ein eine eine ART ART _ _ _ _ _ _ _ _ _
4 guter gut gut ADJA ADJA _ _ _ _ _ _ _ _ _
5 Junge Junge Junge NN NN _ _ _ _ _ _ _ _ _
6 . . . $. $. _ _ _ _ _ _ _ _ _
2. Command for testing:
java -Xmx2G -cp anna-3.3.jar is2.mtag.Tagger -model tiger-complete.anna-3-1.morphtagger.model -test /dev/stdin -out /dev/stdout
3. Current output (see attached file):
45.20.675 is2.data.ParametersFloat 121:read -> read parameters 134217727 not zero 4044229
45.20.677 is2.data.Cluster 113:<init> -> Read cluster with 0 words
45.20.678 is2.mtag.Tagger 148:readModel -> Loading data finished.
45.20.679 is2.mtag.Tagger 150:readModel -> number of parameter 134217727
45.20.679 is2.mtag.Tagger 151:readModel -> number of classes 268
Processing Sentence:
1 Michael Michael Michael NE NE _ case=nom|number=sg|gender=masc -1 -1 _ _ _ _
2 war sein sein VAFIN VAFIN _ number=sg|person=3|tense=past|mood=ind -1 -1 _ _ _ _
3 ein eine eine ART ART _ case=nom|number=sg|gender=masc -1 -1 _ _ _ _
4 guter gut gut ADJA ADJA _ case=nom|number=sg|gender=masc|degree=pos -1 -1 _ _ _ _
5 Junge Junge Junge NN NN _ case=nom|number=sg|gender=masc -1 -1 _ _ _ _
6 . . . $. $. _ _ -1 -1 _ _ _ _
2 0.0095 seconds/sentnece
Used time 0.019 seconds
What is the expected output? What do you see instead?
In the latest change to DB.java, the debug variable was switched on by default.
Since all debug info is printed to the same stream as the processed sentences,
this output can't be fed into the parser via a pipe. Would it be possible to
switch debugging off or, even better, to print the debug info to System.err, so
that it can be separated from the rest when -out is set to /dev/stdout? It's of
course possible to fiddle with file descriptors, but sending debug output to
System.err would still be nicer.
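The proposal comes down to keeping two streams apart: data on stdout, diagnostics on stderr. A minimal sketch of that convention with a hypothetical class (not DB.java itself), where debugging stays off unless explicitly enabled; the property name mate.debug is illustrative, not an existing mate-tools option:

```java
import java.io.PrintStream;

final class SplitOutput {
    // Hypothetical debug switch, off unless the JVM is started with -Dmate.debug=true.
    static final boolean DEBUG = Boolean.getBoolean("mate.debug");

    // Diagnostics go to stderr, so they never mix with the data written to stdout.
    static void debug(String message) {
        if (DEBUG) {
            System.err.println(message);
        }
    }

    // Writes one CoNLL sentence followed by the blank line that ends it.
    static void emit(PrintStream data, String conllSentence) {
        data.println(conllSentence);
        data.println();
    }

    public static void main(String[] args) {
        debug("number of classes 268");  // visible only with -Dmate.debug=true
        emit(System.out, "1\tMichael\tMichael\tMichael\tNE\tNE");
    }
}
```

Since a shell pipe carries only stdout, a command such as `... -out /dev/stdout 2>debug.log | next-stage` would then keep the debug lines out of the parser's input.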
What version of the product are you using? On what operating system?
anna-3.3.jar
tiger-complete.anna-3-1.morphtagger.model
OS: Linux 3.4.47-2.38-desktop x86_64
Please provide any additional information below.
Original issue reported on code.google.com by [email protected]
on 4 Sep 2013 at 7:40