xiaoling / figer
Fine-Grained Entity Recognizer
Home Page: http://xiaoling.github.io/figer/
License: Other
After I ran the command ./run.sh "aaai/exp.conf" &> aaai/exp.log on macOS High Sierra, I got the following output in the log. Any idea how to fix it?
[info] Loading project definition from /Users/chsuong/figer/project
java.lang.NullPointerException
at java.base/java.util.regex.Matcher.getTextLength(Matcher.java:1770)
at java.base/java.util.regex.Matcher.reset(Matcher.java:416)
at java.base/java.util.regex.Matcher.<init>(Matcher.java:253)
at java.base/java.util.regex.Pattern.matcher(Pattern.java:1133)
at java.base/java.util.regex.Pattern.split(Pattern.java:1261)
at java.base/java.util.regex.Pattern.split(Pattern.java:1334)
at sbt.IO$.pathSplit(IO.scala:723)
at sbt.IO$.parseClasspath(IO.scala:821)
at sbt.compiler.CompilerArguments.extClasspath(CompilerArguments.scala:64)
at sbt.compiler.AggressiveCompile.withBootclasspath(AggressiveCompile.scala:50)
at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:83)
at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:70)
at sbt.compiler.AggressiveCompile.apply(AggressiveCompile.scala:45)
at sbt.Compiler$.apply(Compiler.scala:70)
at sbt.Defaults$.sbt$Defaults$$compileTaskImpl(Defaults.scala:722)
at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:716)
at sbt.Defaults$$anonfun$compileTask$1.apply(Defaults.scala:716)
at scala.Function1$$anonfun$compose$1.apply(Function1.scala:47)
at sbt.$tilde$greater$$anonfun$$u2219$1.apply(TypeFunctions.scala:42)
at sbt.std.Transform$$anon$4.work(System.scala:64)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1$$anonfun$apply$1.apply(Execute.scala:237)
at sbt.ErrorHandling$.wideConvert(ErrorHandling.scala:18)
at sbt.Execute.work(Execute.scala:244)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.Execute$$anonfun$submit$1.apply(Execute.scala:237)
at sbt.ConcurrentRestrictions$$anon$4$$anonfun$1.apply(ConcurrentRestrictions.scala:160)
at sbt.CompletionService$$anon$2.call(CompletionService.scala:30)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
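The trace bottoms out in sbt.IO$.pathSplit, called from CompilerArguments.extClasspath, and the java.base/ module prefixes show the build ran on Java 9 or later. A likely explanation (an assumption, not confirmed in this thread) is that this older sbt reads the java.ext.dirs system property, which JDK 9 and later no longer set because the extension mechanism was removed, and then splits the resulting null. The hypothetical ExtDirsCheck class below reproduces that failure mode with plain JDK calls; its pathSplit only mimics sbt's helper and is not sbt's code.

```java
import java.io.File;
import java.util.regex.Pattern;

/** Hypothetical diagnostic: shows why an older sbt can hit the NPE from the log on JDK 9+. */
public class ExtDirsCheck {
    /** Legacy extension-directory property; expected to be null on JDK 9 and later. */
    static String extDirs() {
        return System.getProperty("java.ext.dirs");
    }

    /** Mimics splitting a classpath string on the platform path separator (':' on macOS). */
    static String[] pathSplit(String s) {
        // Pattern.split(null) fails inside Matcher.getTextLength, matching the top frames of the trace.
        return Pattern.compile(Pattern.quote(File.pathSeparator)).split(s);
    }

    public static void main(String[] args) {
        System.out.println("java.version  = " + System.getProperty("java.version"));
        System.out.println("java.ext.dirs = " + extDirs());
        try {
            pathSplit(extDirs());
            System.out.println("java.ext.dirs is set; this JDK should not trigger this particular NPE");
        } catch (NullPointerException e) {
            System.out.println("Splitting a null java.ext.dirs throws NullPointerException, as in the sbt log");
        }
    }
}
```

If java.ext.dirs prints as null, running sbt under a Java 8 JDK (for example by pointing JAVA_HOME at one) or moving the project to a newer sbt release are the usual workarounds to try.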
I can see Tame Impala in the training data, tagged:
Tame ImpalaB^M/music/artistB^T/music/musical_groupB^]/internet/social_network_userB^M/common/topicH
Also, two of these tags are mapped in types.map:
/music/artist /person/artist
/music/musical_group /person/musician
But Tame Impala is not tagged by the model. Is there a reason why?
Is the file aaai/exp.txt the test data?
Thanks.
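To see which labels a training entity ends up with, the lookup from Freebase types to FIGER types can be sketched as below. The two-column, whitespace-separated layout is assumed from the two types.map lines quoted above, each Freebase type is assumed to map to a single FIGER type, and TypeMapper is a hypothetical name, not part of FIGER. Unmapped types such as /common/topic are simply dropped.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: map an entity's Freebase types to FIGER types using types.map-style lines. */
public class TypeMapper {
    /** Parses "<freebase type> <figer type>" lines; blank or one-column lines are skipped. */
    static Map<String, String> parse(String text) {
        Map<String, String> map = new HashMap<>();
        for (String line : text.split("\\R")) {
            String[] cols = line.trim().split("\\s+");
            if (cols.length >= 2) {
                map.put(cols[0], cols[1]); // assumes one FIGER type per Freebase type
            }
        }
        return map;
    }

    /** Keeps only Freebase types that have a mapping, preserving their order. */
    static Set<String> toFiger(List<String> freebaseTypes, Map<String, String> map) {
        Set<String> out = new LinkedHashSet<>();
        for (String t : freebaseTypes) {
            String mapped = map.get(t);
            if (mapped != null) {
                out.add(mapped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> map = parse("/music/artist /person/artist\n/music/musical_group /person/musician\n");
        List<String> tameImpala = List.of("/music/artist", "/music/musical_group",
                "/internet/social_network_user", "/common/topic");
        System.out.println(toFiger(tameImpala, map)); // prints [/person/artist, /person/musician]
    }
}
```

In practice the mapping text would come from the real types.map file, for example read with java.nio.file.Files.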
build.sbt:28: error: not found: value jetty
jetty()
^
[error] Type error in expression
Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? i
[warn] Ignoring load failure: no project loaded.
[error] Expected ID character
[error] Not a valid command: assembly
[error] assembly.sbt
[error] ^
When I specify absolute paths for testFile and outputFile in the config file, each sentence with predicted mentions triggers a cryptic "sid not found" message and a NullPointerException, and is omitted from the output.
The workaround is to use relative paths, but a more informative error message would at least be helpful.
I want to build training data using protobuf. I explored entity.proto and read the existing train.data file. I copied a single tagged sentence as-is, wrote a writer using protobuf, and created a train.data file. I then trained a model from that train.data file, but I am unable to load it: it fails with "Not in GZIP format".
I would also like to know how you created train.data from the Wikipedia data. Is that code in your Git repository, and if so, where?
I am attaching both the writer file and the figer.conf file. Please take a look and let me know if I am doing anything wrong.
To be able to attach them, I renamed the files:
figer.conf renamed to figer.txt
WriteEntityFile.java renamed to WriteEntityFile.txt
Hi Xiao
I think that to get the types of the Wikipedia entities in train.data, you first mapped them to Freebase.
I wonder if you could provide the mapping file you used for that, because I need to recover the Freebase MIDs of the entities. (I have a mapping file myself, but many entities are not covered by it.)
I would appreciate your help.
Thanks,
Yadollah
Thanks for the useful tool! When running on new data, is there a way to write the predictions in BIO format to a file?
Download link
I created a model from the train.tar.gz data, but I am unable to load it because of this error:
java.util.zip.ZipException: Not in GZIP format
at java.util.zip.GZIPInputStream.readHeader(GZIPInputStream.java:164)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:78)
at java.util.zip.GZIPInputStream.<init>(GZIPInputStream.java:90)
at edu.washington.cs.figer.util.Serializer.deserialize(Serializer.java:40)
at edu.washington.cs.figer.ml.LogisticRegression.readModel(LogisticRegression.java:57)
at edu.washington.cs.figer.FigerSystem.<init>(FigerSystem.java:87)
at edu.washington.cs.figer.FigerSystem.instance(FigerSystem.java:64)
at edu.washington.cs.figer.web.WebDemoServlet.doGet(WebDemoServlet.java:56)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:751)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:566)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:498)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:199)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:98)
at org.eclipse.jetty.server.Server.handle(Server.java:461)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:284)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
at java.lang.Thread.run(Thread.java:745)
2015-11-17 15:24:40.642:WARN:oejs.ServletHandler:qtp200539426-16: /figer
java.lang.NullPointerException
at edu.washington.cs.figer.ml.LogisticRegression.readModel(LogisticRegression.java:58)
at edu.washington.cs.figer.FigerSystem.<init>(FigerSystem.java:87)
at edu.washington.cs.figer.FigerSystem.instance(FigerSystem.java:64)
at edu.washington.cs.figer.web.WebDemoServlet.doGet(WebDemoServlet.java:56)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:751)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:566)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1111)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:498)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1045)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:199)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:98)
at org.eclipse.jetty.server.Server.handle(Server.java:461)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:284)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:534)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:607)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:536)
at java.lang.Thread.run(Thread.java:745)
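The first trace shows LogisticRegression.readModel calling Serializer.deserialize, which opens the model file through a GZIPInputStream. GZIPInputStream throws ZipException "Not in GZIP format" when the first two bytes of the stream are not the gzip magic number 0x1f 0x8b, so checking those bytes tells you whether the model file was actually written compressed. A minimal sketch; GzipCheck is a hypothetical name, not part of FIGER.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical diagnostic: does a file start with the gzip magic bytes 0x1f 0x8b? */
public class GzipCheck {
    /** True if the header holds at least two bytes and they equal the gzip magic number. */
    static boolean hasGzipMagic(byte[] header) {
        return header.length >= 2
                && (header[0] & 0xff) == 0x1f
                && (header[1] & 0xff) == 0x8b;
    }

    /** Reads up to the first two bytes of the file and checks them. */
    static boolean isGzip(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[2];
            int n = in.readNBytes(buf, 0, 2);
            return n == 2 && hasGzipMagic(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        Path model = Paths.get(args[0]); // path to the model file named in the config
        System.out.println(model + (isGzip(model)
                ? " starts with the gzip magic bytes"
                : " is not gzip-compressed; GZIPInputStream would reject it"));
    }
}
```

Run it as java GzipCheck path/to/model. If the file is not gzip-compressed, the model was probably written by a different serializer than the one the loader expects.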
Greetings, Mr. Xiao
Could you please provide a link to download the WEX 20110513 Wiki dump from where you extracted your sentences?
Regards,
Luís
Hi xiaoling,
I want to retrain FIGER with some data added to the existing training data, but the training data available on GitHub is serialized with Google Protocol Buffers. Could you please explain how to modify the training data so that I can add new examples and their corresponding labels in types.map?
When I converted train.tar.gz to text, I found that the types are somewhat noisy, e.g.,
"LCD Soundsystem",
"/internet/social_network_user /broadcast/artist /music/artist /music/musical_group /common/topic".
Some of these types are not in the type list in the research paper.
So I wonder whether there is a clear, specific mapping file that maps entities exactly to the 112 types.
I want to use the well-known FIGER dataset in my experiments, so I need to pay attention to these details.
Thanks a lot!
I want to download the training data used by the FIGER system for another NER task, to see whether it can improve accuracy. That training data is in Protocol Buffers format, which is read through Java classes.
I need the data in a format like the OntoNotes train/test data, where each token is labeled individually; I can handle the BIO conversion myself. Could you please provide a link to download the data? This would help.
We _ _ O
respectfully _ _ O
invite _ _ O
you _ _ O
to _ _ O
watch _ _ O
a _ _ O
special _ _ O
edition _ _ O
of _ _ O
Across _ _ B-ORGANIZATION
China _ _ I-ORGANIZATION
Thank you.
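Both this request and the earlier question about writing predictions in BIO format come down to turning token-level mention spans into per-token tags like the sample above. The sketch below assumes mentions do not overlap, that each mention carries a single label (a mention with several FIGER types would need one chosen first), and that spans use an exclusive end index; BioWriter and Mention are hypothetical names, not FIGER classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: turn tokens plus labeled mention spans into "token _ _ TAG" lines. */
public class BioWriter {
    /** A mention covering tokens [start, end) with one label; end is exclusive. */
    static final class Mention {
        final int start;
        final int end;
        final String label;

        Mention(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    static List<String> toBio(List<String> tokens, List<Mention> mentions) {
        String[] tags = new String[tokens.size()];
        Arrays.fill(tags, "O"); // tokens outside every mention
        for (Mention m : mentions) {
            for (int i = m.start; i < m.end; i++) {
                // First token of a mention gets B-, the rest get I-.
                tags[i] = (i == m.start ? "B-" : "I-") + m.label;
            }
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            lines.add(tokens.get(i) + " _ _ " + tags[i]);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("a", "special", "edition", "of", "Across", "China");
        List<Mention> mentions = List.of(new Mention(4, 6, "ORGANIZATION"));
        toBio(tokens, mentions).forEach(System.out::println);
    }
}
```

Writing the returned lines to a file, one sentence per block with a blank line between sentences, gives the CoNLL-style layout shown in the sample.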
Does the file exp.label contain the 434 sentences of the gold-standard set used for evaluation in the research paper?
What does the file exp.out contain? The predictions of the FIGER model? If so, why does it contain far fewer than 434 sentences?
Hi,
I generated a jar of figer using this command from the readme:
sbt assembly
I now have 2 jar files ("figer_2.10-0.jar" and "figer-assembly-0.jar"), and am trying to use them to generate NE labels for a corpus of sentences.
Specifically, I've tried these commands:
java -jar figer_2.10-0.jar edu.washington.cs.figer.FigerSystem sentences.txt
java -jar figer-assembly-0.jar edu.washington.cs.figer.FigerSystem sentences.txt
In both cases, I get this error:
no main manifest attribute
My questions are:
Note that I'm able to execute sbt "runMain edu.washington.cs.figer.FigerSystem sentences.txt" successfully, so the issue is not in the source code.
Thanks in advance for your help!
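"no main manifest attribute" means the jar's META-INF/MANIFEST.MF has no Main-Class entry. With java -jar, the JVM takes the entry point only from that manifest and passes everything after the jar name (here, the class name and sentences.txt) to the program as arguments. Naming the class explicitly, as in java -cp figer-assembly-0.jar edu.washington.cs.figer.FigerSystem sentences.txt, is the usual alternative, assuming the assembly jar bundles all dependencies. The hypothetical MainClassCheck class below inspects a jar's manifest using the standard java.util.jar API.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Hypothetical diagnostic: print the Main-Class a jar declares, if any. */
public class MainClassCheck {
    /** Returns the Main-Class value, or null when there is no manifest or no such entry. */
    static String mainClass(Manifest mf) {
        return mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        try (JarFile jar = new JarFile(args[0])) { // e.g. figer-assembly-0.jar
            String main = mainClass(jar.getManifest());
            System.out.println(main == null
                    ? "No Main-Class entry: java -jar will report \"no main manifest attribute\""
                    : "Main-Class: " + main);
        }
    }
}
```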
Hi,
I am able to train your system on new training data and run it on new test data, but some confidence scores in the test output are negative. How should these be interpreted? Should such a prediction be treated as a valid classification with a negative score, or should it be ignored and labeled 'O'?
Regards
Tapas
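How to read a negative score depends on what FIGER actually prints, which this thread does not say. If the score is the raw linear score w·x of a logistic-regression-style classifier (an assumption; the LogisticRegression class in an earlier stack trace hints at such a model but does not confirm what is reported), the logistic function maps it to a probability, and every negative score corresponds to a probability below 0.5. ScoreToProbability is a hypothetical name used only for illustration.

```java
/** Hypothetical sketch: interpret a raw linear score s through the logistic function. */
public class ScoreToProbability {
    /** sigma(s) = 1 / (1 + e^(-s)); sigma(0) = 0.5, so s < 0 maps below 0.5. */
    static double sigmoid(double s) {
        return 1.0 / (1.0 + Math.exp(-s));
    }

    public static void main(String[] args) {
        for (double s : new double[] {-2.0, -0.1, 0.0, 1.5}) {
            System.out.printf("score %5.2f -> probability %.3f%n", s, sigmoid(s));
        }
    }
}
```

Under that assumption, whether a sub-0.5 prediction is kept or mapped to 'O' is a thresholding choice that can be tuned on held-out data.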