mir-mu / mias
Indexes text with math in Lucene/Solr-based full-text search engines
Home Page: https://mir.fi.muni.cz/mias/
License: Apache License 2.0
I am trying to install MIaS using the following command:
$ mvn install
but it gives me the following error:
[ERROR] Failed to execute goal on project MIaS: Could not resolve dependencies for project cz.muni.fi.mias:MIaS:jar:1.6.4-4.10.4-SNAPSHOT: Could not find artifact cz.muni.fi.mias:miasmath:jar:1.6.4-4.10.4-SNAPSHOT
Please help me.
Dear All,
I'm trying to index MathML using MIaS, but the parser always raises an error. The following is the content of the test file I want to index. Am I missing something here?
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
<mrow>
<msup>
<mi>x</mi>
<mn>2</mn>
</msup>
<mo>+</mo>
<mrow>
<mn>4</mn>
<mo></mo>
<mi>x</mi>
</mrow>
<mo>+</mo>
<mn>4</mn>
</mrow>
<mo>=</mo>
<mn>0</mn>
</mrow>
</math>
2017-05-14 17:11:25,795 [main] INFO cz.muni.fi.mias.indexing.Indexing - adding to index tes2/quadratic_equation.html docId=tes2/quadratic_equation.html#0
2017-05-14 17:11:26,137 [main] WARN cz.muni.fi.mias.math.MathTokenizer - Input could not be parsed (probably it is not valid MathML)
cz.muni.fi.mir.mathmlcanonicalization.modules.ModuleException: Error while parsing the input file
at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:86) ~[mathml-canonicalizer-1.3.1.jar:?]
at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.executeStreamModules(MathMLCanonicalizer.java:368) ~[mathml-canonicalizer-1.3.1.jar:?]
at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.canonicalize(MathMLCanonicalizer.java:330) ~[mathml-canonicalizer-1.3.1.jar:?]
at cz.muni.fi.mias.math.MathTokenizer.parseMathML(MathTokenizer.java:302) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
at cz.muni.fi.mias.math.MathTokenizer.processFormulae(MathTokenizer.java:278) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
at cz.muni.fi.mias.math.MathTokenizer.reset(MathTokenizer.java:244) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1500) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
at cz.muni.fi.mias.indexing.Indexing.indexDocsThreaded(Indexing.java:145) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:89) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
at cz.muni.fi.mias.MIaS.main(MIaS.java:34) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686) ~[wstx-asl-3.2.7.jar:3.2.7]
at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134) ~[wstx-asl-3.2.7.jar:3.2.7]
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040) ~[wstx-asl-3.2.7.jar:3.2.7]
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069) ~[wstx-asl-3.2.7.jar:3.2.7]
at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.minimizeElements(ElementMinimizer.java:133) ~[mathml-canonicalizer-1.3.1.jar:?]
at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:82) ~[mathml-canonicalizer-1.3.1.jar:?]
... 15 more
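The `Unexpected EOF in prolog` at `[1,0]` in the trace above suggests that the XML parser reached the end of its input before reading a single character, which usually points to an empty input stream rather than malformed markup. One way to rule out the MathML itself is to check it for well-formedness outside MIaS. The following is a minimal sketch that uses only the JDK's built-in XML parser; the class name `MathMLWellFormedCheck` and the shortened sample formula are illustrative assumptions and are not part of MIaS or MathMLCan.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of MIaS): checks XML well-formedness only.
final class MathMLWellFormedCheck {

    // Shortened sample in the spirit of the test file above (assumption).
    static final String SAMPLE =
        "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
        + "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>=</mo><mn>0</mn></mrow>"
        + "</math>";

    // Returns true if the string parses as well-formed XML, false otherwise.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("sample MathML well-formed: " + isWellFormed(SAMPLE));
        // An empty string mimics a parser that is handed no input at all.
        System.out.println("empty input well-formed:   " + isWellFormed(""));
    }
}
```

If the MathML passes this check but MIaS still reports the error, the problem is more likely in how the formula is extracted from the surrounding document than in the formula markup.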
For the method:
/**
 * Searches the index for query specified by string.
 *
 * @param query String with the query
 * @param print if true, results will be printed to standard output
 * @param offset index of the first retrieved result
 * @param limit number of results to retrieve
 * @param debug if true, results will contain debugging information
 *
 * @return Search result
 */
public SearchResult search(String query, boolean print, int offset, int limit, boolean debug) {
    return search(query, print, offset, limit, debug, MathTokenizer.MathMLType.BOTH);
}
What does the query string look like?
I am sending it something like this:
<m:math>
<m:semantics xml:id="m1.1a">
<m:apply xml:id="m1.1.4" xref="m1.1.4.pmml">
<m:plus xml:id="m1.1.2" xref="m1.1.2.pmml"/>
<m:ci xml:id="m1.1.1" xref="m1.1.1.pmml">x</m:ci>
<mws:qvar xmlns:mws="http://search.mathweb.org/ns" name="y"/>
</m:apply>
<m:annotation-xml encoding="MathML-Presentation" xml:id="m1.1b">
<m:mrow xml:id="m1.1.4.pmml" xref="m1.1.4">
<m:mi xml:id="m1.1.1.pmml" xref="m1.1.1">x</m:mi>
<m:mo xml:id="m1.1.2.pmml" xref="m1.1.2">+</m:mo>
<mws:qvar xmlns:mws="http://search.mathweb.org/ns" name="y"/>
</m:mrow>
</m:annotation-xml>
<m:annotation encoding="application/x-tex" xml:id="m1.1c">x+\qvar@construct{y}</m:annotation>
</m:semantics>
</m:math>
Mean
Arithmetic
Am I on the right track? However, it gives me an error:
Exception in thread "main" java.lang.IllegalArgumentException: expected '>' at position 59
at org.apache.lucene.util.automaton.RegExp.parseSimpleExp(RegExp.java:1128)
at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1091)
at org.apache.lucene.util.automaton.RegExp.parseComplExp(RegExp.java:1079)
at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:1048)
at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1041
How do I compute the size of a MIaS index?
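If "index size" here means the on-disk size, one generic approach is to sum the sizes of all files in the index directory, since a Lucene index is stored as a set of files in a directory. The following is a minimal sketch under that assumption; the class name `IndexSize` is hypothetical and the code does not use any MIaS or Lucene API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper (not part of MIaS): sums file sizes under a directory.
final class IndexSize {

    // Returns the total size in bytes of all regular files under indexDir.
    static long sizeInBytes(Path indexDir) throws IOException {
        try (Stream<Path> paths = Files.walk(indexDir)) {
            return paths
                .filter(Files::isRegularFile)
                .mapToLong(p -> {
                    try {
                        return Files.size(p);
                    } catch (IOException e) {
                        // Lambdas cannot throw checked exceptions directly.
                        throw new UncheckedIOException(e);
                    }
                })
                .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass the index directory as the first argument; defaults to ".".
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(sizeInBytes(dir) + " bytes");
    }
}
```

Measuring after indexing has finished avoids counting temporary files that may exist while the index is still being written.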
I am trying to index, and I believe I correctly installed all the different components with Maven. When I run the following command:
java -jar MIaS-1.6.6-4.10.4-SNAPSHOT-jar-with-dependencies.jar -conf /home/d6fraser/Documents/Research/MIaS/MIaS/conf/mias.properties -add /home/d6fraser/Documents/Research/MIaS/MIaS/data/doc/ /home/d6fraser/Documents/Research/MIaS/MIaS/data/index/
I get the following error:
Exception in thread "main" java.util.ServiceConfigurationError: Cannot instantiate SPI class: org.apache.lucene.codecs.appending.AppendingCodec
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:77)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:47)
at org.apache.lucene.util.NamedSPILoader.<init>(NamedSPILoader.java:37)
at org.apache.lucene.codecs.Codec.<clinit>(Codec.java:41)
at org.apache.lucene.index.LiveIndexWriterConfig.<init>(LiveIndexWriterConfig.java:125)
at org.apache.lucene.index.IndexWriterConfig.<init>(IndexWriterConfig.java:171)
at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:78)
at cz.muni.fi.mias.MIaS.main(MIaS.java:34)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene40' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [OrdsLucene41, BloomFilter, Direct, FSTOrd41, FSTOrdPulsing41, FST41, FSTPulsing41, Memory, Pulsing41, SimpleText]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
at org.apache.lucene.codecs.PostingsFormat.forName(PostingsFormat.java:100)
at org.apache.lucene.codecs.lucene40.Lucene40Codec.<init>(Lucene40Codec.java:117)
at org.apache.lucene.codecs.appending.AppendingCodec.<init>(AppendingCodec.java:37)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
... 7 more
I am not sure why it is not finding the correct codec. Should I be running a different command?
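The names listed in the error message (`OrdsLucene41`, `BloomFilter`, `Direct`, ...) are postings formats from the lucene-codecs module, while `Lucene40` is registered by lucene-core. One common cause of this pattern is a jar-with-dependencies build in which one `META-INF/services` file overwrote another instead of being merged; whether that is what happened here is an assumption. The following minimal sketch lists which service registrations are actually visible on the classpath. The class name `SpiRegistrations` is hypothetical, and the code relies only on the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic (not part of MIaS or Lucene).
final class SpiRegistrations {

    // Returns "resource-url -> implementation-class" for every entry in every
    // META-INF/services/<serviceName> file visible to the system class loader.
    static List<String> list(String serviceName) throws IOException {
        List<String> found = new ArrayList<>();
        Enumeration<URL> urls =
            ClassLoader.getSystemResources("META-INF/services/" + serviceName);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments and blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        found.add(url + " -> " + line);
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        String service = args.length > 0
            ? args[0]
            : "org.apache.lucene.codecs.PostingsFormat";
        for (String entry : list(service)) {
            System.out.println(entry);
        }
    }
}
```

Running this class with the MIaS jar on the classpath shows which postings format implementations are registered. If no lucene-core implementation appears in the output, the service files were probably not merged when the jar was assembled.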
docker run -v "$PWD"/dataset:/dataset:ro -v "$PWD"/index:/index:rw --rm miratmu/mias
ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...
How can I fix this?