mias's People

Contributors

davidluptak, dependabot[bot], empt-ak, martinliska, michal-ruzicka, witiko

mias's Issues

Input could not be parsed (probably it is not valid MathML)

Dear All,

I'm trying to index MathML using MIaS, but the parser always raises an error. Below is the content of the test file I want to index. Am I missing something here?

<math xmlns="http://www.w3.org/1998/Math/MathML">
<mrow>
  <mrow>
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn>
      <mo>⁢</mo>
      <mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
</math>
2017-05-14 17:11:25,795 [main] INFO  cz.muni.fi.mias.indexing.Indexing - adding to index tes2/quadratic_equation.html docId=tes2/quadratic_equation.html#0
2017-05-14 17:11:26,137 [main] WARN  cz.muni.fi.mias.math.MathTokenizer - Input could not be parsed (probably it is not valid MathML)
cz.muni.fi.mir.mathmlcanonicalization.modules.ModuleException: Error while parsing the input file
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:86) ~[mathml-canonicalizer-1.3.1.jar:?]
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.executeStreamModules(MathMLCanonicalizer.java:368) ~[mathml-canonicalizer-1.3.1.jar:?]
	at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.canonicalize(MathMLCanonicalizer.java:330) ~[mathml-canonicalizer-1.3.1.jar:?]
	at cz.muni.fi.mias.math.MathTokenizer.parseMathML(MathTokenizer.java:302) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
	at cz.muni.fi.mias.math.MathTokenizer.processFormulae(MathTokenizer.java:278) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
	at cz.muni.fi.mias.math.MathTokenizer.reset(MathTokenizer.java:244) [miasmath-1.6.6-4.10.4-SNAPSHOT.jar:?]
	at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1500) [lucene-core-4.10.4.jar:4.10.4 1662817 - mike - 2015-02-27 16:38:43]
	at cz.muni.fi.mias.indexing.Indexing.indexDocsThreaded(Indexing.java:145) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
	at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:89) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
	at cz.muni.fi.mias.MIaS.main(MIaS.java:34) [MIaS-1.6.6-4.10.4-SNAPSHOT.jar:?]
Caused by: com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
 at [row,col {unknown-source}]: [1,0]
	at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686) ~[wstx-asl-3.2.7.jar:3.2.7]
	at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134) ~[wstx-asl-3.2.7.jar:3.2.7]
	at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040) ~[wstx-asl-3.2.7.jar:3.2.7]
	at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069) ~[wstx-asl-3.2.7.jar:3.2.7]
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.minimizeElements(ElementMinimizer.java:133) ~[mathml-canonicalizer-1.3.1.jar:?]
	at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:82) ~[mathml-canonicalizer-1.3.1.jar:?]
	... 15 more
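
The `Unexpected EOF in prolog` at `[1,0]` in the trace above suggests that the XML reader inside the canonicalizer received no content at all, rather than a malformed formula. One way to narrow this down is to check, outside MIaS, that the exact string being indexed is non-empty and well-formed. The following is a minimal sketch using only the JDK's XML parser; `MathMLPrecheck` and `isWellFormed` are hypothetical names and are not part of MIaS or the MathML canonicalizer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of MIaS): reports whether a string is
// non-empty, well-formed XML according to the JDK's built-in parser.
class MathMLPrecheck {

    static boolean isWellFormed(String xml) {
        // An empty or whitespace-only input is the case that an
        // "Unexpected EOF in prolog" at row 1, column 0 points to.
        if (xml == null || xml.trim().isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler keeps the parser from printing to stderr;
            // fatal errors are still thrown as exceptions.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened variant of the formula from the issue.
        String formula = "<math xmlns=\"http://www.w3.org/1998/Math/MathML\">"
                + "<mrow><msup><mi>x</mi><mn>2</mn></msup><mo>=</mo><mn>0</mn></mrow></math>";
        System.out.println("formula well-formed: " + isWellFormed(formula));
        System.out.println("empty input well-formed: " + isWellFormed(""));
    }
}
```

If the formula passes this check but the warning persists, the problem is more likely in how the formula is extracted from the indexed file than in the MathML itself; that is an inference from the trace, not something the log confirms.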

Quick Question about cz.muni.fi.mias.search.Searching - Query String;

For the method:

/**
 * Searches the index for query specified by string.
 *
 * @param query String with the query
 * @param print if true, results will be printed to standard output
 * @param offset index of the first retrieved result
 * @param limit number of results to retrieve
 * @param debug if true, results will contain debugging information
 *
 * @return Search result
 */
public SearchResult search(String query, boolean print, int offset, int limit, boolean debug) {
    return search(query, print, offset, limit, debug, MathTokenizer.MathMLType.BOTH);
}

What does the query string look like?

I am sending it something like this:

<m:math>
          <m:semantics xml:id="m1.1a">
            <m:apply xml:id="m1.1.4" xref="m1.1.4.pmml">
              <m:plus xml:id="m1.1.2" xref="m1.1.2.pmml"/>
              <m:ci xml:id="m1.1.1" xref="m1.1.1.pmml">x</m:ci>
              <mws:qvar xmlns:mws="http://search.mathweb.org/ns" name="y"/>
            </m:apply>
            <m:annotation-xml encoding="MathML-Presentation" xml:id="m1.1b">
              <m:mrow xml:id="m1.1.4.pmml" xref="m1.1.4">
                <m:mi xml:id="m1.1.1.pmml" xref="m1.1.1">x</m:mi>
                <m:mo xml:id="m1.1.2.pmml" xref="m1.1.2">+</m:mo>
                <mws:qvar xmlns:mws="http://search.mathweb.org/ns" name="y"/>
              </m:mrow>
            </m:annotation-xml>
            <m:annotation encoding="application/x-tex" xml:id="m1.1c">x+\qvar@construct{y}</m:annotation>
          </m:semantics>
        </m:math>
Mean
Arithmetic

Am I on the right track? However, it gives me an error:

Exception in thread "main" java.lang.IllegalArgumentException: expected '>' at position 59
	at org.apache.lucene.util.automaton.RegExp.parseSimpleExp(RegExp.java:1128)
	at org.apache.lucene.util.automaton.RegExp.parseCharClassExp(RegExp.java:1091)
	at org.apache.lucene.util.automaton.RegExp.parseComplExp(RegExp.java:1079)
	at org.apache.lucene.util.automaton.RegExp.parseRepeatExp(RegExp.java:1048)
	at org.apache.lucene.util.automaton.RegExp.parseConcatExp(RegExp.java:1041)

Trying to Index

I am trying to index, and I believe I installed all the different components correctly with Maven. When I run the following command:

java -jar MIaS-1.6.6-4.10.4-SNAPSHOT-jar-with-dependencies.jar -conf /home/d6fraser/Documents/Research/MIaS/MIaS/conf/mias.properties -add /home/d6fraser/Documents/Research/MIaS/MIaS/data/doc/ /home/d6fraser/Documents/Research/MIaS/MIaS/data/index/

I get the following error:
Exception in thread "main" java.util.ServiceConfigurationError: Cannot instantiate SPI class: org.apache.lucene.codecs.appending.AppendingCodec
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:77)
at org.apache.lucene.util.NamedSPILoader.(NamedSPILoader.java:47)
at org.apache.lucene.util.NamedSPILoader.(NamedSPILoader.java:37)
at org.apache.lucene.codecs.Codec.(Codec.java:41)
at org.apache.lucene.index.LiveIndexWriterConfig.(LiveIndexWriterConfig.java:125)
at org.apache.lucene.index.IndexWriterConfig.(IndexWriterConfig.java:171)
at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:78)
at cz.muni.fi.mias.MIaS.main(MIaS.java:34)
Caused by: java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.PostingsFormat with name 'Lucene40' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [OrdsLucene41, BloomFilter, Direct, FSTOrd41, FSTOrdPulsing41, FST41, FSTPulsing41, Memory, Pulsing41, SimpleText]
at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:109)
at org.apache.lucene.codecs.PostingsFormat.forName(PostingsFormat.java:100)
at org.apache.lucene.codecs.lucene40.Lucene40Codec.(Lucene40Codec.java:117)
at org.apache.lucene.codecs.appending.AppendingCodec.(AppendingCodec.java:37)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at org.apache.lucene.util.NamedSPILoader.reload(NamedSPILoader.java:67)
... 7 more

I am not sure why it is not finding the correct codec. Should I be running a different command?
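
The names listed in the error message are the ones the classpath currently registers for `org.apache.lucene.codecs.PostingsFormat`, and `Lucene40` is not among them. One possible cause, stated here as an assumption rather than a confirmed diagnosis, is that the jar-with-dependencies kept only one of several `META-INF/services` descriptors with the same name. The sketch below uses only the JDK to print every such descriptor visible on the classpath and the class names it registers; `SpiDescriptors` is a hypothetical helper and is not part of MIaS or Lucene.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic (not part of MIaS or Lucene): lists the service
// descriptors for PostingsFormat that the current classpath exposes.
class SpiDescriptors {

    // Parses the body of a META-INF/services file: one class name per line,
    // '#' starts a comment, blank lines are ignored.
    static List<String> parse(String content) {
        List<String> names = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String resource = "META-INF/services/org.apache.lucene.codecs.PostingsFormat";
        Enumeration<URL> urls = SpiDescriptors.class.getClassLoader().getResources(resource);
        while (urls.hasMoreElements()) {
            URL url = urls.nextElement();
            StringBuilder body = new StringBuilder();
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    body.append(line).append('\n');
                }
            }
            // One line per descriptor: where it was found and what it registers.
            System.out.println(url + " -> " + parse(body.toString()));
        }
    }
}
```

Running this with the MIaS jar on the classpath shows whether a descriptor registering the `Lucene40` postings format is present at all. If only one descriptor appears and it lacks that entry, the build would need to merge service files instead of overwriting them.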

How to fix: Log4j2 could not find a logging implementation (docker run miratmu/mias)

When I run the Docker image:

docker run -v "$PWD"/dataset:/dataset:ro -v "$PWD"/index:/index:rw --rm miratmu/mias

I get the following error:

ERROR StatusLogger Log4j2 could not find a logging implementation. Please add log4j-core to the classpath. Using SimpleLogger to log to the console...

How can I fix it?
