Git Product home page Git Product logo

miasmath's Introduction

MIaSMath – Math processing for Lucene / Solr

ci

MIaSMath is a math processing plugin for Lucene or Solr.

Usage

To integrate MIaSMath including MathTokenizer into a Solr instance:

  1. Copy the following libraries to the solr/lib directory:
  1. Configure the following attributes in schema.xml for the tokenizer MathTokenizer:
  • subformulaetrue for analyzer type index, and false for analyzer type query, as follows:

    <fieldType name="math" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="true"/> 
      </analyzer>
      <analyzer type="query">
        <tokenizer class="cz.muni.fi.mias.MathTokenizerFactory" subformulae="false"/> 
      </analyzer>
    </fieldType>
  • Declare a field for storing math as follows:

    <field name="math" type="math" indexed="true" stored="false" multiValued="true" />

That's it. You can now run your Solr instance and test MathTokenizer in the analysis interface.

Citing MIaSMath

Text

SOJKA, Petr and Martin LÍŠKA. The Art of Mathematics Retrieval. In Matthew R. B. Hardy, Frank Wm. Tompa. Proceedings of the 2011 ACM Symposium on Document Engineering. Mountain View, CA, USA: ACM, 2011. p. 57–60. ISBN 978-1-4503-0863-2. doi:10.1145/2034691.2034703.

BibTeX

@inproceedings{doi:10.1145:2034691.2034703,
     author = "Petr Sojka and Martin L\'{i}\v{s}ka",
      title = "{The Art of Mathematics Retrieval}",
  booktitle = "{Proceedings of the ACM Conference on Document Engineering,
  		DocEng 2011}",
  publisher = "{Association of Computing Machinery}",
    address = "{Mountain View, CA}",
       year = 2011,
      month = Sep,
       isbn = "978-1-4503-0863-2",
      pages = "57--60",
        url = {http://doi.acm.org/10.1145/2034691.2034703},
        doi = {10.1145/2034691.2034703},
   abstract = {The design and architecture of MIaS (Math Indexer and Searcher), 
	       a system for mathematics retrieval is presented, and design 
	       decisions are discussed. We argue for an approach based on 
	       Presentation MathML using a similarity of math subformulae. The 
	       system was implemented as a math-aware search engine based on the 
	       state-of-the-art system Apache Lucene. Scalability issues were 
	       checked against more than 400,000 arXiv documents with 158 
	       million mathematical formulae. Almost three billion MathML 
	       subformulae were indexed using a Solr-compatible Lucene.},
}

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.