mir-mu / mathmlcan
Canonicalizes different MathML encodings of equivalent formulae
Home Page: https://mir.fi.muni.cz/mathml-normalization/
License: Apache License 2.0
Apart from removing redundant mrow elements, MrowNormalizer also adds mrow elements to transform detected fenced expressions into the form:
<mrow><mo>(</mo><mrow> ... </mrow><mo>)</mo></mrow>
I'm not sure which should be preferred when this format breaks the rules for mrow minimization (e.g. when the inner mrow contains only one element). Should the redundant elements be removed, or should the exact format be kept?
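To make the two options concrete, here is a minimal sketch of the rewrite, assuming the fenced content already sits in its own mrow. The class `FencedForm` and its `keepInnerMrow` flag are hypothetical illustrations, not MrowNormalizer's actual code or settings.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch, not MathMLCan's MrowNormalizer: rewrites the mrow holding
// a detected fenced expression into <mrow><mo>open</mo><mrow>...</mrow><mo>close</mo></mrow>.
class FencedForm {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The only child element of e, or null if it has zero or several child
    // elements or any non-whitespace text.
    static Element singleElementChild(Element e) {
        Element only = null;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node kid = kids.item(i);
            if (kid.getNodeType() == Node.ELEMENT_NODE) {
                if (only != null) {
                    return null;
                }
                only = (Element) kid;
            } else if (kid.getNodeType() == Node.TEXT_NODE
                    && !kid.getTextContent().trim().isEmpty()) {
                return null;
            }
        }
        return only;
    }

    static Element mo(Document doc, String text) {
        Element op = doc.createElement("mo");
        op.setTextContent(text);
        return op;
    }

    // keepInnerMrow = true: always emit the exact fenced format.
    // keepInnerMrow = false: apply mrow minimization and drop an inner mrow
    // that wraps a single element.
    static Element wrap(Element inner, String open, String close, boolean keepInnerMrow) {
        Document doc = inner.getOwnerDocument();
        Element outer = doc.createElement("mrow");
        inner.getParentNode().replaceChild(outer, inner);
        outer.appendChild(mo(doc, open));
        Element only = singleElementChild(inner);
        outer.appendChild(!keepInnerMrow && only != null ? only : inner);
        outer.appendChild(mo(doc, close));
        return outer;
    }
}
```

With `keepInnerMrow = true`, `<mrow><mi>x</mi></mrow>` becomes `<mrow><mo>(</mo><mrow><mi>x</mi></mrow><mo>)</mo></mrow>`, so every fenced expression has the same shape; with `false` it becomes `<mrow><mo>(</mo><mi>x</mi><mo>)</mo></mrow>`, which is what mrow minimization would produce anyway. Content with several elements keeps its inner mrow in both modes.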
Empty mi and mn elements (or those containing only whitespace) can be removed.
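A minimal DOM sketch of this rule, assuming "empty" means no text or whitespace-only text. `EmptyTokenRemover` is a hypothetical name, not an existing MathMLCan module.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: removes <mi> and <mn> elements whose content is
// empty or consists only of whitespace.
class EmptyTokenRemover {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static int removeEmptyTokens(Document doc) {
        // getElementsByTagName returns a live list, so collect first, remove afterwards.
        List<Element> empty = new ArrayList<>();
        for (String tag : new String[] {"mi", "mn"}) {
            NodeList tokens = doc.getElementsByTagName(tag);
            for (int i = 0; i < tokens.getLength(); i++) {
                Element token = (Element) tokens.item(i);
                // String.trim() strips every character up to U+0020, which covers
                // the MathML whitespace characters (space, tab, LF, CR).
                if (token.getTextContent().trim().isEmpty()) {
                    empty.add(token);
                }
            }
        }
        for (Element token : empty) {
            token.getParentNode().removeChild(token);
        }
        return empty.size();
    }
}
```

This naive version removes a token wherever it occurs; a production rule would likely also need to skip tokens that are required children of positional elements such as msub, since dropping those would leave the parent with too few arguments.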
The opposite conversion (mo -> mi) should be allowed, e.g. for function identifiers: exp sin cos tan tg cot cotan cotg ctg ctn sec csc cosec arcsin arccos arctan arccot arcsec arccsc sinh cosh tanh coth sech csch arcsinh arcosh artanh arcoth arsech arcsch log lg ln
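A sketch of the mo -> mi conversion, assuming a name-based match on the trimmed token text and a namespace-unaware parse. `FunctionNameConverter` is hypothetical; the name list is copied from the issue.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: turns <mo>sin</mo> and similar into <mi>sin</mi>.
class FunctionNameConverter {

    // Function identifiers listed in the issue.
    static final Set<String> FUNCTION_NAMES = new HashSet<>(Arrays.asList(
            "exp", "sin", "cos", "tan", "tg", "cot", "cotan", "cotg", "ctg", "ctn",
            "sec", "csc", "cosec", "arcsin", "arccos", "arctan", "arccot", "arcsec", "arccsc",
            "sinh", "cosh", "tanh", "coth", "sech", "csch",
            "arcsinh", "arcosh", "artanh", "arcoth", "arsech", "arcsch",
            "log", "lg", "ln"));

    static Document parse(String xml) {
        try {
            // Default factory is namespace-unaware, so elements carry no namespace.
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static int convert(Document doc) {
        NodeList mos = doc.getElementsByTagName("mo");
        List<Element> hits = new ArrayList<>();
        for (int i = 0; i < mos.getLength(); i++) {
            Element mo = (Element) mos.item(i);
            if (FUNCTION_NAMES.contains(mo.getTextContent().trim())) {
                hits.add(mo);
            }
        }
        for (Element mo : hits) {
            // DOM Level 3 renameNode keeps children and attributes; null means
            // "no namespace", matching the namespace-unaware parse above.
            doc.renameNode(mo, null, "mi");
        }
        return hits.size();
    }
}
```

A namespace-aware pipeline would pass the MathML namespace URI to `renameNode` instead of `null`.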
https://mir.fi.muni.cz/apps/MathCalEval/canonicoutput/view/22144
When the operator * does not denote multiplication, it should not be removed.
If infix * really needs to be removed, I suggest first checking whether it has operands.
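The suggested operand check could look roughly like this. It is a conservative sketch under a simplifying assumption (any neighbouring element that is not an `mo` counts as an operand), and `InfixStarCheck` is a hypothetical name; what the module then does with a confirmed infix `*` is left open.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical check: treat <mo>*</mo> as an infix operator only when it
// has an operand element on each side.
class InfixStarCheck {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isInfixStar(Element mo) {
        return "mo".equals(mo.getTagName())
                && "*".equals(mo.getTextContent().trim())
                && isOperand(neighbour(mo, false))
                && isOperand(neighbour(mo, true));
    }

    // Nearest element sibling after (forward = true) or before (forward = false), or null.
    static Element neighbour(Node n, boolean forward) {
        Node s = forward ? n.getNextSibling() : n.getPreviousSibling();
        while (s != null && s.getNodeType() != Node.ELEMENT_NODE) {
            s = forward ? s.getNextSibling() : s.getPreviousSibling();
        }
        return (Element) s;
    }

    // Simplifying assumption: any non-<mo> element is an operand. This is
    // conservative: a * followed by an opening fence <mo>(</mo> is not matched.
    static boolean isOperand(Element e) {
        return e != null && !"mo".equals(e.getTagName());
    }
}
```

For `<mrow><mi>a</mi><mo>*</mo><mi>b</mi></mrow>` the check succeeds, while the conjugate in `<msup><mi>z</mi><mo>*</mo></msup>` has no right-hand operand and is left alone.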
OperatorNormalizer should not remove an mo element if it is a required argument of its parent. Doing so can change the meaning of the formula and can even produce invalid MathML from valid input.
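A minimal guard, assuming that the children of mfrac, mroot, msub, msup, msubsup, munder, mover, munderover and mmultiscripts are positional arguments (as in MathML's fraction and script schemata). `RequiredArgumentCheck` is a hypothetical name, not OperatorNormalizer's API.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class RequiredArgumentCheck {

    // Elements whose children are positional arguments: removing one child
    // drops or shifts an argument.
    static final Set<String> POSITIONAL = new HashSet<>(Arrays.asList(
            "mfrac", "mroot", "msub", "msup", "msubsup",
            "munder", "mover", "munderover", "mmultiscripts"));

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static boolean isRequiredArgument(Element e) {
        Node parent = e.getParentNode();
        return parent instanceof Element
                && POSITIONAL.contains(((Element) parent).getTagName());
    }

    // An <mo> may only be dropped if it is not a positional argument of its parent.
    static boolean mayRemove(Element mo) {
        return !isRequiredArgument(mo);
    }
}
```

For example, the prime in `<msup><mi>x</mi><mo>′</mo></msup>` is the superscript argument; removing it would leave an msup with a single child, which is invalid MathML.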
A new module should convert between table representations and their alternatives; e.g. binomial coefficients can be expressed either with mtable elements or with mfrac elements with linethickness=0.
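A sketch of one direction of such a module, assuming `<mfrac linethickness="0">` is picked as the canonical target (the issue does not fix a direction). `BinomialTableConverter` is hypothetical, and it only handles the two-row, one-column table shape used for binomial coefficients; surrounding fences are left untouched.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BinomialTableConverter {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    static List<Element> elementChildren(Element e) {
        List<Element> out = new ArrayList<>();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i).getNodeType() == Node.ELEMENT_NODE) {
                out.add((Element) kids.item(i));
            }
        }
        return out;
    }

    // Replaces a two-row, one-column <mtable> with <mfrac linethickness="0">.
    // Returns false and leaves the tree unchanged for any other table shape.
    static boolean tableToFrac(Element table) {
        List<Element> rows = elementChildren(table);
        if (rows.size() != 2) {
            return false;
        }
        Element[] cells = new Element[2];
        for (int i = 0; i < 2; i++) {
            List<Element> tds = elementChildren(rows.get(i));
            if (!"mtr".equals(rows.get(i).getTagName()) || tds.size() != 1
                    || !"mtd".equals(tds.get(0).getTagName())) {
                return false;
            }
            cells[i] = tds.get(0);
        }
        Document doc = table.getOwnerDocument();
        Element frac = doc.createElement("mfrac");
        frac.setAttribute("linethickness", "0");
        for (Element cell : cells) {
            // <mtd> holds an inferred mrow, so its content becomes one explicit <mrow> argument.
            Element arg = doc.createElement("mrow");
            while (cell.hasChildNodes()) {
                arg.appendChild(cell.getFirstChild());
            }
            frac.appendChild(arg);
        }
        table.getParentNode().replaceChild(frac, table);
        return true;
    }
}
```

The explicit mrow arguments may then be minimized like any other redundant mrow. A fuller module would probably also need to treat other zero spellings of the attribute (e.g. `0px`) as equivalent.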
Hello, I am getting a strange exception when trying to index some documents. The exception is as follows:
May 09, 2018 8:42:41 PM cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer execute
SEVERE: error while parsing the input file.
com.ctc.wstx.exc.WstxEOFException: Unexpected EOF in prolog
at [row,col {unknown-source}]: [1,0]
at com.ctc.wstx.sr.StreamScanner.throwUnexpectedEOF(StreamScanner.java:686)
at com.ctc.wstx.sr.BasicStreamReader.handleEOF(BasicStreamReader.java:2134)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2040)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.minimizeElements(ElementMinimizer.java:134)
at cz.muni.fi.mir.mathmlcanonicalization.modules.ElementMinimizer.execute(ElementMinimizer.java:84)
at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.executeStreamModules(MathMLCanonicalizer.java:375)
at cz.muni.fi.mir.mathmlcanonicalization.MathMLCanonicalizer.canonicalize(MathMLCanonicalizer.java:326)
at cz.muni.fi.mias.math.MathTokenizer.parseMathML(MathTokenizer.java:304)
at cz.muni.fi.mias.math.MathTokenizer.processFormulae(MathTokenizer.java:280)
at cz.muni.fi.mias.math.MathTokenizer.reset(MathTokenizer.java:246)
at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:613)
at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)
at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)
at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:241)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:465)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1526)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1500)
at cz.muni.fi.mias.indexing.Indexing.indexDocsThreaded(Indexing.java:145)
at cz.muni.fi.mias.indexing.Indexing.indexFiles(Indexing.java:89)
at cz.muni.fi.mias.MIaS.main(MIaS.java:39)
java -jar MIaS-1.6.6-4.10.4-SNAPSHOT.jar -conf ~/sandbox/mias/mias.properties -overwrite ~/sandbox/mias/data/samples/sample-mathml.xhtml ~/sandbox/mias/data/samples
MathMLCan: develop branch
MIaS: master branch
MIaSMath: master branch
$ java -version
java version "1.8.0_171"
Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
mias.properties configuration is:
INDEXDIR=~/sandbox/mias/indexes/index-0
UPDATE=false
THREADS=16
MAXRESULTS=10000
DOCLIMIT=-1
FORMULA_DOCUMENTS=true
sample-mathml.xhtml file is as simple as:
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" />
</head>
<body>
<math>
<mfrac linethickness="1">
<!-- numerator -->
<mrow>
<mi> x </mi>
<mo> + </mo>
<mi mathcolor="red"> y </mi>
<mo> + </mo>
<mi> z </mi>
</mrow>
<!-- denominator -->
<mrow>
<mi> x </mi>
<mphantom>
<mo> + </mo>
<mi> y </mi>
</mphantom>
<mo> + </mo>
<mi> z </mi>
</mrow>
</mfrac>
<mfenced open=":" close="?">
</mfenced>
</math>
</body>
</html>
Is there any help out there, or does anyone have an idea what the problem could be?
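The trace fails in Woodstox's `nextFromProlog`/`handleEOF` at position [1,0], i.e. the reader hit end of input before reading a single character, so the stream handed to `ElementMinimizer.minimizeElements` appears to have been empty. The trace alone does not show where the empty input comes from. As a diagnostic aid, a caller could skip (and log) empty formulae before canonicalizing them; the guard below is a hypothetical sketch, not part of MathMLCan or MIaS, and it assumes the formula is available as UTF-8 bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical guard: returns false for input that cannot contain a root element.
class FormulaInputGuard {

    // True if the UTF-8 input holds anything besides an optional byte-order
    // mark and whitespace.
    static boolean hasContent(byte[] xml) {
        if (xml == null) {
            return false;
        }
        String text = new String(xml, StandardCharsets.UTF_8);
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1); // drop a UTF-8 byte-order mark
        }
        return !text.trim().isEmpty();
    }
}
```

Logging the document and formula whenever `hasContent` returns false would show which input triggers the exception; where exactly to call it (for example just before `MathMLCanonicalizer.canonicalize` in `MathTokenizer.parseMathML`) is an assumption.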