
LaCASt - A LaTeX Translator for Computer Algebra Systems

LaCASt is the first context-aware translator for mathematical LaTeX expressions. It combines natural language processing to analyze the textual context, a custom semantic LaTeX parser to analyze the math input, and CAS interfaces (currently Maple and Mathematica) to compute and verify the translated expressions automatically.

Publications

If you want to reference this tool in general, please use the most recent publication from TPAMI 2023. If you want to refer to the automatic evaluations only, use the second most recent publication from TACAS 2022.

A. Greiner-Petter, M. Schubotz, C. Breitinger, P. Scharpf, A. Aizawa, B. Gipp (2023) "Do the Math: Making Mathematics in Wikipedia Computable". In TPAMI 2023: 4384-4395
@Article{GreinerPetter23,
  author       = {Andr{\'{e}} Greiner{-}Petter and
                  Moritz Schubotz and
                  Corinna Breitinger and
                  Philipp Scharpf and
                  Akiko Aizawa and
                  Bela Gipp},
  title        = {Do the Math: Making Mathematics in Wikipedia Computable},
  journal      = {{IEEE} Trans. Pattern Anal. Mach. Intell.},
  volume       = {45},
  number       = {4},
  pages        = {4384--4395},
  year         = {2023},
  url          = {https://doi.org/10.1109/TPAMI.2022.3195261},
  doi          = {10.1109/TPAMI.2022.3195261},
  timestamp    = {Mon, 28 Aug 2023 21:37:38 +0200},
  biburl       = {https://dblp.org/rec/journals/pami/GreinerPetterSBSAG23.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
A. Greiner-Petter, H. S. Cohl, A. Youssef, M. Schubotz, A. Trost, R. Dey, A. Aizawa, B. Gipp (2022) "Comparative Verification of the Digital Library of Mathematical Functions and Computer Algebra Systems". In TACAS 2022: 87-105
@InProceedings{Greiner-PetterC22,
  author    = {Andr{\'{e}} Greiner{-}Petter and
               Howard S. Cohl and
               Abdou Youssef and
               Moritz Schubotz and
               Avi Trost and
               Rajen Dey and
               Akiko Aizawa and
               Bela Gipp},
  title     = {Comparative Verification of the Digital Library of Mathematical Functions
               and Computer Algebra Systems},
  booktitle = {Tools and Algorithms for the Construction and Analysis of Systems
               - 28th International Conference, {TACAS} 2022, Held as Part of the
               European Joint Conferences on Theory and Practice of Software, {ETAPS}
               2022, Munich, Germany, April 2-7, 2022, Proceedings, Part {I}},
  series    = {Lecture Notes in Computer Science},
  volume    = {13243},
  pages     = {87--105},
  publisher = {Springer},
  year      = {2022},
  url       = {https://doi.org/10.1007/978-3-030-99524-9\_5},
  doi       = {10.1007/978-3-030-99524-9\_5}
}
A. Greiner-Petter, M. Schubotz, H. S. Cohl, B. Gipp (2019) "Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems". In: Aslib Journal of Information Management. 71(3): 415-439
@Article{Greiner-Petter19,
  author    = {Andr{\'{e}} Greiner{-}Petter and
               Moritz Schubotz and
               Howard S. Cohl and
               Bela Gipp},
  title     = {Semantic preserving bijective mappings for expressions involving special
               functions between computer algebra systems and document preparation
               systems},
  journal   = {Aslib Journal of Information Management},
  volume    = {71},
  number    = {3},
  pages     = {415--439},
  year      = {2019},
  url       = {https://doi.org/10.1108/AJIM-08-2018-0185},
  doi       = {10.1108/AJIM-08-2018-0185}
}
H. S. Cohl, A. Greiner-Petter, M. Schubotz (2018) "Automated Symbolic and Numerical Testing of DLMF Formulae Using Computer Algebra Systems". In: CICM: 39-52
@InProceedings{Cohl18,
  author    = {Howard S. Cohl and
               Andr{\'{e}} Greiner{-}Petter and
               Moritz Schubotz},
  title     = {Automated Symbolic and Numerical Testing of {DLMF} Formulae Using
               Computer Algebra Systems},
  booktitle = {Intelligent Computer Mathematics - 11th International Conference,
               {CICM} 2018, Hagenberg, Austria, August 13-17, 2018, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {11006},
  pages     = {39--52},
  publisher = {Springer},
  year      = {2018},
  url       = {https://doi.org/10.1007/978-3-319-96812-4\_4},
  doi       = {10.1007/978-3-319-96812-4\_4}
}
H. S. Cohl, M. Schubotz, A. Youssef, A. Greiner-Petter, J. Gerhard, B. V. Saunders, M. A. McClain, J. Bang, K. Chen (2017) "Semantic Preserving Bijective Mappings of Mathematical Formulae Between Document Preparation Systems and Computer Algebra Systems". In: CICM: 115-131
@InProceedings{Cohl17,
  author    = {Howard S. Cohl and
               Moritz Schubotz and
               Abdou Youssef and
               Andr{\'{e}} Greiner{-}Petter and
               J{\"{u}}rgen Gerhard and
               Bonita V. Saunders and
               Marjorie A. McClain and
               Joon Bang and
               Kevin Chen},
  title     = {Semantic Preserving Bijective Mappings of Mathematical Formulae Between
               Document Preparation Systems and Computer Algebra Systems},
  booktitle = {Intelligent Computer Mathematics - 10th International Conference,
               {CICM} 2017, Edinburgh, UK, July 17-21, 2017, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {10383},
  pages     = {115--131},
  publisher = {Springer},
  year      = {2017},
  url       = {https://doi.org/10.1007/978-3-319-62075-6\_9},
  doi       = {10.1007/978-3-319-62075-6\_9}
}

How to use our program

The following provides a high-level introduction on how to use the JARs and LaCASt in general. If you want to dive into the source code, we advise you to check our contribution guidelines first for more details on the structure.

The bin directory contains a couple of executable jars. All of these programs require the lacast.config.yaml. Copy config/template-lacast.config.yaml to the main directory and rename it to lacast.config.yaml. Afterward, update the entries in the copied file to the properties that apply to your setup. LaCASt tries to load the config by following these rules:

  1. The system variable LACAST_CONFIG specifies the config location, e.g., export LACAST_CONFIG="path/to/lacast.config.yaml".
  2. The config file is in the current working directory.
  3. Otherwise, LaCASt loads the default config from the internal resources of the jar; see the default config in interpreter.common/src/main/resources/.

If none of the rules above point to a valid config, LaCASt stops with an error.
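
For example, a minimal setup from the repository root might look like the following sketch (the paths follow the description above; the explicit export in the last line is optional since the working directory is also checked):

cp config/template-lacast.config.yaml lacast.config.yaml
# edit lacast.config.yaml to match your setup, then (optionally) point LaCASt to it explicitly:
export LACAST_CONFIG="$(pwd)/lacast.config.yaml"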


LaCASt contains several executable JARs as standalone applications. The following list explains the functionality of each JAR in more detail.

latex-to-cas-converter.jar: The forward translator (LaTeX -> CAS)

The executable jar for the translator can be found in the bin subdirectory. A standalone version can be found in the bin/*.zip file. Unzip the archive wherever you want and run the jar from the root folder of the repository:

java -jar bin/latex-to-cas-converter.jar

Without additional arguments, the jar runs as an interactive program. You can also start the program with flags to trigger the translation process directly (every flag is optional):

  • -CAS=<NameOfCAS>: Sets the computer algebra system you want to translate to, e.g., -CAS=Maple for Maple;
  • -Expression="<exp>": Sets the expression you want to translate. Double quotation marks are mandatory;
  • --clean or -c: Only returns the translated expression without any other information. (since v1.0.1)
  • --debug or -d: Returns extra information for debugging, such as the computation time and the list of parsed elements. (--clean overrides this setting)
  • --extra or -x: Shows further information about the translation of functions, e.g., branch cuts, DLMF links, and more. (--clean overrides this setting)
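
For example, a non-interactive translation to Mathematica might look like the following sketch (based on the flags above; the expression is just an illustrative input):

java -jar bin/latex-to-cas-converter.jar -CAS=Mathematica -Expression="\frac{1}{2}" --clean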

lexicon-creator.jar: Maintain the translation dictionary

This jar is used to maintain the internal translation dictionaries. Once a translation pattern is defined in the CSV files, it must be transformed into the dictionaries. The typical workflow is:

andre@agp:~$ java -jar bin/lexicon-creator.jar 
Welcome, this converter translates given CSV files to lexicon files.
You didn't specified CSV files (do not add DLMFMacro.csv).
Add a new CSV file and hit enter or enter '-end' to stop the adding process.
all
Current list: [CAS_Maple.csv, CAS_Mathematica.csv]
-end

maple-translator.jar: The backward translator for Maple (Maple -> Semantic LaTeX)

This jar requires an installed Maple license on the machine! To start the translator, you have to set the environment variables required to run Maple properly (see Building and Running a Java OpenMaple Application). In my case, Maple is installed in /opt/maple2019 and I'm on a Linux machine, which requires setting MAPLE and LD_LIBRARY_PATH. In addition, you have to provide a larger thread stack size via -Xss50M, otherwise Maple crashes. Here is an example:

andre@agp:~$ export MAPLE="/opt/maple2019"
andre@agp:~$ export LD_LIBRARY_PATH="/opt/maple2019/bin.X86_64_LINUX"
andre@agp:~$ java -Xss50M -jar bin/maple-translator.jar 

To get the Maple paths, you can start Maple and enter the following commands:

kernelopts( bindir );   <- returns <Maple-BinDir>
kernelopts( mapledir ); <- returns <Maple-Directory>

symbolic-tester.jar: Symbolic verification program

This is only for advanced users! First, set up the properties:

  1. Set up config/symbolic_tests.properties. Critical and required settings are:
# the path to the dataset
dlmf_dataset=/home/andreg-p/Howard/together.txt

# the lines that should be tested in the provided dataset
subset_tests=7209,7483

# the output path
output=/home/andreg-p/Howard/Results/AutoMaple/22-JA-symbolic.txt

# the output path for missing macros
missing_macro_output=/home/andreg-p/Howard/Results/AutoMaple/22-JA-missing.txt
  2. symbolic-tester.jar program arguments:
    • -maple to run the tests with Maple
    • -mathematica to run the tests with Mathematica (you can only specify one at a time, Maple or Mathematica)
    • -Xmx8g increases the Java heap memory (not required but useful)
    • -Xss50M increases the thread stack size (needed if you use Maple)

Additionally, you have to set the environment variables if you work with Maple (see the maple-translator.jar instructions above for more details about the required variables).

  3. Since you may want to run evaluations on subsets automatically, you can use scripts/symbolic-evaluator.sh. Of course, you need to update the paths in the script. With config/together-lines.txt you can control which subsets the script evaluates, e.g.,
04-EF: 1465,1994
05-GA: 1994,2179

The second number is exclusive (i.e., 1,2 runs only line 1, not line 2). The example above evaluates the two subsets and stores the results in the files 04-EF-symbolic.txt and 05-GA-symbolic.txt.
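
Putting the pieces together, a typical symbolic run with Maple might look like the following sketch (the Maple paths are the example paths from above, and the argument placement follows the descriptions in this section):

export MAPLE="/opt/maple2019"
export LD_LIBRARY_PATH="/opt/maple2019/bin.X86_64_LINUX"
java -Xmx8g -Xss50M -jar bin/symbolic-tester.jar -maple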


numeric-tester.jar: Numeric verification program

This is only for advanced users! First, set up the properties:

  1. Set up config/numerical_tests.properties. Critical and required settings are:
# the path to the dataset
dlmf_dataset=/home/andreg-p/Howard/together.txt

# either you define a subset of lines to test or you define the results file of symbolic evaluation, which is recommended
# subset_tests=7209,7483
symbolic_results_data=/home/andreg-p/Howard/Results/AutoMath/11-ST-symbolic.txt

# the output path
output=/home/andreg-p/Howard/Results/MathNumeric/11-ST-numeric.txt
  2. numeric-tester.jar program arguments:

    • -maple to run the tests with Maple
    • -mathematica to run the tests with Mathematica
    • -Xmx8g increases the Java heap memory (not required but useful)
    • -Xss50M increases the thread stack size (needed if you use Maple)
  3. Since you may want to run evaluations on subsets automatically, you can use scripts/numeric-evaluator.sh. Of course, you need to update the paths in the script. With config/together-lines.txt you can control which subsets the script evaluates, e.g.,

04-EF: 1465,1994
05-GA: 1994,2179

This will automatically load the symbolic result files 04-EF-symbolic.txt and 05-GA-symbolic.txt and start the evaluation.
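
As with the symbolic tester, a single run can also be started directly; for instance, a Mathematica run might look like this sketch (the argument placement follows the description above, and -Xss50M is only needed for Maple):

java -Xmx8g -jar bin/numeric-tester.jar -mathematica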


Update Translation Patterns

The translation patterns are defined in libs/ReferenceData/CSVTables. If you wish to add translation patterns you need to compile the changes before the translator can use them. To update the translations, use the lexicon-creator.jar (see the explanations above).

Update Pre-Processing Replacement Rules

The pre-processing replacement rules are defined in config/replacements.yml and config/dlmf-replacements.yml. Each config file contains further explanations of how to add replacement rules. The replacement rules are applied without further compilation: just change the files to add, modify, or remove rules.

Contributors

Role Name Contact
Main Developer André Greiner-Petter greinerpetter (at) wuppertal.de
Supervisor Dr. Howard Cohl howard.cohl (at) nist.gov
Advisor Dr. Moritz Schubotz schubotz (at) uni-wuppertal.de
Advisor Prof. Abdou Youssef abdou.youssef (at) nist.gov
Student Developers Avi Trost, Rajen Dey, Claude, Jagan


LaCASt's Issues

Which lexicons are necessary for the PomParser to run properly

As discussed, it would be good to put all other lexicons in a subdirectory. As far as I know, the common user only needs the following lexicons:

  • global-lexicon.txt
  • thesaurus.txt
  • global-ignorableLatex.txt
  • MathAndTextCmdsWithParamsWithoutRelevantEffect.txt

Change programs to command line programs

There are some features not yet supported by the translator because there is no corresponding command line functionality yet (for instance, using alternative translations).
The lexicon creator has no command line functionality at all yet!

Convert semantic LaTeX to a given CAS

The first milestone provides the conversion from semantic LaTeX to a given CAS (Maple & Mathematica).

  • support all DLMF macros
  • support simple prefix and suffix and DLMF macros in one equation (simple +-*/)
  • support complicated arguments (functions like sqrt, frac)
  • support multiple DLMF macros in one equation (multiply several macros)
  • support nested functions and macros (macro of a macro)
  • provide information about domains
  • provide information about branch cuts
  • support multiple CAS

Factorial feature set for ! and !!

I don't know if this is necessary, but in a math environment we can assume that n! means the factorial function. Currently, ! is just a punctuation symbol.

I could add it to my own global-lexicon, but I don't think that is a meaningful solution.

Add modulo to grammar?

Currently, modulo is not supported by our program. There are two possible solutions to this problem.

  1. Check for each expression whether the next one is mod and handle that case.
  2. Add modulo expressions to the grammar back-end and then update our program.

Questions for Abdou

This is just a quick list of questions I want to ask Abdou. Kind of a reminder for myself...

  • list of tags
    got it

  • where can I find the implementation of sqrt and frac, and how can I change or extend it?
    waiting for another email about the grammar.

  • lexicon entry for each DLMF macro or is there a better option?
    yes, good approach

  • I expected the PomParser to only use the global-lexicon, but it throws an exception if some other lexicons are missing

  • what about multiple names with the same feature set? Is it possible to write those?
    use a feature like "synonym" and search the lexicon for the synonym.

Open translation issues

There are some problematic special cases that are hard to translate. The current program cannot translate those cases properly (latest version 81bbdeb). The following table exemplifies the problem for translations from semantic LaTeX (DLMF/DRMF macros) to the CAS Maple.

Problem Name Example Description
Functions in Arguments1 \LaplaceTrans{f}{s} <> laplace(f, t, s) t is the argument of a given function f. The argument of f is not defined in DLMF macros.
Vector and Matrix arguments \Matrix{A} <> Matrix([[],[]]) A could be a LaTeX matrix. We have to extract and translate each element of the matrix and put it into the correct order. But the positioning of elements is currently resolved by placeholders like $0. This concept doesn't work for matrices and vectors, since we don't know how big a matrix/vector is.
Prefix notation or Polish notation \det A <> Determinant(A) We cannot handle prefix notations yet. We have to figure out how long the following argument is. Currently we can translate it only if the MLP grammar takes \det as an expression.
Translations for special values \HyperboldpFq{2}{1}@{a,b}{c}{z} <> hypergeom([a, b], [c], d) Our current program cannot handle specific values of DLMF macros. In this case, it assumes 2 and 1 are the first two arguments of the macro. It would be translated to hypergeom([2], [1], a,b), which is completely wrong.
List of arguments \multinomial{m}{n1, n2 ... nk} <> multinomial(m, n1, n2 .. nk) This causes problems with other computer algebra systems. With Maple it works fine, because we can simply translate each , as ,.

[1] This could be solved if the function f were given by the DLMF macro for a function: \f{f}@{x}. But in this case we need to extract the argument of one macro (here x) and put it into another translation. \LaplaceTrans{ \f{f}@{x} }{s} should be translated to laplace(f, x, s). Extracting the 2nd argument of \f and putting it into the outer translation of \LaplaceTrans, namely into the 2nd position of laplace(...), is not possible right now.

Extract information from style files

Write a program that extracts all the information we need from the DLMF style files.
We need information about the subscripts, superscripts, and variables, and about their form (are there any parentheses or semicolons, and if so, where are they).

Role of physical constants

All mathematical constants have the feature
Role: mathematical constant

but the physical constants don't have a feature named "Role". It would be easier to find them if all constants had a "Role" feature.

Set placeholders in CSV

Currently, a lot of the translations in Macro2Maple have no placeholders yet. Without these placeholders we cannot translate them correctly.

It's an easy task, but it takes time.

Version numbers for MLP

@abdouyoussef
I think it would be nice to have version numbers for the MLP jar. Sometimes there are only changes in the lexicon file, but since you always upload the whole zip file, it is pretty hard for me to see what changed in the jar. It has worked well so far, but we have now reached a point where some updates in the jar could break my program.

Alternative Translations

The current concept provides alternative translations. For instance, \acot@@{z} will be translated to Maple's arccot(z). But since Maple uses a different branch cut for this function, the user has the alternative translation arctan(1/z).

Currently, our program just informs the user about this alternative, but the user cannot configure the translation in a way that our program chooses the alternative translation instead of the first (most intuitive) one.

Another example shows why we should tackle this problem. Some DLMF macros have no direct translation, but based on the definition, we can provide an appropriate one. For instance, \BesselKtilde{\nu}@{x} has no translation to Maple yet. But based on the DLMF definition, we can provide the alternative translation BesselK(I * nu, x).

Maple problems with white spaces

Maple has difficulties understanding numbers that come after letters (variables, constants, functions). For instance,

  • 2 Catalan computes approx. 1.83
  • Catalan 2 produces an Error, missing operation.

Mathematica has no problems with it.

Additional Add-Ons

@physikerwelt I have no good idea how to keep the door open for additional add-ons. Here is a list of examples:

  • Mathematical Constants: solved by a JSON file GreekLettersAndConstants.json
  • Greek Letters: solved by a JSON file GreekLettersAndConstants.json
  • MathTerms: a list of all cases can be found in MathTermParser.java. As an example, "star" and "forward slash" can be seen as multiplication and division in Maple and Mathematica and can be translated directly to * and /, but maybe not in all CAS.

Convert the information from style files into lexicon files

Once we have done this, we have to write a program that translates the information about the macros into a lexicon file, so that all the information can be used with Abdou's MLP.

We should extend the already existing lexicon file for special functions, Special-functions-lexicon.txt (https://github.com/TU-Berlin/latex-grammar/blob/master/libs/ReferenceData/Lexicons/Special-functions-lexicon.txt).

Avoid hardcoding

@ClaudeZou @notjagan
MathMode.java

    private static int skipEscaped(String latex) {
        if (latex.startsWith("\\$")) {
            return 2;
        } else if (latex.startsWith("\\\\[")) {
            return 3;
        } else if (latex.startsWith("\\\\(")) {
            return 3;
        }
        return 0;
    }

Please store these numbers in private global constants. Maybe it's also better to store the strings in global constants as well.

Potential variable ambiguities

As some examples:

  • What is \pi? A constant or a variable? The user could use \cpi to make it clear, but what if they use \pi?
  • What is i? Is it \iunit?
  • \gamma vs \EulerConstant

And an example of a different kind of ambiguity could be:

  • n!!: is it the factorial of the factorial or the double factorial? Maple opens a pop-up window to ask the user what they mean by that.

We should warn the user about potential ambiguities.

Translation from MLP to strict content MathML

Write a simple program that translates the MLP output to strict content MathML. For a definition, take a look at the W3C specification. We currently only want to support simple cases. Here is a list of all supported strict content MathML tags:

  • cn: for numbers
  • csymbol: for mathematical operators (+, -, *, /). It can be used for mathematical functions in general, but it needs a content dictionary to be understood:
<apply>
  <csymbol cd="arith1">plus</csymbol>
  <ci>x</ci>
  <ci>y</ci>
  <ci>z</ci>
</apply>
  • ci: for variables or parameters. For instance n can be an integer -> <ci type="integer">n</ci>
  • cs: for general strings
  • apply: a mathematical object (similar to a container). This block is needed to build the typical expression tree of a mathematical expression.

Implement Standard Tests for MLP

When the underlying MLP software from Abdou changes, it may break our system. Thus, we should create some standard tests that help us to incorporate new versions of the MLP.

\left( vs \left (

Is something like \left ( (with a space between t and ( ) allowed in LaTeX?
@abdouyoussef your code handles \left( x \right) pretty well but cannot understand
\left ( x \right ).
Some quick tests with LaTeX show me that the second variant also works, but I'm not sure about that.

Role of @s in DRMF macros

@HowardCohl
We know from the glossary file the maximum number of @ of each macro, right? And if there is at least one @ in the macro, there is always a representation without an @ AND without variables. Is this always true?

For instance, the continuous q-Hermite polynomial: if there is no @ in this macro, that means there are also no variables defined, right? Something like \ctsqHermite{n}{x}{q} is not possible?

Bug for "\\sqrt \\frac{1}{2}"

@abdouyoussef
I found a bug in the MLP grammar. To reproduce it, just try to parse

\\sqrt \\frac{1}{2}

I guess the reason is the definition of LatexCmd1or2Param or OtherLatex1Param.

General Interpreter - Multiple CAS

Currently we are using some hard-coded translations. We need to extract these and try to put all of them into a JSON file to make it easier to extend.

A mathematical constant

@abdouyoussef You added the role "a mathematical constant" to your global lexicon for i and e, but all of the other constants simply have the role "mathematical constant" (without the "a").

Hard-Coded paths in PomParser

@abdouyoussef it seems you use hard-coded string variables to go from a given "References" directory into the Lexicon subfolder. That means your software is only usable on Windows machines, since Windows is the only system that uses \ as path separator. Linux, for instance, uses /.

Since you use Java 7, I would recommend changing the code to Paths, because a Path variable in Java 7 and higher always uses the system-specific separators.

// given a string "refdir" as the path to the References directory
String pathToGlobalLex = Paths.get(refdir, "Lexicon", "global-lexicon.txt").toString();

PS: As far as I know, the only differences between Linux and Windows systems are the path separator and the new-line characters. To avoid the latter problem you can use

String newLine = System.lineSeparator();

With these changes your program should work on all systems that support Java 7 or higher.
