kwarc / fomid Goto Github PK

View Code? Open in Web Editor NEW

0.0 0.0 0.0 1 KB

fomid's People

Contributors

Watchers

fomid's Issues

Pop-up/Collapsible annotations in PDFs (e.g. structured proofs, links to math concepts)

Here and here are discussion on how to implement collapsible/expandable sections in LaTeX. To do that with arbitrary content will require some fiddling with tikz, to avoid the collapsed section taking up empty space in the pdf, but will enable structured proofs. Not sure how well that "nests", will need investigation.

Here is a discussion how to do tooltips/hoverboxes in TeX, which we might exploit for showing definitions on hovering over notations and symbol references.

Variable bindings in binding operators

Problem: FLaTeX needs support for binders. Bound variables can represent simple variables (e.g. \sum_{n\in N}n, where N is a "type"), but also sequences (e.g. \forall{x_1,...,x_n}..., see #7) or more complicated patterns (e.g. \sum_{(a,b)\in S}a^2\cdot b^2).

In the latter case, MMT probably has to take care of converting this to something tangible.

Related to #3

Special `\symdef`-variants for constants/functions/predicates

Problem:

Lots of boilerplate, multiple arities for different notations of the same symbol.
Example (SMGloM): A frequent pattern is of the form (see #3)
\symdef[name=foo]{fooOp}{op}
\symdef{foo}[2]{\fooOp(#1,#2)}
where \fooOp allows to reference the operator foo rather than the application of foo to arguments.
=> two LOC for a very common pattern. More of those, if variants are defined that only differ in the operator symbol itself, e.g.
\notation[variant=x]{timesOp}{\times}
\notation[variant=x]{times}[2]{#1 \timesOp[x] #2}
\notation[variant=star]{timesOp}{*}
\notation[variant=star]{times}[2]{#1 \timesOp[star] #2}

Question: are the more patterns like this? Out-of-the-box support for operator-macros or something more general?
Also: often the operator notation is just an application operator applied to "dummy arguments" (usually \cdot), e.g. \foo\cdot\cdot - is this the default operator notation? (MMT: dummy arguments are lambda abstracted away) Then the dummy arguments need to be changeable, e.g. Hom(-,A) is more common than Hom(\cdot,A)

(TODO)

Multi-sentence statements

Example

"Let a an A with b a B(a) such that P(a,b). If Q(a,b) and there is some c with R(a,b,c), then every D(a,b) is an E(a,b)"

=> \forall a\in A,b\in B(a). P(a,b) \to \Q(a,b) \to (\exists c\in C. R(a,b,c)) \to \forall d \in D(a,b). d\in E(a,b)

Idea: Multiple declarations in a structural feature that elaborates into a single proposition / definition. Retains both presentations.

Additional idea: LaTeX/LaTeXML can implement such features generically, by adding a <span> whose property-attribute reflects the name of the structural feature. That way, we can add additional features by providing a structural feature and a LaTeX macro in additional packages, istead of hardcoding them in the FLaTeX-package.

Natural language annotations in general

Example sentence

"A natural number $N$ is even, if its square is even."

Let \operator a semantic macro that takes n arguments.
e.g.: \implication, n=2.
In mathmode, \operator behaves as usual (e.g. $\implication{A}{B}$ expands to (ignoring semantics) $A \Rightarrow B$ - see https://github.com/KWARC/FoMID/issues/3)
Let p=(p_1,...,p_m) a phrase that represents an application of \operator to arguments arg_1,...,arg_n
e.g. p = "A natural number $N$ is even, if its square is even."
arg_1="its square is even",
arg_2="A natural number $N$ is even"

Requirements

\operator in text mode needs n arguments. The n arguments are subsequences of p in arbitrary order, interleaved with subsequences that do not represent an argument.
e.g. first argument of \implication is "its square is even",
second argument is "A natural number $N$ is even", and
", if" does not represent an argument - these are considered verbalization components of \operator.
arguments can be invisible.
e.g. the second argument of \natpow in "its square" is 2, which is not explicitly represented in the text.
each argument has to contain a uniquely identifiable subsequence that represents the actual term
e.g. in "A natural number $N$", the argument to \even is "$N$", not "natural number"!
variables: If declared, not compositional. If not declared, the scope needs determining ("is this still the same $N$ as this one?"),
its quantification (universally/existentially) and "type" (optionally) provided, and all of these ideally at most once, even if the variable
occurs multiple times.

Ideas

Macro \operator takes n {} arguments, representing the operator arguments, interleaved with optional [] arguments in between (including leading and trailing) representing verbalization components.
e.g. \even{its square}[is even] => "its square" is the argument, "is even" is just a verbalization component.
An argument prefixed by *[k] indicates that what follows is the k-th argument rather than the next one.
e.g. \implication*[2]{A natural number $N$ is even}[, if]{its square is even} indicates that the second argument comes first.
An argument prefixed by * is invisible
e.g. \natpow{its}[ square]*{2} == \natpow{its}*{2}[ square] == \natpow*[2]*{2}{its}[ square]

=> \implication*[2]{\even{A \NaturalNumbers[natural number] $N$}[ is even]}[, if]{\even{\natpow{its}*{2}[ square]}[ is even]}.

Remaining questions

Both "A natural number $N$" and "its" represent OMV(N), where N has type \NaturalNumbers and is universally quantified.
How to represent this? (see https://github.com/KWARC/FoMID/issues/5 )
Binding applications (see https://github.com/KWARC/FoMID/issues/8 )
Sequences (see https://github.com/KWARC/FoMID/issues/7 )

XHTML/LaTeXML/FLOMDoc

XHTML/LaTeXML

For declarations we can generate <script type="application/(xml | json)"> nodes (for xml: Requires prefixing all inner nodes with xhtml:, but they're not checked against the rnc and stripped away in post processing anyway).
For annotations, we probably need at most an annotation type (OMS, OMA etc.) and a URI/ID/name. For OMS, OMV etc. a URI will do, for OMAs. Applications of complex terms need an apply-operator anyway, so the head of an OMA is always a Symbol (?). For more complex attributes, we can add a <script> as the first child that holds complex terms etc.
In text mode, we can use <div> or <span> with attributes resource (URI) and property (annotation type).
In math mode, the only valid attribute that survives post processing in LaTeXML is class - might even be "semantically valid" in that both annotation type and a URI could even meaningfully be used as CSS classes. Then we can add property and resource or id as (as usual space separated) class attributes. Complex terms (if necessary) can be moved to a <script> after the <math> node and linked to via an id in the classes (inelegant, but functional, I guess).

Examples

Concept	property	resource
Theory	stex:theory	URI
OMA	stex:oma	URI of head symbol notation
OMV	stex.omv	URI of variable declaration if declared, name otherwise

Changing notations

On the latex side: a macro to set a default notation option, globally (for the whole document) or locally.

On the viewer side: we can give notations (including arbitrary value combinations for variant, lang, arity, see #3) unique URIs and annotate with those, so that MMT can parse out how a symbolic expression (i.e. MathML) came about. Conversely, \symdef/\notations can write their notations as MathML into the xhtml (in a <script>), so MMT should be able to easily substitute notations when serving a document, or even custom notations input via a latex field(?).

Question: how do we treat multiple notations for the same symbol on the MMT side? Multiple notations should be easily implementable, but what if new notations are added later on in separate modules - introduce defined constants? Keep FLaTeX notations separately in the flatex-Plugin for MMT?

Sequences and sequence arguments for flexary (associative) operators

Problem: We need systematic treatment of sequences and flexary arguments, in particular with respect to ellipses.

e.g. a_1 + ... + a_n, or a_0 + a_2 + a_4 + ... + a_n.

Needs to be compatible with earlier declarations, e.g. Let $(a_i)_{i\in I}$ a sequence in $A$ such that....

Sequence declarations?

Problem gets even trickier if you think of e.g. matrices like

\begin{pmatrix}
   a_{11} & \cdots & a_{1n} \\
   \vdots & \ddots & \vdots \\
   a_{n1} & \cdots & a_{nn}
\end{pmatrix}

...how do we treat this systematically?

Variables (scopes, quantification,...)

Possible approach 1: Variable declarations

\vardecl[name=N,type={\NaturalNumbers}]{varN}{N} (syntax analogous to \symdef, see #3) introduces a macro \varN for the variable N, and behaves exactly like a semantic macro,
i.e. in mathmode, $\varN$ simply yields "N" (its notation), in textmode it takes 0 {}-arguments and one optional []-argument:

=> \implication*[2]{\even{\varN[A \NaturalNumbers[natural number] $\varN$]}[ is even]}[, if]{\even{\natpow{\varN[its]}*{2}[ square]}[ is even]}

Possible approach 2: Scope environments
Can optionally declare variables, e.g. \varscope([name=name, type=t, notation=not, (universal|existential)])*{...} - with everything optional. If an occuring variable is not "declared in a \varscope, its scope is the most inner \varscope in which it occurs. A variable occurence is whatever LaTeXML's math parser considers a variable, or a \var{name} (for not explicitly declared variables in text, e.g. "its" in the example), or \name if a variable with name name is declared in the current \varscope, e.g.:

\varscope{\implication*[2]{\even{\var{N][A \NaturalNumbers[natural number] $N$]}[ is even]}[, if]{\even{\natpow{\var{N}[its]}*{2}[ square]}[ is even]}}

\varscope[name=varN, type=\NaturalNumbers, not=N]{\implication*[2]{\even{\varN[A \NaturalNumbers[natural number] $N$]}[ is even]}[, if]{\even{\natpow{\varN[its]}*{2}[ square]}[ is even]}}

Advantage: compatible with fewer explicit annotations, scope still has to be provided, though

Disadvantage: might require an additional \apply-operator for applications of (function) variables, but we probably can't avoid that anyway.

Deprecating signature modules

Instead of having a signature module foo and multiple natural language modules foo.en, foo.de as in SMGLoM, the library should not have to rely on "purely formal" modules. Instead, the formal content of all NL moduls should either coincide (entails duplication, but if definitions are extracted from annotated natural language, it will do that to some extent anyway), or ideally, \symdecls should only occur in some default language (probably english, given the expected coverage), and that module is included/extended by other NL modules.

TeX will have to be smarter about NL modules - e.g. pick the "default" language (according to babel package?) automatically, resolve e.g. ?foo to ?foo.<lang> automatically

Translation / Alignments

We might want a syntax for alignments in LaTeX
We might want users to be able to add new alignmentes on MathHub
=> We probably need a better syntax and interfaces for users to specify/provide alignments including their arguments
MMT needs a better way to store/retrive/know about alignments => ULO, relational? (Work in progress, once relational is reimplemented as ULO/rdf)? Where/how should they be stored in the future to be persistent (if provided by a standalone user interface?)

(commutative) diagrams

...need annotations for tikz/tikzcd/xy? All of them? Some of them?

Might require annotations for nodes and arrows as "universally" and "existentially" quantified for the usual categorial definitions

(Algebraic) structures

Problem

Structured concepts (e.g. groups, topological spaces, probability spaces etc.) in informal mathematics act as both "types" ("Let G a group") as well as bundles of declarations (a group is a tuple (G,o,e,⁻¹)). They can be "instantiated" with an arbitrary number of explicit concepts (e.g. "ℤ is a group", "(ℤ,+) is a group", "(ℤ,+,0,-) is a group",...)

=> They need to be both symbols (to act as types) as well as theories (for extension/inheritance: "A group is a monoid such that...")

Ideas

Obvious solution for symbol/theory duality: Mod-types => structures as records

Inheritance/Instantiation: Parametric theories: fields are theory parameters, declared individually, treated like variables (OMVs), e.g. (prototypical syntax:)

\begin{theory}[name=Magma]
  \parameter[type=\set]{universe}{M}
  \parameter[type={\funtype{\universe,\universe}{\universe}}]{op}[2]{#1 \circ #2}
\end{theory}

\begin{theory}[name=Monoid]
  \parameter[type=\set]{universe}{M}
  \parameter[type={\funtype{\universe,\universe}{\universe}}]{op}[2]{#1 \circ #2}
  \parameter[type=\universe]{unit}{e}
  \extendmodule[universe=\universe,op=\op]{Magma}

  (axioms)
\end{theory}

where \parameter follows the syntax of \symdef (see #3) and \extendmodule[...]{T} behaves like \importmodule, except that it \lets the parameter declarations in T to their assignments in the optional arguments (if given).

Note: this is probably best implemented by having the macro that stores all declarations in a model contain the unexpanded macro definitions, the actual semantic-macro definition macros (\notation,\symdefetc.) expanding notations, and \protecting the inner-most semantic-macro expansion (the one that we need e.g. LaTeXML bindings for, which has access to the full URI of a symbol). That way, \extendmodule[foo=\bar] really should automatically replace \foo by \bar everywhere in the relevant scope, and similar for nested \extendmodules.

Instantiation

Question: Do we need to distinguish between the structure and the universe? Are there "structures" that don't have a set universe as its primary component? Does something like Let \mathcal G = (G,\circ,e,{}^{-1}) ever occur?

Either way: \instantiate[...]{T}{name } creates a macro \name. MMT can set the type of \name as Mod(T) and define it as a record with the provided assignments or undefined "dummy" constants for the missing ones. \name[p] yields the parameter declaration p in module T, which (analogous to \extendmodule) is being \let as \foo if p=\foo is part of the optional arguments of \instantiate. MMT can then lambda-abstract away the not-provided assignments (i.e. dummy constants), which corresponds to universally quantified variables or (equivalently) proof arguments (in the case of axioms) in the current scope. \instantiate then basically behaves like \extendmodule in tex, except that the names of the parameters are \name[*p*] instead of \name, and MMT can handle them differently.

If we additionally let \name without parameters be \name[p] for the first parameter p in T, and (in some cases) the macroname allowed to be empty, then this allows for both e.g. Let \instantiate{Monoid}{monoidM} $\monoidM$ a group with operation $\monoidM[op]{\cdot}{\cdot} as well as
\istantiate[universe=\IntegerNumbers,op=\natplus,unit=0]{Monoid}{} (introducing proof obligations for the monoid axioms, by abstracting away the missing declarations)

Question: It occurs to me, that we may then not even make a distinction between \parameter and \symdef - the only difference is then between \importmodule - which is an import on the MMT side - and \extendmodule and \instiate - which are an implicit structure and a named structure (or somewhat equivalently a record) respectively.
MMT needs to take care that all recursively included modules in the structures are assigned to plain includes.

Realms

FLaTeX/FLOMDoc needs support for realms, see here.

Examples:

Groups. Face: theory of groups. Pillars: e.g. "Monoid with inverses" and "associative loop". This is fully solved by \extendmodule, see #2 - where the theory of groups would extend both monoid and loop => No two separate pillar modules even needed...?
Real Numbers. Face: Theory of archimedian topologically closed totally ordered fields. Pillars: e.g. Dedekind cuts, equivalence classes on Cauchy sequences. Maybe \extendmodule also solves these instances?

Question: What do we need on the MMT side to support this systematically?

Guided tours

Easy to implement in theory: Take the topologically sorted list of dependencies of a concept/document and present to reader.

Desirable: Have a user permanently mark modules as trivial/known, so that they are omitted from guided tours for that specific user. Easy to do in an IDE, on MathHub that would require accounts, I guess?

Implicit/default arguments/notations with differing arities

e.g. the general linear group of dimension n over a field K is usually just denoted as GL(n), omitting the field. Should we expect users to provide them each time?

Alternatively: Semantic abbreviations (I think sTeX has an \abbrdef, but it's not documented afaik?), possibly with an \implicit macro that signals MMT that an implicit argument occurs here, e.g.
\symdef{gln}[2]{GL(#1,#2)}
\abbrdef{glni}[1]{\gln{\implicit}{#1}}{GL(#1)}
Then MMT can attempt to do type inference and search the context for the most likely value. We can also allow for
local \abbrdefs that explicitly state a fixed argument for the current section, in which case we might want to mark these as not-exported in \includemodules.

verbalizations with arguments

We need a general scheme for verbalizations of functional concepts. Marcel Dreier's B.Sc Thesis has some thoughts on this, which are not quite implemented yet.

But the is more than discussed there. I have been stumbling over the treatment of "$n$-ary" again, which I had marked up as \defi[name=nary]{$n$-ary} or as $n$\defi[name=arity]{ary}. All of this muddles the fact that the word ary is functional and takes $n$ as an argument. Other examples are $\mathal{C}$-derivable and $C^\infty$-manifold or $n$ to $m$ relation.

Library structuring

Desirable features:

Multilinguality
Multiple differing definitions for the same concepts (e.g. using realms, see #12)
Remarks, Examples
Glossary-like entry points? On MathHub, should be followed by definitions, remarks and examples
Provenance/Confidence? ("Definition published here", "proof published here", "proof implemented in system X here", "added by user Y")

=> SMGloM structure as smglom/area/conceptname.language.tex likely won't do anymore. Archive structure is questionable anyway, but also the addition of several "types" of entries and multiple entries for "the same" concepts put this into question. Also, once the library grows, the individual archives will grow by largely irrelevant (for a given user) content. E.g. everyone needs basic algebra, no one needs highly specialized concepts in higher commutative algebra.

=> GIT integration in MMT could maybe solve that by LMH specifically cherry picking (dependency trees of) individual modules? In that case, the "backend" organizational structures becomes largely irrelevant

Semantic Search

Should largely work out of the box using MathWebSearch, but we might need to figure out how to generate an index from FLOMDoc-XHTML, that maybe also makes use of semantic information?

(I notice that the word semantic is somewhat overlaoded here: Semantic serach in the sense of MathWebSearch operates on syntax trees only, so it's more syntactic search in the logical sense and only semantic in the sense of "syntax-tree aware")

Symbolic notations

\symdecl[name=foo,type=tp]{bar} introduces a new symbol with name foo and macro \bar. If no name is given, the name is bar. The (optional) type is an arbitrary symbolic expression.

\notation... introduces a new notation for a symbol (more details below). \symdef[...]{bar}... combines \symdecl and \notation.

Notation schema

\notation[prec=p;p1x...xpn,args=args, vt=name]{foo}[n]{not}...

n is the (optional) arity, subsumed by:
args is a list of is,as and bs. If given, the length of * args* is the arity. If no args is given, the arity is n and all arguments are normal (i). If no n is given, then n=0.
- i represents a normal argument
- a represents an associative argument list
- b represents a bound variable argument
p is the operator precedence, the pi are the individual argument precedences. Default: pi=p, p=0, except if n=0, in which case p=\infprec (see bracketing below).
foo is the name or macroname or URI of the symbol that gets the new notation.
not is the actual notation, with the usual #i representing the ith argument.
\notation takes an additional argument for each associative argument list, that provides a notation for the concatenation of two list elements (i.e. a binary macro).
vt is one of (variant,lang,arity), name is a string identifying the notation variant. A notation for \foo with variant=bar is used via \foo[variant=bar] or \foo[bar]. Variants in the three dimensions (variant,lang,arity) can be composed (if a notation for the composition exist), e.g. \foo[bar,lang=de]. variant is generic, lang is used for language-specifc variants, arity is for variants of operators where commond "default" arguments are often suppressed -- e.g. least common multiple lcm(n,m) is called kleinstes gemeinsames Vielfaches in german, and correspondingly represented as kgV(n,m).

Question: how to handle binders and bounc variable arguments (separate issue: #8)

Example:

\notation{plus}[2]{#1 + #2}: \plus{a}{b} yields a+b

\notation[args=a]{plus}{#1}{#1 + #2}: \plus{a,b,c,d} yields a+b+c+d

\notation[args=ai]{sumlethan}{#1 < #2}{#1+#2}: \sumlethan{a,b,c}{A} yields a+b+c<A

Automatic bracketing

Automatic bracketing works by comparing a downwards precedence p_d with an upwards precedence p_u. The initial downward precedence is p_d:=-\infprec.
A semantic macro \foo{a1}...{an} sets the current downward precedence for each argument ai to its argument precedence p_d:=pi in the ith position. When expanding a semantic macro \bar, the operator precedence p of \bar is taken as upwards precedence p_u=p. Compare p_d and p_u. If p_u≤p_d, parentheses are wrapped around \bar, otherwise not.

Example:

\notation[prec=500]{plus}[2]{#1 + #2}
\notation[prec=600]{times}[2]{#1 * #2}
In both cases, argument precedences are operator precedences.

Consider \times{a}{\plus{b}{c}}. Then \times compares its precedence 600 with the (initial) downwards precedence -\infprec, where \infprec:=1000000. No parentheses are inserted. \times then sets the downwards precedence to 600 and expands its arguments.
\plus compares its precedende 500 with the downwards precedence 600 and inserts parentheses.
Result: a*(b+c)
Conversely: \plus{a}{\times{b,c}}. Here, \plus sets the downwards precedence to 500. \times compares its operator precedence 600 and does not insert parenthesis. Hence: a+b*c.

The default precedence \infprec for 0-ary semantic macros means no parentheses are ever automatically wrapped around constant symbols.

Question: do we need argument precedences in MMT?

Customization

The algorithm uses \notation@lparen and \notation@rparen as opening and closing brackets. By default, \notation@lparen:=( and \notation@rparen=). In displaymode, \left\notation@lparen and \right\notation@rparen are used.

\dobrackets{...} wraps brackets around its arguments and is used by the bracketing algorithm, hence allows for forcing brackets. \withbrackets{a}{b}{t} sets \notation@lparen:=a and \notation@rparen:=b in t <- allows customizing which brackets to use.

Question: Multiple notations in MMT, see #11

Typechecking expressions / Translating to foundations

Question: Can we typecheck FLaTeX expressions based on weak type annotations directly or do we need to translate them to a foundation?

Obviously, translation to e.g. MitM allows for type checking, but also requires decent alignment coverage.

Additionally/Alternatively: Maybe we can use realms (see #12) for this?
E.g. both a module for set theory and a module for type theory could provide pillars for an interface theory for implication, the former instantiates it with a set of pairs, the latter by a primitive constructor with inference rules. Then an expression containing only symbols for which pillars in type theory (or symbols which have definitions <- recurse) exist could be trivially translated to LF/MitM/LATIN2. "Type theory modules" in FLaTeX/FLOMDoc could be trivially aligned with LF/LATIN2/MitM concepts.

Similarly, a module for natural numbers could have a pillar for the Von Neumann construction, indicating how to translate it to ZF (or a weaker set theory - we only need infinity, extensionality and restricted comprehension, I think?), whereas the peano-axiomatic face-theory naturally yields a "type theoretic" pillar.

kwarc / fomid Goto Github PK

fomid's People

Contributors

Watchers

fomid's Issues

Problem:

Example

Example sentence

Requirements

Ideas

Remaining questions

XHTML/LaTeXML

Examples

Problem

Ideas

Instantiation

Notation schema

Automatic bracketing

Customization

Recommend Projects

Recommend Topics

Recommend Org