
Comments (7)

edemaine commented on June 25, 2024

Unfortunately, regexes won't cut it to fix this problem in full generality, e.g. to support

$$ x = \text{what is $x$?} $$

Would you be willing to review a PR that rewrites the parsing engine to count {/}s and check whether $s are escaped with \, when processing in dollars mode?

from markdown-it-texmath.

goessner commented on June 25, 2024

hmm ... interesting.

Usually I am very reluctant to extend the regexes used in texmath. They are performance critical, as they are evaluated on every user keystroke.

In fact, the escaped dollar sign \$ is a valid element inside LaTeX math, and interestingly it works flawlessly inside the other, non-dollar delimiters.

Dollar delimiters are special beasts. Taking a quick look at the math-inline regex

/\$((?:\S)|(?:\S.*?\S))\$/

I recognized that the final \S (the last character before the closing dollar) is shorthand for [^\r\n\t\f\v ] (a negated character class), which could easily be extended to [^\r\n\t\f\v \\], presumably without much performance cost.

After testing it successfully in https://regex101.com/ for several relevant cases, I considered it worthwhile to also use it as a guard in the math-block regexes.
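A minimal sketch of the effect of that one-character change, runnable in Node.js. The two regexes are copied from this comment; the variable names and the test input are only illustrative.

```javascript
// Inline regex before the change: the last character may be any non-space,
// including a backslash.
const originalInline = /\$((?:\S)|(?:\S.*?\S))\$/;
// Inline regex after the change: the last character must not be a backslash,
// so an escaped `\$` can no longer close the expression.
const extendedInline = /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/;

// String.raw keeps the backslash literally: $a+\$ = b$
const inlineInput = String.raw`$a+\$ = b$`;

const before = originalInline.exec(inlineInput)[1];
const after = extendedInline.exec(inlineInput)[1];

console.log(before); // a+\        (stops at the escaped dollar)
console.log(after);  // a+\$ = b   (runs to the real closing dollar)
```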

Surprisingly, a drastic simplification helps here ... from

/\${2}([^$]+?)\${2}/

to

/\${2}(.+?)\${2}/
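As a rough illustration (variable names are made up for this sketch), the stricter `[^$]` version rejects any content containing a single dollar, while the `.` version accepts it as long as the content stays on one line:

```javascript
// Block regexes copied from the comment above.
const strictBlock = /\${2}([^$]+?)\${2}/;
const relaxedBlock = /\${2}(.+?)\${2}/;

// Content with an escaped dollar, and the nested example from the first comment.
const escaped = String.raw`$$a+\$ = b$$`;
const nested = String.raw`$$ x = \text{what is $x$?} $$`;

console.log(strictBlock.exec(escaped));     // null
console.log(relaxedBlock.exec(escaped)[1]); // a+\$ = b
console.log(strictBlock.exec(nested));      // null
console.log(relaxedBlock.exec(nested)[1]);  //  x = \text{what is $x$?}
```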

Expect it to be available in the next version.

thanks ...


goessner commented on June 25, 2024

Erik, thanks for your potential help.

If there are some problems with my bug fix attempt, I would like to come back to your PR offer.


edemaine commented on June 25, 2024
/\${2}(.+?)\${2}/

Changing from [^$] to . excludes newlines. I think you want to allow single newlines (though you could forbid double newlines, as TeX does) within math. Instead of . you probably want [^], as in:

/\${2}([^]+?)\${2}/
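A quick check of the newline behavior, as a sketch with illustrative names:

```javascript
// `.` does not match line terminators; `[^]` matches any character.
const withDot = /\${2}(.+?)\${2}/;
const withAny = /\${2}([^]+?)\${2}/;

const multiline = '$$\na+b\n$$';

console.log(withDot.test(multiline));                    // false
console.log(withAny.test(multiline));                    // true
console.log(JSON.stringify(withAny.exec(multiline)[1])); // "\na+b\n"
```

In engines that support the `s` (dotAll) flag, `/\${2}(.+?)\${2}/s` should behave like the `[^]` version.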

But none of this fixes use of \$ (escaped $) or \text{$...$} (nested $) within a math expression.

One potential fix would be to use the regex as above, and then check whether the closing $ is actually escaped (had an odd number of \s before it), or has more unescaped {s than }s in it, and in that case, doing more work (probably another regex match to get contents until the next \${2}, concatenating, and trying again). This would mostly just take extra time in the weird edge cases of escaped and nested $s, which currently don't work, so seems like a win? But it would involve counting the number of {s and }s (without preceding \s) to make sure it's matched. I imagine this is all way faster than the cost of calling KaTeX, though.
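A minimal sketch of this first idea, under stated assumptions: `isEscaped`, `braceDepth`, and `matchBlockMath` are hypothetical helpers written for this illustration, not functions of markdown-it-texmath, and the sketch only handles a string that starts with `$$`.

```javascript
// Hypothetical helper: true when the character at `index` is preceded by an
// odd number of backslashes.
function isEscaped(str, index) {
  let backslashes = 0;
  for (let i = index - 1; i >= 0 && str[i] === '\\'; i--) backslashes++;
  return backslashes % 2 === 1;
}

// Hypothetical helper: unescaped `{` count minus unescaped `}` count.
function braceDepth(tex) {
  let depth = 0;
  for (let i = 0; i < tex.length; i++) {
    if (isEscaped(tex, i)) continue;
    if (tex[i] === '{') depth++;
    else if (tex[i] === '}') depth--;
  }
  return depth;
}

// Find the first `$$` closer, then keep extending to the next candidate while
// the closer is escaped or the braces collected so far are unbalanced.
function matchBlockMath(src) {
  if (!src.startsWith('$$')) return null;
  const closer = /\${2}/g;
  closer.lastIndex = 3; // require at least one character of content
  let m;
  while ((m = closer.exec(src)) !== null) {
    const body = src.slice(2, m.index);
    if (!isEscaped(src, m.index) && braceDepth(body) === 0) return body;
    closer.lastIndex = m.index + 1; // the extra work, only in the edge cases
  }
  return null;
}

console.log(matchBlockMath(String.raw`$$a+\$$$`));           // a+\$
console.log(matchBlockMath(String.raw`$$\text{$a$$b$}$$`));  // \text{$a$$b$}
```

For ordinary input the loop body runs once, so the added cost is one backslash count and one pass over the content.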

Alternatively (and what I originally had in mind), the regex could match the opening \${2}, and then do a secondary search for {, }, or \${2}, checking for escaping in each case, and repeat until finding the unnested closing delimiter. I could test to see which is faster in which cases.
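The second idea could look roughly like the following token scanner. `scanBlockMath` is a hypothetical name for this sketch, and it assumes the input starts with the opening `$$`.

```javascript
// After the opening `$$`, walk over `{`, `}`, and `$$` tokens, skipping escaped
// ones and tracking brace depth, until a `$$` is found at depth 0.
function scanBlockMath(src) {
  if (!src.startsWith('$$')) return null;
  const token = /[{}]|\${2}/g;
  token.lastIndex = 2; // start right after the opening delimiter
  let depth = 0;
  let m;
  while ((m = token.exec(src)) !== null) {
    // Count the backslashes immediately before the token (within the content).
    let backslashes = 0;
    for (let i = m.index - 1; i >= 2 && src[i] === '\\'; i--) backslashes++;
    if (backslashes % 2 === 1) {
      token.lastIndex = m.index + 1; // escaped: skip just this one character
      continue;
    }
    if (m[0] === '{') depth++;
    else if (m[0] === '}') depth--;
    else if (depth === 0) return src.slice(2, m.index); // unnested closing `$$`
    // A `$$` at depth > 0 is nested, so scanning simply continues.
  }
  return null;
}

console.log(scanBlockMath(String.raw`$$ x = \text{what is $x$?} $$`));
// " x = \text{what is $x$?} " (without the quotes)
```

This variant makes a single pass over the input instead of re-counting braces after each candidate closer.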


goessner commented on June 25, 2024

In fact I had temporarily forgotten that '.' excludes newlines. So taking your proposed '[^]' works fine ...

inline: [
    {   name: 'math_inline_double',
        rex: /\${2}(.+?)\${2}/gy
    },
    {   name: 'math_inline',
        rex: /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/gy
    }
],
block: [
    {   name: 'math_block_eqno',
        rex: /\${2}([^]+?[^\\])\${2}\s*?\(([^)\s]+?)\)/gmy
    },
    {   name: 'math_block',
        rex: /\${2}([^]+?[^\\])\${2}/gmy
    }
]

... as you can see with this example code (https://github.com/goessner/markdown-it-texmath/blob/master/test/bug-dollars.html)

    const str = `
  # Simple Dollar tests
  ## Inline

  here "$a+\\$ = b$" we "$\\$$" go "$\\text{\\$some...\\$}$"

  ## Inline block (single line only)
  
  $$a+\\$ = \\text{\\$more...\\$} \\$$$  or ...

  ## Block (multiline)

  $$
  a+\\$ = \\text{\\$text...\\$} \\$
  $$
`

resulting in ...
[image: screenshot of the rendered output]

To also handle unescaped dollars inside of \text{$...$}, I consider the effort disproportionate to the benefit. It seems reasonable to also escape the dollars, as in \text{\$...\$}, in that edge case.
So I would prefer to live with that small shortcoming of markdown-it-texmath for performance and simplicity reasons.

I hope I have not overlooked anything ... thanks.


edemaine commented on June 25, 2024

Minor points:

  • math_inline_double still uses .; should probably switch that to [^] too.
  • I think you can use [^\s\\] instead of [^\r\n\t\f\v \\]. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)
  • [^]+?[^\\] (which occurs in both block rules) seems to require at least two characters. Should probably be [^]*?[^\\].
  • Shouldn't math_inline_double have the same addition to exclude a trailing \?
  • \$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$ can be simplified to \$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$ or \$([^\s\\]|\S.*?[^\s\\])\$. (I'm not quite sure why you're forbidding spaces next to the $s but I assume that's intentional, to avoid some stray matching.)
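Two of the points above can be checked directly. This is a small sketch with illustrative variable names; the regexes are the ones quoted in the list, minus flags.

```javascript
// `[^]+?[^\\]` needs two content characters; `[^]*?[^\\]` needs only one.
const twoCharMin = /\${2}([^]+?[^\\])\${2}/;
const oneCharMin = /\${2}([^]*?[^\\])\${2}/;

console.log(twoCharMin.exec('$$x$$'));    // null
console.log(oneCharMin.exec('$$x$$')[1]); // x

// `[^\s\\]` also excludes Unicode spaces such as the no-break space (U+00A0),
// which the explicit character list lets through.
const explicitClass = /[^\r\n\t\f\v \\]/;
const shorthandClass = /[^\s\\]/;

console.log(explicitClass.test('\u00a0'));  // true
console.log(shorthandClass.test('\u00a0')); // false
```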

These new rules seem to deal with \$ properly. Nice!!

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:

[image: rendered comparison of the two expressions]

So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Unmatched braces generate errors in KaTeX, though. (I see either Uncaught ParseError: KaTeX parse error: Expected '}', got 'EOF' at end of input or KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: \text{.) So perhaps that could be detected, which triggers an "extension" regex? I believe the extension regex is exactly math_inline without the leading $, or applying math_inline but starting from the final $. On input $\text{$x+y$}$, after matching $\text{$, you'd next grab x+y$, fail, and then grab }$, and then succeed. This is quadratic time in the number of $s, but that's probably small... Alternatively, when in extension mode, we could count {/}s, so call KaTeX at most twice.
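A rough sketch of the brace-counting variant of this extension idea. To stay self-contained it does not call KaTeX; instead a hypothetical `braceDepth` helper decides whether to extend. `matchInlineMath` and `extension` are likewise names made up for this illustration, and the inline regex is the simplified form from the list above.

```javascript
// Simplified inline rule, and the same rule without the leading `$`.
const mathInline = /\$([^\s\\]|\S.*?[^\s\\])\$/y;
const extension = /([^\s\\]|\S.*?[^\s\\])\$/y;

// Hypothetical helper: unescaped `{` count minus unescaped `}` count.
function braceDepth(tex) {
  let depth = 0;
  for (let i = 0; i < tex.length; i++) {
    if (tex[i] === '\\') { i++; continue; } // skip the escaped character
    if (tex[i] === '{') depth++;
    else if (tex[i] === '}') depth--;
  }
  return depth;
}

// Match inline math at `pos`, extending past each `$` that closes the match
// while a `{` is still open.
function matchInlineMath(src, pos = 0) {
  mathInline.lastIndex = pos;
  if (mathInline.exec(src) === null) return null;
  let end = mathInline.lastIndex; // index just past the closing `$`
  while (braceDepth(src.slice(pos + 1, end - 1)) > 0) {
    extension.lastIndex = end;
    if (extension.exec(src) === null) return null; // nothing left to extend with
    end = extension.lastIndex;
  }
  return src.slice(pos + 1, end - 1);
}

console.log(matchInlineMath(String.raw`$\text{$x+y$}$`)); // \text{$x+y$}
console.log(matchInlineMath('$a+b$ and $c$'));            // a+b
```

On `$\text{$x+y$}$` this grabs `$\text{$`, then `x+y$`, then `}$`, as described. One limitation of reusing the inline rule unchanged: an extension step fails if the text right after a nested `$` starts with whitespace, because the rule requires a non-space first character.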


goessner commented on June 25, 2024

Sorry for the delay. Thanks for your valuable input.

* `math_inline_double` still uses `.`; should probably switch that to `[^]` too.

math_inline should be written on a single line, but I added it to be more forgiving here.

* I think you can use `[^\s\\]` instead of `[^\r\n\t\f\v \\]`. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)

This is definitely better ... taken.

* `[^]+?[^\\]` (which occurs in both block rules) seems to require at least two characters.  Should probably be `[^]*?[^\\]`.

Yes ... thanks for catching.

* Shouldn't `math_inline_double` have the same addition to exclude a trailing `\`?

sure ... done.

* `\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$` can be simplified to `\$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$` or `\$([^\s\\]|\S.*?[^\s\\])\$`. (I'm not quite sure why you're forbidding spaces next to the `$`s but I assume that's intentional, to avoid some stray matching.)

yes ... again a significant improvement.

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:
So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Now I understand the effect of re-entering math mode. Escaping indeed makes no sense. I still consider it a not very relevant edge case in practice, and I don't want to invest that significant implementation effort at present. A PR is always welcome, though.

thanks again ...

