The regex for dollar delimiters breaks if the content includes an escaped dollar (<cod

<div class="snippet-clipboard-content notranslate position-relative overflow-auto" data

Minor points: math_inline_double</co

Sorry for the delay. Thanks for your valuable input. <div class="snip

Cannot include escaped dollar when using dollars delimiters about markdown-it-texmath HOT 7 CLOSED

rogino commented on June 29, 2024

Cannot include escaped dollar when using dollars delimiters

from markdown-it-texmath.

Comments (7)

edemaine commented on June 29, 2024

Unfortunately, regexes won't cut it to fix this problem in full generality, e.g. to support

$$ x = \text{what is $x$?} $$

Would you be willing to review a PR that rewrites the parsing engine to count {/}s and check whether $s are escaped with \, when processing in dollars mode?

from markdown-it-texmath.

goessner commented on June 29, 2024

hmm ... interesting.

Usually I am very reluctant with extending the regexes used in texmath. They are performance critical, as they are evaluated once with each user key stroke.

In fact escaped dollar sign \$ is a valid element inside latex math and interestingly it is working flawlessly inside of other non-dollar delimiters.

Dollar delimiters are special beasts. While having a short look into the math-inline regex

/\$((?:\S)|(?:\S.*?\S))\$/

I reidentified the final \S (last character before closing dollar) as a shortcut for [^\r\n\t\f\v ] (character exept), which might be easily extended to [^\r\n\t\f\v \\] presumably without too much performance cost.

After testing it in https://regex101.com/ successfully for several relevant cases, I considered it worth for also use it as a guard in the math-block regexes.

Surprisingly a drastic simplification is helping here ... from

/\${2}([^$]+?)\${2}/

/\${2}(.+?)\${2}/

Expect it available in the next version.

thanks ...

from markdown-it-texmath.

goessner commented on June 29, 2024

Erik, thanks for your potential help.

If there are some problems with my bug fix attempt, I would like to come back to your PR offer.

from markdown-it-texmath.

edemaine commented on June 29, 2024

/\${2}(.+?)\${2}/

Changing from [^$] to . excludes newlines. I think you want to allow single newlines (though you could forbid double newlines, as TeX does) within math. Instead of . you probably want [^], as in:

/\${2}([^]+?)\${2}/

But none of this fixes use of \$ (escaped $) or \text{$...$} (nested $) within a math expression.

One potential fix would be to use the regex as above, and then check whether the closing $ is actually escaped (had an odd number of \s before it), or has more unescaped {s than }s in it, and in that case, doing more work (probably another regex match to get contents until the next \${2}, concatenating, and trying again). This would mostly just take extra time in the weird edge cases of escaped and nested $s, which currently don't work, so seems like a win? But it would involve counting the number of {s and }s (without preceding \s) to make sure it's matched. I imagine this is all way faster than the cost of calling KaTeX, though.

Alternatively (and what I originally had in mind), the regex could match the opening \${2}, and then do a secondary search for { or } or `${2}$, checking for escaping in each case, and repeat until finding the unnested closing notation. I could test to see which is faster in which cases.

from markdown-it-texmath.

goessner commented on June 29, 2024

In fact I had temporarily forgotten that '.' excludes newlines. So taking your proposed '[^]' works fine ...

inline: [
   {   name: 'math_inline_double',
       rex: /\${2}(.+?)\${2}/gy
   },
   {   name: 'math_inline',
       rex: /\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$/gy
  }
],
block: [
  {   name: 'math_block_eqno',
      rex: /\${2}([^]+?[^\\])\${2}\s*?\(([^)\s]+?)\)/gmy
  },
  {   name: 'math_block',
     rex: /\${2}([^]+?[^\\])\${2}/gmy
  }
]

... as you can see with this example code (https://github.com/goessner/markdown-it-texmath/blob/master/test/bug-dollars.html)

    const str = `
  # Simple Dollar tests
  ## Inline

  here "$a+\\$ = b$" we "$\\$$" go "$\\text{\\$some...\\$}$"

  ## Inline block (single line only)
  
  $$a+\\$ = \\text{\\$more...\\$} \\$$$  or ...

  ## Block (multiline)

  $$
  a+\\$ = \\text{\\$text...\\$} \\$
  $$
`

resulting in ...

To also handle unescaped dollars inside of \text{$...$} I see effort to use relation as disproportionate. It seems to be reasonable to also escape dollars \text{\$...\$} in that edge case.
So I would prefer to live with that small insufficiency of markdown-it-texmath for performance and simplicity reasons.

I hope I have not overlooked anything ... thanks.

from markdown-it-texmath.

edemaine commented on June 29, 2024

Minor points:

math_inline_double still uses .; should probably switch that to [^] too.
I think you can use [^\s\\] instead of [^\r\n\t\f\v \\]. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)
[^]+?[^\\] (which occurs in both block rules) seems to require at least two characters. Should probably be [^]*?[^\\].
Shouldn't math_inline_double have the same addition to exclude a trailing \?
\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$ can be simplified to \$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$ or \$([^\s\\]|\S.*?[^\s\\])\$. (I'm not quite sure why you're forbidding spaces next to the $s but I assume that's intentional, to avoid some stray matching.)

These new rules seem to deal with \$ properly. Nice!!

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:

So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Unmatched braces generate errors in KaTeX, though. (I see either Uncaught ParseError: KaTeX parse error: Expected '}', got 'EOF' at end of input or KaTeX parse error: Unexpected end of input in a macro argument, expected '}' at end of input: \text{.) So perhaps that could be detected, which triggers an "extension" regex? I believe the extension regex is exactly math_inline without the leading $, or applying math_inline but starting from the final $. On input $\text{$x+y$}$, after matching $\text{$ , you'd next grab x+y$, fail, and then grab }$, and then succeed. This is quadratic time in the number of $s, but that's probably small... Alternatively, when in extension mode, we could count {/}s, so call KaTeX at most twice.

from markdown-it-texmath.

goessner commented on June 29, 2024

Sorry for the delay. Thanks for your valuable input.

* `math_inline_double` still uses `.`; should probably switch that to `[^]` too.

math_inline should be written on a single line, but I added it to be more forgiving here.

* I think you can use `[^\s\\]` instead of `[^\r\n\t\f\v \\]`. (The behavior is slightly different: the former treats all Unicode space identically. Probably better?)

This is definitely better ... taken.

* `[^]+?[^\\]` (which occurs in both block rules) seems to require at least two characters.  Should probably be `[^]*?[^\\]`.

Yes ... thanks for catching.

* Shouldn't `math_inline_double` have the same addition to exclude a trailing `\`?

sure ... done.

* `\$((?:[^\r\n\t\f\v \\])|(?:\S.*?[^\r\n\t\f\v \\]))\$` can be simplified to `\$([^\r\n\t\f\v \\]|\S.*?[^\r\n\t\f\v \\])\$` or `\$([^\s\\]|\S.*?[^\s\\])\$`. (I'm not quite sure why you're forbidding spaces next to the `$`s but I assume that's intentional, to avoid some stray matching.)

yes ... again a significant improvement.

Nested $s are for re-entering math mode. \text{$x+y$} is different from \text{\$x+y\$}:
So it's not possible to escape these instances of $, as \$ means something in LaTeX.

Now I understand that reentering math mode effect. Escaping makes no sense indeed. I still consider it a not so relevant edge case in practise. I don't want to invest that significant implementation effort at current. A PR is always welcome though.

thanks again ...

from markdown-it-texmath.

Cannot include escaped dollar when using dollars delimiters about markdown-it-texmath HOT 7 CLOSED

Comments (7)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent