LaTeX parsing cannot discern $ when not intended to be used #15846

Closed
turt2live opened this issue Nov 28, 2020 · 22 comments · Fixed by matrix-org/matrix-react-sdk#5515
Labels: A-Maths (Render LaTeX maths in messages), P1, S-Major (Severely degrades major functionality or product features, with no satisfactory workaround), T-Defect

Comments

@turt2live
Member

Posting two permalinks in the same message, for instance, will result in a highly mangled message.

@turt2live turt2live changed the title LaTeX parsing cannot discern $ in links LaTeX parsing cannot discern $ when not intended to be used Dec 1, 2020
@turt2live
Member Author

also applies when posting bash scripts, things in code blocks, etc. In my mind, this is a blocker to the feature ever leaving labs.

@turt2live turt2live added T-Defect defect P1 S-Major Severely degrades major functionality or product features, with no satisfactory workaround A-Maths Render LaTeX maths in messages labels Dec 1, 2020
@jryans
Collaborator

jryans commented Dec 2, 2020

@akissinger @uhoreg Any thoughts on how we might resolve this? I agree with @turt2live that we'd need to have some solution here before LaTeX composing could leave labs.

@akissinger
Contributor

This was partly my intention when I originally made the delimiters double and triple $'s, rather than single and double. It doesn't totally avoid the problem, but it at least makes it more uncommon. Another option is to have latex-style delimiters \( \) and \[ \], but in practice people are not so familiar with these.

Another solution could be the following:

  1. make sure latex is never interpreted inside of a markdown link or code block.
  2. literal dollar signs in other contexts need to be escaped

This has the disadvantage that people who don't know/care about latex math might get weird output when talking about money. :)

Perhaps latex math should always be a setting that is off by default on the mainstream servers, while more specialised (e.g. academic) servers can turn it on by default.

@uhoreg
Member

uhoreg commented Dec 2, 2020

One possibility is to do the LaTeX substitution only when the first $ is not preceded by a word character or number (so it is either preceded by a non-word, non-numeric character or is at the beginning of the message) and the second $ is not followed by a word character or number. So if someone writes "$1 or $2", this will not be interpreted as math, since the second $ is followed by a number.
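
A minimal sketch of this rule, written in Python purely for illustration (the client is not written in Python, and the names `INLINE_MATH` and `find_math` are hypothetical). It treats "word character or number" as the regex class `\w`, which is an assumption about the intended meaning:

```python
import re

# Opening "$" must not follow a word character (letter, digit, underscore);
# closing "$" must not be followed by one. DOTALL lets math span line breaks.
INLINE_MATH = re.compile(r"(?<!\w)\$(.+?)\$(?!\w)", re.DOTALL)


def find_math(text: str) -> list:
    """Return the contents of every span this rule would treat as math."""
    return INLINE_MATH.findall(text)


print(find_math("$1 or $2"))            # no math: the second "$" is followed by a digit
print(find_math("so $x^2 + 1$ holds"))  # one math span
print(find_math('cost=$1; echo "$"'))   # shell-like text can still be misread as math
```

The last call shows the limit of the rule on its own: a "$" followed by punctuation is enough to close a span, so $-heavy code would still need separate handling.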

@akissinger
Contributor

I could still see this going wrong if someone is writing a bit of $-heavy code (bash, makefile, Perl, ...), but combining it with not latex-ing inside of code blocks would probably do the job for 99% of cases.

@turt2live
Member Author

does it make sense to not parse it if there's a linebreak somewhere between the dollar signs? I think that would fix the majority of cases.

@akissinger
Contributor

It's pretty common to have a linebreak in latex math, e.g. for matrices or long-ish expressions. It's more common for display math than inline, but I still think this would be undesirable for either.

@turt2live
Member Author

hmm, at the very least it shouldn't try to use dollar signs in code blocks (though parsing for that is not very fun)

@thosgood

thosgood commented Dec 8, 2020

not a solution, but just to mention that a lot of TeX syntax highlighters only treat dollar signs as delimiting math mode if there are spaces before the first dollar and after the second, but not after the first or before the second. I think combining this with some of the other suggestions (never render in code blocks or links) and disabling latex by default would be how other projects deal with this problem (does anybody have rocketchat or anything that they can test to see how it works there?)

note that, with the old composer, this could be solved quite easily by having an "enable/disable latex" toggle alongside the markdown toggle button, but that doesn't seem to be a solution any longer

@tobiasBora

tobiasBora commented Dec 13, 2020

I can confirm that we would not expect LaTeX to be rendered inside a code block; this gives quite strange results, like:
image

Also, there is currently a very minor bug when you use three dollar signs instead of two (very minor because it is hard to even give this syntax a meaning: it compiles in neither LaTeX nor pandoc):
image

Finally, I would expect an escaped dollar \$ to give a normal dollar sign, as in LaTeX. The code

$x = \$3$

gives:
image
while real LaTeX gives:
image

As for the rules, I think it would be nice to follow the ones used by pandoc. I tried them and they make sense:

$ echo 'Hello, this costs $3 and $6.' | pandoc
<p>Hello, this costs $3 and $6.</p>

$ echo 'Hello, this costs $3 and 6$.' | pandoc
<p>Hello, this costs <span class="math inline">3<em>a</em><em>n</em><em>d</em>6</span>.</p>

$ echo 'Hello, this costs $3 and 6 $.' | pandoc
<p>Hello, this costs $3 and 6 $.</p>

$ echo 'Hello, this costs $ 3 and 6$.' | pandoc
<p>Hello, this costs $ 3 and 6$.</p>

$ echo 'Hello, this costs $3 + \$6$.' | pandoc
<p>Hello, this costs <span class="math inline">3 + $6</span>.</p>

$ echo 'Hello, this costs $3 + 6\$.' | pandoc
<p>Hello, this costs $3 + 6$.</p>

$ echo 'Hello, this costs ```$3 + 6$```.' | pandoc -f markdown
<p>Hello, this costs <code>$3 + 6$</code>.</p>
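
The behaviour in these transcripts matches pandoc's documented rule for dollar-delimited math: the opening $ needs a non-space character immediately to its right, the closing $ needs a non-space character immediately to its left and must not be followed by a digit, and a backslash-escaped \$ is never a delimiter. A rough regex approximation in Python, for illustration only (this is not pandoc's actual parser, `PANDOC_LIKE` and `pandoc_style_math` are hypothetical names, and code spans are deliberately not handled here):

```python
import re

PANDOC_LIKE = re.compile(
    r"(?<!\\)\$(?=\S)"      # unescaped opening "$" followed by a non-space
    r"((?:\\.|[^$\\])+?)"   # body: escaped pairs such as "\$", or anything but "$" and "\"
    r"(?<=\S)\$(?!\d)",     # closing "$" after a non-space and not before a digit
    re.DOTALL,
)


def pandoc_style_math(text: str) -> list:
    """Return the bodies of the spans this approximation treats as inline math."""
    return PANDOC_LIKE.findall(text)


print(pandoc_style_math("Hello, this costs $3 and $6."))
print(pandoc_style_math("Hello, this costs $3 and 6$."))
print(pandoc_style_math(r"Hello, this costs $3 + \$6$."))
```

The first call finds no math and the other two find one span each, in line with the first, second, and fifth transcripts above.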

@polwel

polwel commented Dec 14, 2020

As much as I'd love to have LaTeX messages, it is orders of magnitude more common that users just plop a link into their messages. URLs with dollar symbols do exist. We already have issue #4674, so the introduction of math expressions must not make this worse.

Since anyway it currently falls back to a code block, how about using `$\alpha$` as the syntax? EDIT: Turns out that is the way Gitlab does it, though with the backticks and the $ reversed. $`\alpha`$.
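
To illustrate why the backtick-inside-dollars form is hard to trigger by accident, here is a small Python sketch of a matcher for it. It only models the delimiter shape, not GitLab's actual parser, and `GITLAB_INLINE` and `gitlab_math` are hypothetical names:

```python
import re

# Math only counts when the backticks sit directly inside the dollar signs.
GITLAB_INLINE = re.compile(r"\$`([^`\n]+)`\$")


def gitlab_math(text: str) -> list:
    """Return the bodies of all $`...`$ spans in the text."""
    return GITLAB_INLINE.findall(text)


print(gitlab_math(r"so $`\alpha`$ is small"))      # one math span
print(gitlab_math("this costs $3 and 6$."))        # plain dollars are left alone
print(gitlab_math("https://example.org/$a/$b"))    # so are dollars in a (made-up) URL
```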

@jryans
Collaborator

jryans commented Dec 14, 2020

@akissinger Would you be interested in working on some changes to ignore $ for LaTeX when inside links and code blocks? That seems like it would address the main points here.

Ideally, we'll find a way to craft something that can be enabled by default (rather than only used on certain deployments).

@tobiasBora

tobiasBora commented Dec 15, 2020 via email

@akissinger
Contributor

Based on our experience running this on our own server, I don't think the `$...$` or $`...`$ syntax is a very good option. We already switched from $$ to $ because people found using double-$'s too confusing.

Possibly this strategy catches as many cases in the most unobtrusive way:

  1. don't parse latex in links or code blocks
  2. require the opening $ to be preceded by whitespace or BOL and the closing $ to be followed by whitespace or EOL
  3. all other literal dollar signs should be escaped
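
A minimal Python sketch of the three rules above, for illustration only. The names `PROTECTED`, `MATH`, and `extract_math` are hypothetical, the URL pattern is a crude stand-in for real link detection, and the actual client would do this inside its own parser:

```python
import re

# Rule 1: regions where "$" must never start math (code blocks, inline code, bare URLs).
PROTECTED = re.compile(r"```.*?```|`[^`\n]*`|https?://\S+", re.DOTALL)
# Rule 2: opening "$" after whitespace or start of text, closing "$" before
# whitespace or end of text. Rule 3: an escaped "\$" never acts as a delimiter.
MATH = re.compile(r"(?<!\S)\$((?:\\.|[^$\\])+?)\$(?!\S)", re.DOTALL)


def extract_math(text: str) -> list:
    """Return the math bodies found outside protected regions."""
    # Blank out protected regions with a non-space filler of equal length,
    # so offsets still line up with the original text.
    masked = PROTECTED.sub(lambda m: "\0" * len(m.group()), text)
    return [text[m.start(1):m.end(1)] for m in MATH.finditer(masked)]


shell_block = "```\n$ echo hi\n$ ls\n```"
print(MATH.findall(shell_block))     # rule 2 alone misreads the shell prompts as math
print(extract_math(shell_block))     # with rule 1 applied first, nothing is found
print(extract_math(r"price \$5 and $y$ here"))
```

The first two calls show why rule 2 is not enough without rule 1: a shell prompt at the start of a line satisfies the whitespace conditions.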

Out of curiosity, how are underscores handled? These also commonly feature in URLs. I assume commonmark does the right thing for proper markdown links, but for a bare URL in a message, does this put random stuff into italics?

@jryans I might get some time for working on this in the coming weeks/months, but it's hard to predict. If someone else wants to get involved, I could point them at the relevant bits.

@inducer

inducer commented Dec 15, 2020

Gitlab uses

$`...`$

and that works OK IMO.

@akissinger
Contributor

Presumably most people on Gitlab are programmers, who at least know what markdown is. It doesn't seem so suitable for our use case, which is general scientific communication (including e.g. physicists and mathematicians).

@t3chguy
Member

t3chguy commented Dec 15, 2020

> Out of curiosity, how are underscores handled? These also commonly feature in URLs. I assume commonmark does the right thing for proper markdown links, but for a bare URL in a message, does this put random stuff into italics?

Yup, it breaks as documented here: #4674 element-hq/element-meta#1758 #6434

@polwel

polwel commented Dec 16, 2020

> Presumably most people on Gitlab are programmers, who at least know what markdown is. It doesn't seem so suitable for our use case, which is general scientific communication (including e.g. physicists and mathematicians).

Scientists and engineers are also the exact kind of people that have no trouble learning a new syntax. I think that the Element UX is pretty good in that regard, as the markup buttons in the editor (the ones that appear when selecting a chunk of text) only apply the appropriate Markdown. They do a decent job at teaching how to use Markdown even when you've never heard of it. Why not simply add another button for math?

@rda0
Contributor

rda0 commented Dec 21, 2020

I created a PR (matrix-org/matrix-react-sdk#5515) that could potentially solve this. I tried to take all opinions in the comments here into account.

Just to note, there are also some discussions in https://talk.commonmark.org/t/can-math-formula-added-to-the-markdown/3140/2 and https://talk.commonmark.org/t/mathematics-extension/457 about how maths could be a commonmark extension.

@grinapo

grinapo commented Jan 27, 2021

As mentioned in #16290 : why does it even try to interpret LaTeX in ``` code blocks? It definitely should not.

@rda0
Contributor

rda0 commented Jan 29, 2021

> As mentioned in #16290 : why does it even try to interpret LaTeX in ``` code blocks? It definitely should not.

@grinapo The current implementation first parses the raw input for LaTeX and then runs the output through the Markdown parser (commonmark). Solutions could be to use a combined parser or to replace commonmark (#5940) with an extensible markdown parser and add LaTeX syntax as an extension as it is done in FluffyChat.
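
A toy Python model of why that ordering matters, for illustration only. It is not the client's code: the `$...$` rule is deliberately naive, the `<span class="math">` markup is a placeholder, and `math_first` and `code_aware` are hypothetical names. The second function shows one possible "combined" approach, where the math pass only sees text between code spans:

```python
import html
import re

CODE = re.compile(r"```.*?```|`[^`\n]*`", re.DOTALL)
MATH = re.compile(r"\$([^$\n]+)\$")  # deliberately naive "$...$" rule


def math_to_html(segment: str) -> str:
    # Placeholder markup; a real client would emit its own elements and attributes.
    return MATH.sub(
        lambda m: '<span class="math">' + html.escape(m.group(1)) + "</span>",
        segment,
    )


def math_first(text: str) -> str:
    """The math pass sees the raw input, so "$" inside code is consumed."""
    return math_to_html(text)


def code_aware(text: str) -> str:
    """The math pass only sees the text between code spans."""
    out, pos = [], 0
    for m in CODE.finditer(text):
        out.append(math_to_html(text[pos:m.start()]))
        out.append(m.group(0))  # code passes through untouched
        pos = m.end()
    out.append(math_to_html(text[pos:]))
    return "".join(out)


msg = "`echo $A $B` and $x$"
print(math_first(msg))  # the inline code is already mangled before Markdown runs
print(code_aware(msg))  # the inline code survives; only $x$ becomes math
```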

@randolf-scholz

randolf-scholz commented Feb 23, 2021

> Gitlab uses
>
> $`...`$
>
> and that works OK IMO.

This is not so nice because you cannot simply copy-paste math formulas from Markdown documents, Jupyter notebooks, Python comments, math.stackexchange posts or just plain LaTeX anymore.

But I think this problem is already kind of solved, just look at how Markdown+MathJax/Katex/Jupyter/VSCode/math.stackexchange.com deal with it: Any dollar sign inside code blocks [`..$..` ] or [ ```..$..``` ] is escaped, and dollar signs for currency denomination can be escaped with a backslash [\$].
