LaTeX is not preserved in TOC text #903

Open
ComFreek opened this issue Feb 17, 2021 · 13 comments
Labels
Area: Math Area: Table of contents Pertaining to table of contents (TOC generation and detection, related heading operations). Help wanted Looking for help. Issue: Bug Needs Discussion We haven't decided what to do.
@ComFreek

Problem

TOC entries for headings containing inline LaTeX are generated incorrectly:

1. [$\\mathrm{gcd}(a, b)$](#mathrmgcda-b)

# $\mathrm{gcd}(a, b)$

Here, the entry should have been 1. [$\mathrm{gcd}(a, b)$](#mathrmgcda-b), with a single backslash.
I am using Markdown All in One configured with "markdown.extension.math.enabled": false.

I am using the latest dev build as of 2021-02-17.

@Lemmingh Lemmingh added Area: Table of contents Pertaining to table of contents (TOC generation and detection, related heading operations). Issue: Question labels Feb 17, 2021
@Lemmingh
Collaborator

This change is by design.

It would take too much effort to keep the result safe without aggressively escaping backslashes.

/**
 * The **rich text** (single line Markdown inline without raw HTML) representation of the rendering result (in strict CommonMark mode) of the heading.
 * This must be able to be safely put into a `[]` bracket pair without breaking Markdown syntax.
 */
visibleText: string;

vscode-markdown/src/toc.ts

Lines 447 to 467 in c359515

/**
 * Extracts those that can be rendered to visible text from a string of CommonMark **inline** structures,
 * to create a single line string which can be safely used as **link text**.
 *
 * The result cannot be directly used as the content of a paragraph,
 * since this function does not escape all sequences that look like block structures.
 *
 * We roughly take GitLab's `[[_TOC_]]` as reference.
 *
 * @param raw - The Markdown string.
 * @param env - The markdown-it environment sandbox (**mutable**).
 * @returns A single line string, which only contains plain textual content,
 * backslash escape, code span, and emphasis.
 */
function createLinkText(raw: string, env: object): string {
    const inlineTokens: Token[] = commonMarkEngine.engine.parseInline(raw, env)[0].children!;
    return inlineTokens.reduce<string>((result, token) => {
        switch (token.type) {
            case "text":
                return result + token.content.replace(/[&*<>\[\\\]_`]/g, "\\$&"); // Escape.

@Lemmingh Lemmingh added the Help wanted Looking for help. label Feb 17, 2021
@yzhang-gh
Owner

I guess we need to add a patch for this. In my experience, many users will be affected.

@Lemmingh Lemmingh added Res: Answered Discussion closed with no more specific state. and removed Help wanted Looking for help. Issue: Bug labels Feb 17, 2021
@Lemmingh
Collaborator

Rechecked.

Unsolvable

@Lemmingh
Collaborator

I understand the need to display pretty math in headings and the TOC, but I'm sorry: the behavior cannot be changed anymore unless we accept the risk of putting broken links into the TOC.


With #176, #194, #531, #540, #552, #570, #862, etc., we have lost control of how users embed math in Markdown, and we have no reliable means of identifying a math area.

We have to perform CommonMark-only parsing when generating the visible link text for the TOC.

@Lemmingh Lemmingh pinned this issue Feb 17, 2021
@yzhang-gh
Owner

I mean we need to be consistent.

  • At the very least, if math.enabled is true, we should make sure the $...$ parts remain valid math environments (as this is the only syntax we support now).
  • If math.enabled is false, then as you said, we don't know whether there are other kinds of math environments such as \[\]. But I would prefer to at least leave $...$ untouched.

By "a patch" I mean, e.g., a regexp that replaces or "protects" the backslashes in $...$. The solution would not be neat, but otherwise we will see more issues complaining about this after the v3.5.0 release.

BTW, do you have any examples where we must escape the \? I cannot think of any.
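
The kind of patch described here could look roughly like the sketch below. It is only an illustration, not the extension's code: the helper name `escapeOutsideDollarMath` and the assumption that a math span is any single-line `$...$` run are made up for this example, and the character class is copied from the `replace()` call quoted earlier in the thread.

```typescript
// Character class copied from the replace() call quoted in this thread.
const ESCAPE_SET = /[&*<>\[\\\]_`]/g;

// Hypothetical helper: escape everything except single-line $...$ spans.
// Assumption: every $...$ run is math, which the extension cannot know in general.
function escapeOutsideDollarMath(text: string): string {
  return text
    // The capture group keeps the $...$ spans in the result, at odd indices.
    .split(/(\$[^$\n]+\$)/)
    .map((part, index) =>
      index % 2 === 1 ? part : part.replace(ESCAPE_SET, "\\$&")
    )
    .join("");
}

// The source heading text is: a_b $\mathbf{A} \mathbf{x}$
console.log(escapeOutsideDollarMath("a_b $\\mathbf{A} \\mathbf{x}$"));
```

With this sketch the underscore outside the math span is escaped while the backslashes inside it are left alone. It does nothing about a heading such as `$[(a+b)!](c+d)^2$`, which a CommonMark-only parser still reads as an inline link.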

@Lemmingh
Collaborator

The Even-odd Problem.

The backslash \ is itself the escape indicator.


return result + token.content.replace(/[&*<>\[\\\]_`]/g, "\\$&"); // Escape.

Look at them. This is the minimal escaping set.

  • & : Entity or numeric character reference.
  • * and _ : Emphasis.
  • < and > : Autolink.
  • [ and ] : Link or image.
  • ` : Code span.
  • \ : Escape.

Any occurrence of them can change the semantics and ultimately break the link.

They have to be escaped. My comments in the code are clear enough.
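
To make the effect of this escaping set concrete, here is a minimal sketch that applies the same regular expression to the heading from the original report. The helper name `escapeForLinkText` is made up for this example; only the regular expression and the replacement string come from the quoted line.

```typescript
// Same character class and replacement as the quoted line of toc.ts.
const MINIMAL_ESCAPE_SET = /[&*<>\[\\\]_`]/g;

// Hypothetical helper: prefix every character of the set with a backslash.
function escapeForLinkText(text: string): string {
  return text.replace(MINIMAL_ESCAPE_SET, "\\$&");
}

// The source heading text is: $\mathrm{gcd}(a, b)$
const reportedHeading = "$\\mathrm{gcd}(a, b)$";

// Prints: $\\mathrm{gcd}(a, b)$
console.log(escapeForLinkText(reportedHeading));
```

The only character of the set in that heading is the backslash, so it is the only one that gets doubled, which matches the TOC entry in the report.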


A simple example:

If you're going to display

<a href="#uri">\[</a>

Then, you'll have to write

[\\\[](#uri)
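
The round trip in this example can be checked with a small sketch. It assumes a deliberately simplified model of CommonMark backslash escapes (a backslash followed by an ASCII punctuation character renders as that character); the function names are made up for illustration, and a real parser does considerably more.

```typescript
// Character class copied from the quoted replace() call.
const LINK_TEXT_ESCAPE_SET = /[&*<>\[\\\]_`]/g;

// Simplified model: a backslash before ASCII punctuation yields that character.
const BACKSLASH_ESCAPE = /\\([!-\/:-@\[-`{-~])/g;

// Hypothetical helper: what the TOC generator writes into the brackets.
function escapeLinkText(text: string): string {
  return text.replace(LINK_TEXT_ESCAPE_SET, "\\$&");
}

// Hypothetical helper: what a reader sees after backslash escapes are resolved.
function renderBackslashEscapes(markdown: string): string {
  return markdown.replace(BACKSLASH_ESCAPE, "$1");
}

const wanted = "\\["; // two characters: \ and [
const written = escapeLinkText(wanted);

console.log(written);                                    // \\\[
console.log(renderBackslashEscapes(written) === wanted); // true
console.log(renderBackslashEscapes(wanted));             // [
```

The last line shows the even-odd problem: if the two characters were written unescaped, the backslash would be consumed as an escape and only the bracket would be displayed.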

@Lemmingh
Collaborator

A math area is a special variant of code span/block, from the perspective of a parser.

Among the existing solutions, I think GitLab's is the wisest, and I use it in my daily work.

GitLab does not actually introduce new syntax; instead, it reuses the existing CommonMark syntax. A math area on GitLab is exactly a code span/block.


On GitLab,

[[_TOC_]]

# $``[(a+b)!]^2``$

````math
\frac{1}{2}
````

gives you

<ul class="section-nav"><li><a href="#ab2">[(a+b)!]^2</a></li></ul>
<h1 data-sourcepos="3:1-3:18" dir="auto">
<a id="user-content-ab2" class="anchor" href="#ab2" aria-hidden="true"></a><span data-math-style="inline"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mo stretchy="false">[</mo><mo stretchy="false">(</mo><mi>a</mi><mo>+</mo><mi>b</mi><mo stretchy="false">)</mo><mo stretchy="false">!</mo><msup><mo stretchy="false">]</mo><mn>2</mn></msup></mrow><annotation encoding="application/x-tex">[(a+b)!]^2</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen">[</span><span class="mopen">(</span><span class="mord mathdefault">a</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1.064108em;vertical-align:-0.25em;"></span><span class="mord mathdefault">b</span><span class="mclose">)</span><span class="mclose">!</span><span class="mclose"><span class="mclose">]</span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8141079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">2</span></span></span></span></span></span></span></span></span></span></span></span>
</h1>
<span data-math-style="display"><span class="katex-display"><span class="katex"><span class="katex-mathml"><math><semantics><mrow><mfrac><mn>1</mn><mn>2</mn></mfrac></mrow><annotation encoding="application/x-tex">\frac{1}{2}</annotation></semantics></math></span><span class="katex-html" aria-hidden="true"><span class="base"><span class="strut" style="height:2.00744em;vertical-align:-0.686em;"></span><span class="mord"><span class="mopen nulldelimiter"></span><span class="mfrac"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.32144em;"><span style="top:-2.314em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">2</span></span></span><span style="top:-3.23em;"><span class="pstrut" style="height:3em;"></span><span class="frac-line" style="border-bottom-width:0.04em;"></span></span><span style="top:-3.677em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.686em;"><span></span></span></span></span></span><span class="mclose nulldelimiter"></span></span></span></span></span></span></span>
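
For readers who want to try the GitLab-style notation, the following sketch rewrites single-line `$...$` spans into a dollar-wrapped code span. It is an illustration under assumptions, not part of the extension: the helper name `toGitLabInlineMath` is made up, spans that already contain backticks are skipped, and whether a given renderer accepts the result depends on its math support.

```typescript
// Hypothetical helper: turn $...$ into $`...`$ so that a CommonMark parser
// sees a code span, inside which backslashes are literal.
// Spans containing a backtick are left unchanged (they may already be converted).
function toGitLabInlineMath(text: string): string {
  // In the replacement string, "$$" is a literal dollar sign and "$1" is the body.
  return text.replace(/\$([^$`\n]+)\$/g, "$$`$1`$$");
}

// The source heading text is: # $\mathrm{gcd}(a, b)$
// Prints: # $`\mathrm{gcd}(a, b)`$
console.log(toGitLabInlineMath("# $\\mathrm{gcd}(a, b)$"));
```

Because the math body then sits inside a code span, a link text generator that keeps code spans would not need to escape its backslashes.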

@Lemmingh
Collaborator

> I prefer to at least leave $...$ untouched.

It is not possible.

# $[(a+b)!](c+d)^2$

With Pandoc-style math syntax enabled, this is a math area.
Without it, this is an inline link.
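
The ambiguity can be illustrated with a rough sketch. The pattern below is a deliberately simplified stand-in for CommonMark's inline link rule, and the helper name `findInlineLink` is made up; it only shows that the heading text contains something shaped like `[text](destination)`.

```typescript
interface InlineLink {
  text: string;
  destination: string;
}

// Simplified pattern for an inline link; real CommonMark parsing handles
// nesting, titles, and escapes that this pattern ignores.
const INLINE_LINK = /\[([^\]]*)\]\(([^)]*)\)/;

// Hypothetical helper: return the first link-shaped run, or null if none.
function findInlineLink(markdown: string): InlineLink | null {
  const match = INLINE_LINK.exec(markdown);
  return match ? { text: match[1], destination: match[2] } : null;
}

// Prints: { text: '(a+b)!', destination: 'c+d' }
console.log(findInlineLink("$[(a+b)!](c+d)^2$"));
```

A parser without math support has no reason to treat the surrounding dollar signs as anything but plain text, so the link reading wins.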

@Lemmingh
Collaborator

/**
 * The **rich text** (single line Markdown inline without raw HTML) representation of the rendering result (in strict CommonMark mode) of the heading.
 * This must be able to be safely put into a `[]` bracket pair without breaking Markdown syntax.
 */
visibleText: string;

In my draft, it was

/**
 * The **single line plain text** representation of the rendering result (in CommonMark mode) of the heading.
 * This must be able to be safely put into a `[]` bracket pair as **link text** without breaking Markdown syntax.
 */
visibleText: string;

the same as GitLab's [[_TOC_]].

Later, I had a hard time maintaining backward compatibility in createLinkText(). I think the current implementation is the best we can do; it already offers the maximum backward compatibility.

@yzhang-gh
Owner

I agree with you in theory (although I'm not very convinced by the \\\[ example which is a bit "unrealistic"...).

The problem is that we can then only recommend that users adopt the GitLab-style math syntax? (I'm not against it, but I'm not sure whether other users will like it.)


Let's keep the change and leave this issue open to collect more feedback.

@yzhang-gh yzhang-gh added the Needs Discussion We haven't decided what to do. label Feb 18, 2021
@Lemmingh Lemmingh added this to the v3.5.0 milestone Jun 20, 2021
@Lemmingh Lemmingh unpinned this issue Jul 9, 2021
@Lemmingh Lemmingh mentioned this issue Jul 9, 2021
12 tasks
@Lemmingh Lemmingh added Res: As expected The existing behavior is by design or as expected. and removed Res: Answered Discussion closed with no more specific state. labels Aug 4, 2021
@Lemmingh Lemmingh changed the title TOC entries for headers containing inline LaTeX are invalid LaTeX is not preserved in TOC Aug 16, 2021
@Lemmingh Lemmingh changed the title LaTeX is not preserved in TOC LaTeX is not preserved in TOC text Aug 30, 2021
@Lemmingh Lemmingh modified the milestones: v3.5.0, v3.6.0 Oct 25, 2021
@coin8086

coin8086 commented Apr 23, 2023

It is now April 23, 2023. Is there any workaround for this problem? I'm using Markdown All in One v3.5.1 in VS Code, and I got the generated TOC text

[ABC $\\mathbf{A} \\mathbf{x}$](#abc-mathbfa-mathbfx)

for

### ABC $\mathbf{A} \mathbf{x}$

The escaped text in the TOC is ugly. I'd rather have the math removed from the TOC, like

[ABC](#abc)
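
What is being asked for here could be sketched as a preprocessing step like the one below. This is not something the extension offers: the helper name `stripDollarMath` is made up, and it assumes every single-line `$...$` run is math, which is exactly the assumption questioned earlier in the thread.

```typescript
// Hypothetical helper: drop single-line $...$ spans from a heading's text
// and tidy the whitespace that is left behind.
function stripDollarMath(headingText: string): string {
  return headingText
    .replace(/\$[^$\n]+\$/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// The source heading text is: ABC $\mathbf{A} \mathbf{x}$
// Prints: ABC
console.log(stripDollarMath("ABC $\\mathbf{A} \\mathbf{x}$"));
```

A heading that consists only of math would come out empty under this sketch, so a real implementation would need a fallback for that case.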

@yzhang-gh
Owner

> any workaround

Unfortunately no.

I don't have sufficient time to change this. Might need some help from the community.

@coin8086

> > any workaround
>
> Unfortunately no.
>
> I don't have sufficient time to change this. Might need some help from the community.

Sorry to hear that. Thanks for your effort anyway!
