robust placeholder #1425

mbostock · 2024-06-04T02:10:30Z

This implements more robust parsing for inline expressions ${…}.

Inline expressions are now parsed prior to Markdown tokenization using a similar strategy to Hypertext Literal, based on the HTML5 tokenization specification, and replaced with comments (e.g., <--:803886cc:-->). By removing the inline expression’s code from the Markdown before it is tokenized, we ensure that the code does not corrupt the meaning of the surrounding Markdown, and that the code itself is not corrupted.

The expressions themselves are parsed with Acorn’s JavaScript tokenizer, allowing correct parsing of comments and template literals within inline expressions. The expressions need not be valid syntax, but if they are valid tokens, we can correctly terminate the inline expression on the closing right brace token. If the expressions are invalid tokens (for example, invalid Unicode escape sequences of unterminated template literals), then we fall back to simply counting curly braces.

The parser also now correctly removes backslashes that are used to escape inline expressions, such as \${…} or $\{…}.

Here are some examples of inline expressions that now work correctly:

${1 + /*}*/ 2}

${1
+ 2}

${html`<div>
  ${[1, 2, 3].map((i) => i)}
</div>`}

${`

hello world

`}

Here’s an example of an inline expression now being ignored in a RAWTEXT context:

<textarea>${1 + 2}</textarea>

Here are two examples of backslash removal:

<pre>\${1}</pre>

<pre>$\{1}</pre>

Fixes #375.
Fixes #396.
Fixes #597.
Fixes #636.

src/markdown.ts

src/placeholder.ts

Fil

Spotted a few errors that appear on the documentation pages; mostly, files are not loading, and (related?) there is the following error in the console:

Uncaught TypeError: Cannot read properties of undefined (reading 'previousSibling')
    at findLoading (client.js:243:24)
    at define (client.js:134:19)
    at loaders:69:1

It seems to happen because some ids are not properly collected into rootsById.

Also, the data loader code given as an example in http://127.0.0.1:3000/loaders (with run=false) now shows a <o-loading>… snippet:

// Fetch GeoJSON from the USGS.
const response = await fetch("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson");
if (!response.ok) throw new Error(`fetch failed: <o-loading></o-loading><!--:dde7b529:-->`);
const collection = await response.json();

On http://127.0.0.1:3000/getting-started there is an error about digraph missing. I've tried to trim it down to a minimal repro:

```js
const fortytwo = 42.4242;
```

${fortytwo}

<code class="language-js">${fortytwo.toFixed(2)}</code>

test/placeholder-test.ts

mbostock · 2024-06-04T14:57:25Z

Oh ya, this is interesting:

<code class="language-js">${2}</span>

It’s because of the syntax highlighting we do statically; it’s dropping the  placeholder. Presumably this is a preexisting bug in the parent branch.

mbostock · 2024-06-04T15:01:36Z

Another problem is that we need to ignore inline expressions within fenced code blocks. Currently we’re trying to replace them e.g. inside of

```js
const s = `1 + ${2}`;
```

mbostock · 2024-06-04T16:00:40Z

I need to take another crack at this. The problem with fenced code blocks demonstrates that we can’t simply give inline expressions the highest precedence (by processing them prior to Markdown tokenization); at a minimum, we need to process them after tokenizing fenced code blocks.

But also, we can’t simply process fenced code blocks before inline expressions because you could have a degenerate case like this:

${/*

```
commented-out fenced code block within an inline expression?
```

*/}

But maybe we simply ignore this case — it doesn’t have any practical value, and it’s a bit like how script elements don’t allow you to have </script> anywhere? Maybe we can process inline expressions after fenced code blocks… I haven’t found yet where markdown-it is breaking up tokens.

mbostock · 2024-06-04T16:12:10Z

I have to run, but I don’t think it’s going to be practical to do this within markdown-it. The problem is that it uses its other block rules to determine when a block ends; so, to detect when a table block ends, it looks for its other “blockquote” terminator rules, and if any of them matches a line, that’s where the table block ends. But if we’re within an inline expression, then we don’t want it testing the lines of code.

So I think the simplest option (which is not so simple) is to replicate just the fenced code block parsing logic:

https://github.com/markdown-it/markdown-it/blob/master/lib/rules_block/fence.mjs

Our inline expression parser will need to detect fenced code blocks, and ignore inline expressions within those code blocks.

Co-authored-by: Philippe Rivière <fil@rezo.net>

mbostock · 2024-06-05T03:32:05Z

Okay, I think everything is good now! I’ve tried to cover everything with tests, but here’s what I was also testing manually:

## hello ${1 + 2}

```js
const fortytwo = 42.4242;
```

${1/*

```
commented-out fenced code block within an inline expression?
```

*/}

${fortytwo}

<code class="language-js">${fortytwo.toFixed(2)}</code>

`te${1}st`

<!--

```

testing

```

-->

another test

<h2>hello</h2> ${"world"}

<!-- ${'world'} -->

<!--```

testing

```-->

<pre>$\\{1}</pre>

<pre>$\{1}</pre>

<pre>\\${1}</pre>

<pre>\${1}</pre>

<pre>\$</pre>

<pre>\{</pre>

<pre>$\{</pre>

<textarea>${1 + 2}</textarea>

${`

hello world

`}

${1+/*

```
ignore me
```

*/+2}

${html`<div>
  ${[1, 2, 3].map((i) => i)}
</div>`}

${1
+ 2}

${1 + /*}*/ 2}

````js echo
`${Math.random()}`
````

````js echo
${Math.random()}
````

foo${1 + 2}bar

<pre>${Math.random()},${Math.random()}</pre>

${1}

src/placeholder.ts

Fil · 2024-06-05T10:17:25Z

Almost there! on http://127.0.0.1:3000/reactivity the console now says:

TypeError: Failed to execute 'observe' on 'IntersectionObserver': parameter 1 is not of type 'Element'.
    at variable_intersector (runtime.js:1276:14)
    at define (runtime.js:1309:23)
reject @ client.js:182
rejected @ client.js:139
variable_rejected @ runtime.js:963
(anonymous) @ runtime.js:1336

mbostock · 2024-06-05T13:25:37Z

You’re finding all the bugs in the parent PR you already approved. 😅

The problem here is that the visibility promise assumes that the variable’s _root is an element that can be observed with IntersectionObserver, but now the _root is a comment. So either we need to change the definition of visibility (which somewhat painfully is within the Observable Runtime), or visibility needs to watch the parent element. Probably the latter but that’s maybe not quite right for inline expressions? 🤔

Fil

I didn't find any more issues (I even tried it on pangea).

mbostock added 8 commits June 3, 2024 16:02

robust placeholder parser

9609f85

replacement strategy

d621303

Merge branch 'mbostock/no-wrapper-span' into mbostock/robust-placeholder

3a578af

Merge branch 'mbostock/no-wrapper-span' into mbostock/robust-placeholder

bc781e4

fix backslashes

2dfc67e

drop invalid interpolations

d6a4e01

comments

2cc9769

simplify

10f40cd

mbostock marked this pull request as ready for review June 4, 2024 03:18

remove unused codeErrors

59bb346

mbostock requested a review from Fil June 4, 2024 03:19

mbostock commented Jun 4, 2024

View reviewed changes

src/markdown.ts Show resolved Hide resolved

mbostock commented Jun 4, 2024

View reviewed changes

src/placeholder.ts Show resolved Hide resolved

Fil requested changes Jun 4, 2024

View reviewed changes

test/placeholder-test.ts Outdated Show resolved Hide resolved

preserve comments when highlighting

7f92a35

mbostock marked this pull request as draft June 4, 2024 15:14

mbostock added 5 commits June 4, 2024 16:15

handle fenced code blocks

259d945

more better parser

9073f6d

clean and comment

15b7a80

cleaner

117f1a0

fix fences within comments

113222e

mbostock marked this pull request as ready for review June 5, 2024 03:29

Update test/placeholder-test.ts

1ec4c7b

Co-authored-by: Philippe Rivière <fil@rezo.net>

mbostock commented Jun 5, 2024

View reviewed changes

src/placeholder.ts Outdated Show resolved Hide resolved

ignore invalid tokens

27720c8

root.parentNode for visibility

060eb23

mbostock requested a review from Fil June 5, 2024 14:10

Fil approved these changes Jun 5, 2024

View reviewed changes

mbostock merged commit 060eb23 into mbostock/no-wrapper-span Jun 5, 2024

mbostock deleted the mbostock/robust-placeholder branch June 5, 2024 15:39

This was referenced Jun 5, 2024

no wrapper span #1416

Merged

revert parser #1438

Merged

no wrapper span, take two #1439

Merged

ignore inline expressions in HTML rawtext and comment blocks #1440

Closed

inline tokenizer, take two #1441

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

robust placeholder #1425

robust placeholder #1425

mbostock commented Jun 4, 2024 •

edited

Loading

Fil left a comment

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 5, 2024

Fil commented Jun 5, 2024 •

edited

Loading

mbostock commented Jun 5, 2024

Fil left a comment

robust placeholder #1425

robust placeholder #1425

Conversation

mbostock commented Jun 4, 2024 • edited Loading

Fil left a comment

Choose a reason for hiding this comment

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 4, 2024

mbostock commented Jun 5, 2024

Fil commented Jun 5, 2024 • edited Loading

mbostock commented Jun 5, 2024

Fil left a comment

Choose a reason for hiding this comment

mbostock commented Jun 4, 2024 •

edited

Loading

Fil commented Jun 5, 2024 •

edited

Loading