Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

robust placeholder #1425

Merged
merged 18 commits into from
Jun 5, 2024
Merged

Conversation

mbostock
Copy link
Member

@mbostock mbostock commented Jun 4, 2024

This implements more robust parsing for inline expressions ${…}.

Inline expressions are now parsed prior to Markdown tokenization using a similar strategy to Hypertext Literal, based on the HTML5 tokenization specification, and replaced with comments (e.g., <--:803886cc:-->). By removing the inline expression’s code from the Markdown before it is tokenized, we ensure that the code does not corrupt the meaning of the surrounding Markdown, and that the code itself is not corrupted.

The expressions themselves are parsed with Acorn’s JavaScript tokenizer, allowing correct parsing of comments and template literals within inline expressions. The expressions need not be valid syntax, but if they are valid tokens, we can correctly terminate the inline expression on the closing right brace token. If the expressions are invalid tokens (for example, invalid Unicode escape sequences of unterminated template literals), then we fall back to simply counting curly braces.

The parser also now correctly removes backslashes that are used to escape inline expressions, such as \${…} or $\{…}.

Here are some examples of inline expressions that now work correctly:

${1 + /*}*/ 2}
${1
+ 2}
${html`<div>
  ${[1, 2, 3].map((i) => i)}
</div>`}
${`

hello world

`}

Here’s an example of an inline expression now being ignored in a RAWTEXT context:

<textarea>${1 + 2}</textarea>

Here are two examples of backslash removal:

<pre>\${1}</pre>
<pre>$\{1}</pre>

Fixes #375.
Fixes #396.
Fixes #597.
Fixes #636.

@mbostock mbostock marked this pull request as ready for review June 4, 2024 03:18
@mbostock mbostock requested a review from Fil June 4, 2024 03:19
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotted a few errors that appear on the documentation pages; mostly, files are not loading, and (related?) there is the following error in the console:

Uncaught TypeError: Cannot read properties of undefined (reading 'previousSibling')
    at findLoading (client.js:243:24)
    at define (client.js:134:19)
    at loaders:69:1

It seems to happen because some ids are not properly collected into rootsById.

Also, the data loader code given as an example in http://127.0.0.1:3000/loaders (with run=false) now shows a <o-loading>… snippet:

// Fetch GeoJSON from the USGS.
const response = await fetch("https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson");
if (!response.ok) throw new Error(`fetch failed: <o-loading></o-loading><!--:dde7b529:-->`);
const collection = await response.json();

On http://127.0.0.1:3000/getting-started there is an error about digraph missing. I've tried to trim it down to a minimal repro:

```js
const fortytwo = 42.4242;
```

${fortytwo}

<code class="language-js">${fortytwo.toFixed(2)}</code>

test/placeholder-test.ts Outdated Show resolved Hide resolved
@mbostock
Copy link
Member Author

mbostock commented Jun 4, 2024

Oh ya, this is interesting:

<code class="language-js">${2}</span>

It’s because of the syntax highlighting we do statically; it’s dropping the <!--:id:--> placeholder. Presumably this is a preexisting bug in the parent branch.

@mbostock
Copy link
Member Author

mbostock commented Jun 4, 2024

Another problem is that we need to ignore inline expressions within fenced code blocks. Currently we’re trying to replace them e.g. inside of

```js
const s = `1 + ${2}`;
```

@mbostock mbostock marked this pull request as draft June 4, 2024 15:14
@mbostock
Copy link
Member Author

mbostock commented Jun 4, 2024

I need to take another crack at this. The problem with fenced code blocks demonstrates that we can’t simply give inline expressions the highest precedence (by processing them prior to Markdown tokenization); at a minimum, we need to process them after tokenizing fenced code blocks.

But also, we can’t simply process fenced code blocks before inline expressions because you could have a degenerate case like this:

${/*

```
commented-out fenced code block within an inline expression?
```

*/}

But maybe we simply ignore this case — it doesn’t have any practical value, and it’s a bit like how script elements don’t allow you to have </script> anywhere? Maybe we can process inline expressions after fenced code blocks… I haven’t found yet where markdown-it is breaking up tokens.

@mbostock
Copy link
Member Author

mbostock commented Jun 4, 2024

I have to run, but I don’t think it’s going to be practical to do this within markdown-it. The problem is that it uses its other block rules to determine when a block ends; so, to detect when a table block ends, it looks for its other “blockquote” terminator rules, and if any of them matches a line, that’s where the table block ends. But if we’re within an inline expression, then we don’t want it testing the lines of code.

So I think the simplest option (which is not so simple) is to replicate just the fenced code block parsing logic:

https://github.com/markdown-it/markdown-it/blob/master/lib/rules_block/fence.mjs

Our inline expression parser will need to detect fenced code blocks, and ignore inline expressions within those code blocks.

@mbostock mbostock marked this pull request as ready for review June 5, 2024 03:29
Co-authored-by: Philippe Rivière <fil@rezo.net>
@mbostock
Copy link
Member Author

mbostock commented Jun 5, 2024

Okay, I think everything is good now! I’ve tried to cover everything with tests, but here’s what I was also testing manually:

## hello ${1 + 2}

```js
const fortytwo = 42.4242;
```

${1/*

```
commented-out fenced code block within an inline expression?
```

*/}

${fortytwo}

<code class="language-js">${fortytwo.toFixed(2)}</code>

`te${1}st`

<!--

```

testing

```

-->

another test

<h2>hello</h2> ${"world"}

<!-- ${'world'} -->

<!--```

testing

```-->

<pre>$\\{1}</pre>

<pre>$\{1}</pre>

<pre>\\${1}</pre>

<pre>\${1}</pre>

<pre>\$</pre>

<pre>\{</pre>

<pre>$\{</pre>

<textarea>${1 + 2}</textarea>

${`

hello world

`}

${1+/*

```
ignore me
```

*/+2}

${html`<div>
  ${[1, 2, 3].map((i) => i)}
</div>`}

${1
+ 2}

${1 + /*}*/ 2}

````js echo
`${Math.random()}`
````

````js echo
${Math.random()}
````

foo${1 + 2}bar

<pre>${Math.random()},${Math.random()}</pre>

${1}

src/placeholder.ts Outdated Show resolved Hide resolved
@Fil
Copy link
Contributor

Fil commented Jun 5, 2024

Almost there! on http://127.0.0.1:3000/reactivity the console now says:

TypeError: Failed to execute 'observe' on 'IntersectionObserver': parameter 1 is not of type 'Element'.
    at variable_intersector (runtime.js:1276:14)
    at define (runtime.js:1309:23)
reject @ client.js:182
rejected @ client.js:139
variable_rejected @ runtime.js:963
(anonymous) @ runtime.js:1336

@mbostock
Copy link
Member Author

mbostock commented Jun 5, 2024

You’re finding all the bugs in the parent PR you already approved. 😅

The problem here is that the visibility promise assumes that the variable’s _root is an element that can be observed with IntersectionObserver, but now the _root is a comment. So either we need to change the definition of visibility (which somewhat painfully is within the Observable Runtime), or visibility needs to watch the parent element. Probably the latter but that’s maybe not quite right for inline expressions? 🤔

@mbostock mbostock requested a review from Fil June 5, 2024 14:10
Copy link
Contributor

@Fil Fil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any more issues (I even tried it on pangea).

@mbostock mbostock merged commit 060eb23 into mbostock/no-wrapper-span Jun 5, 2024
@mbostock mbostock deleted the mbostock/robust-placeholder branch June 5, 2024 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants