Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add attribute support on fenced code blocks #75

Merged
merged 20 commits into from
Nov 17, 2023
Merged

Add attribute support on fenced code blocks #75

merged 20 commits into from
Nov 17, 2023

Conversation

wiltsecarpenter
Copy link
Contributor

@wiltsecarpenter wiltsecarpenter commented Oct 26, 2023

This PR changes our fenced code block syntax to support attributes in { } sections, like

 ```js {run:false}
 const x = 2;
 \```

The markdown syntax for attributes includes singleton attributes, like show, key/value attributes like show: TRUE, quoted attributes like tip="Click here", class names like .toc-level1, and ids like #id. In this PR, we are parsing these different forms, but only the "show" and "no-run" attributes do anything.

An approximate grammar for the fenced code block info string is as follows:

    info: empty | language_id | attributes | language_id ows attributes
    language_id: [\w]+
    attributes: '{' attribute_list '}'
    attribute_list: attribute | attribute_list ',' attribute
    attribute: key | key ':' value | class_name | id_name
    key: ows identifier ows
    value: ows value_content ows
    value_content: quoted_string | identifier | true | false | "true" | "false"
    class_name: '.' identifier
    id_name: '#' identifier
    identifier: [a-zA-Z] [a-zA-Z0-9-_.]*
    quoted_string: '"' ([^"] | \")* '"' | "'" ([^'] | \')* "'"
    ows: [ \t]*

Notes from design review

Decisions: Going for a more JavaScript-y / json-esque syntax for attributes

Language outside of curly braces, after 3 backticks (keeps Prettier compatibility), e.g. ```js
Attributes are in curly braces, e.g. {echo: false}
Use : (instead of =) to define attributes (e.g. echo: false, not echo=false)
Separate attributes by comma (e.g. {echo: false, eval: false})
We need to formalize attribute handling (and do a bunch of tests), write down representation of grammar
Because of expected use cases, stick with echo: false as default
Undecided: What options do we really want to support? Now & in the future?

So, all together, our current decisions for code chunk and attribute styling stand at:

// awesome JavaScript code here

References

@mootari
Copy link
Member

mootari commented Oct 28, 2023

Do we need to get to a decision in #28 before moving on here?

It seems to me that

```js {no-run}

would be equivalent to

```js

once we switch to only parsing

```{js}

@wiltsecarpenter wiltsecarpenter linked an issue Oct 30, 2023 that may be closed by this pull request
@wiltsecarpenter
Copy link
Contributor Author

Yes, in this pr js and {js} or ```js {attributes...} are equivalent. I believe that's consistent with pandoc and others, but we could make further restrictions if we want.

@mootari
Copy link
Member

mootari commented Oct 30, 2023

To add to the above, the CommonMark spec calls it the "info string" and describes it as follows:

The first word of the info string is typically used to specify the language of the code sample, and rendered in the class attribute of the code tag. However, this spec does not mandate any particular treatment of the info string.

The only restriction is:

The info string may not contain any backtick characters. (The reason for this restriction is that otherwise some inline code would be incorrectly interpreted as the beginning of a fenced code block.)

@wiltsecarpenter
Copy link
Contributor Author

I mis-stated the response before to #75 (comment),

With this change,

```js {no-run}

is equivalent to

```{js no-run}

meaning that the language can be before the { } attributes, or it can be the first attribute. This is intended to follow the syntax of pandoc, although it is not quite the same since pandoc allows the language to be specified as a class, like

```{.js}

which I didn't add here, but we could later if we wanted to.

@wiltsecarpenter wiltsecarpenter marked this pull request as ready for review October 30, 2023 22:23
@mootari
Copy link
Member

mootari commented Oct 30, 2023

Will we still be able to make

```js

behave like

```{js no-run}

at a later point, so that authors don't get drowned in errors when they add a chunk of text with static fenced code blocks to their notebooks?

@mbostock
Copy link
Member

mbostock commented Oct 30, 2023

Let’s get together and discuss this in realtime. Everyone will have opinions on this!

If we want to follow Quarto, this is the same as in vanilla Markdown:

```js
1 + 2 // this code echoes, but doesn’t execute
```

You add curly braces if you want the code to run (“execute”), the code shows (“echoes”) by default, and also implicitly outputs in the case of an expression:

```{js}
1 + 2 // this code echoes, executes, and outputs a three
```

If you don’t want to echo the code, you can either use a top-level frontmatter:

---
execute:
  echo: false
---

Or you can use a comment option:

```{js}
//| echo: false
1 + 2 // this code runs and displays a three, but doesn’t show the code
```

I’m not entirely clear whether output: false means to disable only the implicit display, or if it disables explicit displays, too… that’s also something we should figure out.

I don’t personally like the comment options (slashpipe?) — they seem quite verbose — and it sounds like @allisonhorst doesn’t either. I do like the top-level YAML frontmatter for changing the behavior for the entire notebook page, though I wouldn’t try to follow Quarto 100% (e.g., I don’t know if we need to scope the echo option under execute, or if it’s fine for it to live at the top level like the title).

So, maybe we want to deviate from Quarto in favor of the R Markdown style, where the options are part of the info string? In that case, the echo = FALSE syntax that R Markdown uses feels like an R-ism; it’s how you would pass a named argument in R (or Python). The syntax used by Quarto feels more like YAML, and more similar to JavaScript for a named option. Maybe that means something like this?

```{js echo: false}
1 + 2 // this code runs and displays a three, but doesn’t show the code
```

Or perhaps:

```{js} echo: false
1 + 2 // this code runs and displays a three, but doesn’t show the code
```

But then I’m not sure how we might also mix-in arbitrary CSS classes, ids, or HTML attributes — I’m also not sure if we even need that capability, and feel the execution options are more pressing, but of course it would be nice to have a syntax that could potentially support both if we need it. I guess another option would be CSS selector syntax:

```{js} [echo=false] .toc #example
1 + 2 // this code runs and displays a three, but doesn’t show the code
```

And then we’d run a CSS selector parser on it, like parsel — or just extend the parser you’ve already written slightly to use brackets for attributes?

Quarto also supports double curlies for unexecuted blocks but I’m not sure if we need that. (That seems the same as no curlies, except that the triple backticks are also shown in the output?)

@wiltsecarpenter
Copy link
Contributor Author

I added a few references to the PR comments to other implementations. This PR is based on the pandoc syntax (mostly), which is used in a number of places, sometimes with comma separators. I don't see any other places that use Json-style syntax though.

@allisonhorst
Copy link
Contributor

A bit more context from Quarto docs re: comment options #| and //|: "Quarto uses this approach to both better accommodate longer options like fig-cap, fig-subcap, and fig-alt as well as to make it straightforward to edit chunk options within more structured editors that don’t have an easy way to edit chunk metadata (e.g. most traditional notebook UIs)."

I think both are useful considerations.

@mbostock
Copy link
Member

Thanks @allisonhorst, that’s helpful context! If I understand correctly, those motivations don’t apply to us: we use Plot options for figure captions, titles, and alt text; and we don’t have a cell-based editor, but instead now use a file-based editor for code blocks.

Looking forward to discussing in real-time.

@allisonhorst
Copy link
Contributor

@mbostock
Copy link
Member

mbostock commented Nov 16, 2023

I added more tests, in part so I can understand the logic here. My current thinking is:

  • Remove support for specifying classes (not currently needed)
  • Remove support for specifying ids (not currently needed)
  • Think more about whether our options should mirror Quarto (execute, display, echo)
  • Think more about unquoted attribute values
  • Think more about degenerate cases

In particular, I think the current interpretation of these inputs are probably confusing:

{string: "true"}
{date: 2021/01/01}
{a b, c}

I’d like to lean on precedent here, but I haven’t decided yet if the attributes should behave more like YAML (but comma-separated?), HTML, or JavaScript/JSON. It feels a little none of the above right now, and which increases the burden on us to document, explain, and test. I’ll think more before deciding.

@mbostock mbostock self-assigned this Nov 16, 2023
@mbostock
Copy link
Member

I also don’t know yet if we want to support arbitrary attributes, or if we should instead parse for known attributes — for example knowing that the show attribute must be a boolean, and warning if you give it an invalid value. And for that matter, whether we want to issue warnings or errors for invalid inputs, as I think folks may be confused why the attributes aren’t having an effect if we silently ignore malformed input.

@wiltsecarpenter
Copy link
Contributor Author

As you know, my initial assumption was that we would want to follow industry precedents for markdown syntax such as what is supported by pandoc and markdown-it and that this would be easiest for people to learn about and potentially fit in with existing editor extensions. Those other systems allow attributes to be specified on heading tags, lists, and inline code as well as fenced code blocks. The design of the code here is to separate the syntax parsing of attributes from their semantic application with the further assumption that there would be follow-on PRs that would add more options over time. It sounds though like you have some concerns that there could be some kind of un-intended consequences.

@mbostock mbostock changed the title Add { } attribute support on fenced code blocks Add attribute support on fenced code blocks Nov 16, 2023
@mbostock
Copy link
Member

mbostock commented Nov 16, 2023

I’ve rewritten this so that it now parses attributes using HTML-style syntax, borrowing code from Hypertext Literal to ensure that this matches the behavior of the HTML5 specification (as much as possible — obviously we’re not parsing HTML tags here).

The syntax looks like this:

```tag name1 name2=value
code
```

You can also use quoted values if desired, either single or double quotes. For example:

```tag name="value"
code
```

I agree there is a strong desire for precedent. But there are many, many subtleties when writing a parser, and while the last proposal was similar to Pandoc’s fenced code block options, it had many quirks that we would have had to document and test.

I don’t want to invent a new DSL here. And while I considered parsing the info string as JavaScript or JSON, that would be quite a bit more verbose and strict than we intend. So I’ve chosen to adopt HTML attribute syntax which supports shorthand attributes (with no values), doesn’t require commas between attributes, and allows quoted values if needed. This gives us support for arbitrary options, even though we only need show, run, and display (or equivalents) to start.

As for exposing the ability to set arbitrary CSS classes, identifier, etc., we can do that by saying:

```tag class=whatever id=foo
code
```

we could also support data attributes this way, if desired. But I want to be careful about arbitrary passthrough of attributes, since the special ones we’ve added (show, run, and display) don’t map to HTML attributes. So it’s better to think of this as a fixed set of attributes that we expose than a way to set arbitrary attributes to the generated HTML. If you really need the latter, you want an inline expression ${…} and write the tags yourself.

@mbostock mbostock merged commit 747e6ae into main Nov 17, 2023
1 check passed
@mbostock mbostock deleted the wiltse/attrs branch November 17, 2023 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Finalize syntax (e.g., ```js {show}) to denote live code blocks
4 participants