Support custom extensions "interrupting" built-in tokens #3435

Open
calculuschild opened this issue Aug 29, 2024 · 4 comments

@calculuschild
Contributor

calculuschild commented Aug 29, 2024

What pain point are you perceiving?
I'm not sure of the best way to describe this. Currently, custom extensions have a start property, which we use to interrupt paragraphs. But other tokens are also interruptible according to the CommonMark/GFM spec. For example, GFM tables must end when they encounter another block-level token.

The difficulty comes with enforcing that rule for custom extensions. Say I make a new block-level token via custom extensions

{{block
....
}}

If this were placed immediately after a table, the table would just consume it, because the table tokenizer does not check the start property the way the paragraph tokenizer does. You could roll your own table tokenizer that does nothing except add a few more characters to the rules regex, but that seems like a lot of effort just to make your extension compatible with GFM rules.
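For concreteness, here is a minimal sketch of such an extension, assuming marked's custom-extension object shape (`name`, `level`, `start`, `tokenizer`, `renderer`). The token name `customBlock` and both regexes are hypothetical. Note that `start` only gives marked a hint for clipping paragraphs; nothing in it stops a preceding table from consuming the `{{block` line.

```javascript
// Hypothetical marked block extension for the {{block ... }} syntax above.
// The object shape (name, level, start, tokenizer, renderer) follows marked's
// custom-extension format; the name "customBlock" is made up for this sketch.
const customBlock = {
  name: 'customBlock',
  level: 'block',
  // Index of the first line that begins with "{{block", or undefined.
  // Marked uses this hint to clip paragraphs, not tables.
  start(src) {
    const match = src.match(/^\{\{block/m);
    return match ? match.index : undefined;
  },
  // Consume "{{block\n...\n}}" only at the very start of src.
  tokenizer(src) {
    const match = /^\{\{block\n([\s\S]*?)\n\}\}(?:\n|$)/.exec(src);
    if (!match) return undefined;
    return { type: 'customBlock', raw: match[0], text: match[1] };
  },
  // Note: token.text is inserted without HTML escaping in this sketch.
  renderer(token) {
    return `<div class="custom-block">${token.text}</div>\n`;
  },
};
```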

Describe the solution you'd like
I really don't know how this would be implemented, but the goal would be a way for an extension to signal which tokens it can interrupt. Or, maybe better, the other way around: allow a token to specify which types of other tokens can interrupt it.

One thing to consider is that each token is also a little different in terms of when it can be interrupted. Blockquotes can only be interrupted during the "lazy continuation" step. Paragraphs can be interrupted at any time. Tables can only be interrupted if the line does not start with |. Not every token can be interrupted by the same kinds of tokens.
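One way to picture the "token specifies what can interrupt it" idea is a table of per-token rules, each deciding whether a given line may be cut off by a custom block. This is a hypothetical sketch, not anything marked provides: `interruptRules`, `canInterrupt`, and the `extensionStart` argument (shaped like an extension's `start(src)` function) are made-up names, and the predicates only encode the three cases listed above.

```javascript
// Hypothetical per-token interruption rules; marked has no such table.
// Each rule answers: may a custom block token cut this token off at `line`?
const interruptRules = {
  // Paragraphs can be interrupted on any line.
  paragraph: () => true,
  // Blockquotes only on a lazy-continuation line (no leading '>').
  blockquote: (line) => !/^ {0,3}>/.test(line),
  // Tables only on a line that does not start with '|'.
  table: (line) => !/^ *\|/.test(line),
};

// `extensionStart` has the shape of an extension's start(src) function:
// it returns the index where the custom token begins, or undefined.
function canInterrupt(tokenType, line, extensionStart) {
  const rule = interruptRules[tokenType];
  // Token types without a rule (e.g. fenced code) are never interrupted.
  if (!rule || !rule(line)) return false;
  // The custom token must begin at the start of this line.
  return extensionStart(line) === 0;
}
```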

I kind of hacked my way around this for tables in my own extension, Marked-Extended-Tables, by letting the user supply a "termination" regex that is appended to the tokenizer and causes the table to stop lexing on that line.

https://github.com/calculuschild/marked-extended-tables/blob/9e56b24598e07de71e225d6c50a50d40c366965f/src/index.js#L23-L25
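A rough sketch of the termination-regex idea, assuming the terminators are plain regex source strings. `buildTableBodyRegex` is a hypothetical helper and not the code at the link above. Without terminators the table body runs until a blank line, which is why it swallows the `{{block` line; with a terminator it stops just before it.

```javascript
// Hypothetical helper illustrating a "termination" regex for tables; it is
// not the code from marked or marked-extended-tables.
// Matches the body rows of a table (the lines after the header and delimiter
// rows). A row is any non-blank line; the body ends at the first blank line,
// at end of input, or at a line that begins with one of `terminators`
// (regex source strings).
function buildTableBodyRegex(terminators = []) {
  const stops = ['[ \\t]*(?:\\n|$)', ...terminators];
  const stop = stops.map((s) => `(?:${s})`).join('|');
  return new RegExp(`^(?:(?!${stop})[^\\n]+(?:\\n|$))*`);
}

const body = '| a | b |\n| c | d |\n{{block\nfoo\n}}\n';
// Without terminators the table keeps going and swallows the custom block.
console.log(JSON.stringify(buildTableBodyRegex().exec(body)[0]));
// With a terminator it stops before the "{{block" line.
console.log(JSON.stringify(buildTableBodyRegex(['\\{\\{block']).exec(body)[0]));
```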

Not sure if this is the easiest way to go about it, but the trickiest part is somehow applying that to the built-in tokens without just ending up rewriting every tokenizer anyway.

Mostly I'm just kind of stumped on any better way to do this.

@calculuschild calculuschild changed the title Support custom extensions "interrupting" other tokens Support custom extensions "interrupting" built-in tokens Aug 29, 2024
@UziTech
Member

UziTech commented Aug 29, 2024

The way we interrupt paragraph is by clipping src when passing it into the tokenizer

// prevent paragraph consuming extensions by clipping 'src' to extension start

We could do something similar with other tokenizers.
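A standalone sketch of that clipping step, assuming each extension's `start(src)` returns an index or `undefined`. `clipToExtensionStart` and `blockStart` are hypothetical names; this illustrates the idea rather than reproducing marked's lexer. In principle the same clipped string could be handed to other tokenizers, such as the table tokenizer.

```javascript
// Standalone sketch of clipping `src` to the earliest custom-token start so a
// built-in tokenizer (here, a paragraph) cannot consume it. Illustrative only;
// this is not marked's lexer code. `startFns` have the shape of extension
// start(src) functions: they return an index or undefined.
function clipToExtensionStart(src, startFns) {
  // Skip the first character: a match at index 0 would clip to an empty
  // string, and the current token must consume at least something.
  const rest = src.slice(1);
  let startIndex = Infinity;
  for (const start of startFns) {
    const idx = start(rest);
    if (typeof idx === 'number' && idx >= 0) {
      startIndex = Math.min(startIndex, idx);
    }
  }
  // Add 1 back to undo the slice above.
  return startIndex === Infinity ? src : src.slice(0, startIndex + 1);
}

// Hypothetical start function for a "{{block" custom token.
const blockStart = (s) => {
  const i = s.search(/^\{\{block/m);
  return i === -1 ? undefined : i;
};
console.log(JSON.stringify(clipToExtensionStart('Some paragraph\n{{block\nx\n}}\n', [blockStart])));
```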

Although I'm not sure this is needed if we just say built-in tokens take precedence over custom tokens. In well-formatted markdown, every block token should be separated by a blank line. The only reason start is actually needed is for inline tokens.

@UziTech
Member

UziTech commented Aug 29, 2024

For example, the KaTeX extension's block tokenizer does not have a start function, because we expect a blank line before it, so even a paragraph takes precedence.

https://github.com/UziTech/marked-katex-extension/blob/main/src/index.js#L63

@calculuschild
Contributor Author

calculuschild commented Aug 29, 2024

> The way we interrupt paragraph is by clipping src when passing it into the tokenizer

I remember. I wrote that. 😜

> In well formatted markdown every block token should be separated by a blank line.

Pretty markdown might, but the specs still make it clear that it is valid to place certain block tokens directly against each other (demo example).

> The only reason start is actually needed is for inline tokens.

Remember, we have separate handling for paragraphs and inline text. Paragraphs are clipped by block tokens:

if (this.options.extensions && this.options.extensions.startBlock) {

and inline text is clipped by inline tokens:

if (this.options.extensions && this.options.extensions.startInline) {

They are both needed.

> We could do something similar with other tokenizers.

If we did, I think only tables and blockquotes would need it to stay consistent with the GFM spec. The other block tokens either have a clear ending symbol (fences) or are allowed to just absorb block tokens (lists). Maybe that's not too bad?

@UziTech
Member

UziTech commented Aug 29, 2024

> Remember, we have separate handling for paragraphs and inline text. Paragraphs are clipped by block tokens. They are both needed.

The block tokenizer start function is not needed if you don't need to interrupt a paragraph. Paragraphs are automatically interrupted by blank lines.


2 participants