Question: How to prevent parsing inside web components that have empty lines? #2513

hawkticehurst · 2022-06-26T19:50:13Z

Marked version: v4.0.3

Describe the issue/question
I originally created this issue as a bug, but after comparing the Marked Demo and CommonMark Dingus HTML outputs it seems the current behavior may actually be within spec, so I'm rephrasing this as a question in the hopes that there might be a way to customize marked behavior to achieve the desired results.

I know there was a fix for #1090 a few years back. However, I think I've run into a related issue: some content inside a custom element is still parsed if there's an empty line in the content.

To Reproduce
My particular use case is that I'm trying to use a web component I made called code-block, which renders a custom code block with syntax highlighting.

When there are no empty lines in the child content of a web component, the content is passed through correctly (i.e. no parsing occurs).

Input:

<code-block>
const greeting = "Hello world";
</code-block>

Output:

<code-block>
const greeting = "Hello world";
</code-block>

However, if an empty line is introduced, parsing occurs and produces output that breaks how the final HTML gets rendered.

Input:

<code-block>
const greeting = "Hello world";

function sayHello() {
  console.log(greeting);
}
</code-block>

Output:

<code-block>
const greeting = "Hello world";

<p>function sayHello() {
  console.log(greeting);
}
</code-block></p>

Expected behavior
No content between the opening and closing tags of a custom element should be parsed, regardless of empty lines.

Any insights or guidance on how to achieve this would be greatly appreciated! 😊


UziTech · 2022-06-26T21:06:06Z

The easiest way would be to create a custom extension that looks for HTML tags.

hawkticehurst · 2022-06-28T04:21:33Z

Thanks for the direction! I've been trying to implement a custom extension based off the docs/example and think I understand what's going on/am getting close.

As a base case/sanity check, I've created a regex that can successfully capture a web component (with no new lines), verified that the generated token looks correct, and then rendered it unaltered from the input.

I've also tested my regex and confirmed that I am correctly matching against web components that have empty lines between child content, but once I add a new line I run right back into the same issue as explained above.

It almost feels like an empty line is taking priority over a custom extension, but based off the docs that shouldn't be possible?

Regardless, any more thoughts you might have on this would be appreciated. Thanks!

Also here's my current extension if that helps:

export const webComponent = {
  name: 'webComponent',
  level: 'block',
  start(src) {
    // Match opening web component tag with zero or more attribute/value pairs
    return src.match(/^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)/)?.index;
  },
  tokenizer(src, _) {
    // Regex for the complete token: match the web component opening tag, child
    // content (i.e. zero or more of any character or newlines), and closing tag
    const rule = /^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)(.*|\n)*(<\/[a-z]+-[a-z]+>)$/;
    const match = rule.exec(src);
    if (match) {
      const token = {
        type: 'webComponent',
        raw: match[0],
        text: match[0].trim(),
        tokens: [],
      };
      return token;
    }
  },
  renderer(token) {
    // Render the web component unaltered from the input
    return `${token.text}`;
  },
};

hawkticehurst · 2022-06-28T04:34:14Z

Never mind... while my regex matches in the regex tester I was using, I just realized it isn't actually matching when run in the custom extension 🤦🏻‍♂️😅 So the behavior I was seeing above was probably just the default.

Time to dive a bit deeper.
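
For anyone following along, one likely reason the extension above never fires on the sample input: the opening-tag pattern requires a whitespace character after the tag name, so a tag with no attributes cannot match. The snippet below is only a quick check of that one pattern in plain JavaScript, outside of marked; the `lang="js"` attribute is a made-up example and not part of the original component.

```javascript
// The opening-tag pattern from the extension above, copied unchanged.
const openTag = /^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)/;

// `\s` is mandatory in the pattern, so a tag without attributes never matches.
const bareTagMatches = openTag.test('<code-block>\nconst a = 1;\n</code-block>');

// With an attribute (hypothetical `lang="js"`), the same pattern does match.
const attrTagMatches = openTag.test('<code-block lang="js">\nconst a = 1;\n</code-block>');

console.log(bareTagMatches); // false
console.log(attrTagMatches); // true
```

Separately, the full rule in the tokenizer ends with `$`, which without the `m` flag only matches at the very end of the input, so it would also require the closing tag to be the last thing in the remaining markdown.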

hawkticehurst · 2022-06-28T04:39:58Z

Actually, one question I do have about the docs, which I think I may be misunderstanding: what does it mean when the docs say the following?

if using a Regular Expression to detect a token, it should be anchored to the string start (^).

UziTech · 2022-06-28T16:12:18Z

src is the whole rest of the markdown string, and marked uses the length of token.raw to know how much to remove before calling the next tokenizer. If the extension finds a token in the middle of src instead of at the beginning, marked will send the wrong src to the next tokenizer.
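
To make that concrete, here is a small simulation of the consumption step. The `consume` helper is a hypothetical stand-in written for this illustration, not marked's actual source; it only assumes, as described above, that the lexer drops `token.raw.length` characters from the front of `src`. The `x-tag` element and both tokenizers are likewise made up.

```javascript
// Hypothetical stand-in for one lexer step; NOT marked's real implementation.
function consume(src, tokenizer) {
  const token = tokenizer(src);
  if (!token) return { token: null, rest: src };
  // Assumption: the consumed prefix is exactly token.raw.length characters.
  return { token, rest: src.substring(token.raw.length) };
}

const src = 'intro <x-tag>body</x-tag> outro';

// Unanchored: finds the tag in the middle of src.
const unanchored = (s) => {
  const m = /<x-tag>.*?<\/x-tag>/.exec(s);
  return m ? { type: 'demo', raw: m[0] } : undefined;
};

// Anchored with ^: only matches when the tag is at the very start of src.
const anchored = (s) => {
  const m = /^<x-tag>.*?<\/x-tag>/.exec(s);
  return m ? { type: 'demo', raw: m[0] } : undefined;
};

const bad = consume(src, unanchored);
const good = consume(src, anchored);

console.log(bad.rest);   // 'x-tag> outro' (the wrong 19 characters were dropped)
console.log(good.token); // null (no token here, so other tokenizers get a turn)
```

The unanchored version removes the first 19 characters of src even though the match started at index 6, which leaves a mangled remainder for the next tokenizer.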

calculuschild · 2022-06-28T16:15:44Z

To add to @UziTech's point, in case the docs weren't clear: the way to make that happen is to put a caret symbol ^ at the beginning of your regex. You already seem to be doing this, but it's easy to miss.
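
Putting the pieces from this thread together, a minimal sketch of an extension along these lines might look like the following. This is an assumption-laden illustration, not a confirmed fix: the regexes are my own, they treat anything up to the first `>` as attributes (so attribute values containing `>` will break it), they do not handle nested elements of the same name, and the functions have only been exercised directly as plain JavaScript rather than through marked.

```javascript
const webComponent = {
  name: 'webComponent',
  level: 'block',
  start(src) {
    // Unanchored on purpose: report where a custom-element tag might begin.
    return src.match(/<[a-z]+(?:-[a-z0-9]+)+[\s>]/)?.index;
  },
  tokenizer(src) {
    // Anchored with ^, attributes optional, lazy body that may span blank
    // lines, and a backreference so the closing tag matches the opening name.
    // No trailing $, so markdown after the closing tag is left for later.
    const rule = /^<([a-z]+(?:-[a-z0-9]+)+)(?:\s[^>]*)?>[\s\S]*?<\/\1>/;
    const match = rule.exec(src);
    if (match) {
      return { type: 'webComponent', raw: match[0], text: match[0] };
    }
  },
  renderer(token) {
    // Emit the element exactly as it was written.
    return token.text;
  },
};

// Direct check against the input from the original report, plus trailing text.
const sample =
  '<code-block>\nconst greeting = "Hello world";\n\n' +
  'function sayHello() {\n  console.log(greeting);\n}\n</code-block>\n\nMore *text*';
const token = webComponent.tokenizer(sample);
console.log(token.raw.endsWith('</code-block>')); // true
```

Registering it should be a matter of `marked.use({ extensions: [webComponent] })`, after which the Marked output for the failing input is worth comparing against the expected output above.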

hawkticehurst · 2022-06-28T20:09:40Z

Thank you both for the clarification!

I still haven't been able to figure out exactly where I'm going awry, but have decided to step away from this and instead just add a script/config that makes vanilla markdown code blocks work for my needs at the moment.

I'll go ahead and close this issue at this time and maybe come back and revisit it later.

hawkticehurst changed the title ~~Question: How to prevent parsing inside web components that have empty lines~~ Question: How to prevent parsing inside web components that have empty lines? Jun 26, 2022

UziTech added the question label Jun 26, 2022

hawkticehurst closed this as completed Jun 28, 2022
