Question: How to prevent parsing inside web components that have empty lines? #2513

Closed
hawkticehurst opened this issue Jun 26, 2022 · 7 comments

@hawkticehurst

hawkticehurst commented Jun 26, 2022

Marked version: v4.0.3

Describe the issue/question
I originally created this issue as a bug, but after comparing the HTML output of the Marked Demo and the CommonMark Dingus, it seems the current behavior may actually be within spec. I'm therefore rephrasing this as a question, in the hope that there is a way to customize marked's behavior to achieve the desired result.

I know there was a fix for #1090 a few years back. However, I think I've run into a related issue: some content inside a custom element is still parsed if the content contains an empty line.

To Reproduce
My particular use case is that I'm trying to use a web component I made called code-block, which renders a custom code block with syntax highlighting.

When there are no empty lines in the child content of a web component, the output is correct (i.e. the content is passed through and no parsing occurs).

Input:

<code-block>
const greeting = "Hello world";
</code-block>

Output:

<code-block>
const greeting = "Hello world";
</code-block>

However, if an empty line is introduced, the content after it is parsed, and the resulting output breaks how the final HTML gets rendered.

Input:

<code-block>
const greeting = "Hello world";

function sayHello() {
  console.log(greeting);
}
</code-block>

Output:

<code-block>
const greeting = "Hello world";

<p>function sayHello() {
  console.log(greeting);
}
</code-block></p>
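
The output above is consistent with the CommonMark rule for HTML blocks opened by a tag that is not one of the spec's known block-level names: such a block ends at the first blank line, and whatever follows is parsed as ordinary markdown. The sketch below is a simplified model of that split, not marked's actual code; `splitHtmlBlock` is a hypothetical helper introduced only for illustration.

```javascript
// Simplified model: an HTML block opened by an unknown tag such as
// <code-block> ends at the first blank line. `splitHtmlBlock` is a
// hypothetical helper, not part of marked or CommonMark.
function splitHtmlBlock(src) {
  // Index of the newline that precedes the first blank line, if any.
  const blank = src.search(/\n[ \t]*\n/);
  if (blank === -1) {
    return { html: src, rest: '' };
  }
  return {
    html: src.slice(0, blank + 1),               // passed through as raw HTML
    rest: src.slice(blank).replace(/^\s+/, ''),  // parsed as normal markdown
  };
}

const sample = [
  '<code-block>',
  'const greeting = "Hello world";',
  '',
  'function sayHello() {',
  '  console.log(greeting);',
  '}',
  '</code-block>',
].join('\n');

const parts = splitHtmlBlock(sample);
console.log(JSON.stringify(parts.html));
console.log(JSON.stringify(parts.rest));
```

Under this model, everything from `function sayHello() {` through `</code-block>` lands in `rest`, which matches the paragraph wrapping seen in the output.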

Expected behavior
All content between the starting and closing tag of a custom element should not be parsed, regardless of empty lines.

Any insights or guidance on how to achieve this would be greatly appreciated! 😊

@hawkticehurst hawkticehurst changed the title Question: How to prevent parsing inside web components that have empty lines Question: How to prevent parsing inside web components that have empty lines? Jun 26, 2022
@UziTech
Member

UziTech commented Jun 26, 2022

The easiest way would be to create a custom extension that looks for HTML tags.

@hawkticehurst
Author

hawkticehurst commented Jun 28, 2022

Thanks for the direction! I've been trying to implement a custom extension based off the docs/example and think I understand what's going on/am getting close.

As a base case/sanity check, I've created a regex that successfully captures a web component (with no empty lines), verified that the generated token looks correct, and rendered it unaltered from the input.

I've also tested my regex and confirmed that it correctly matches web components that have empty lines between child content, but once I add an empty line I run right back into the same issue as explained above.

It almost feels like an empty line is taking priority over a custom extension, but based off the docs that shouldn't be possible?

Regardless, any more thoughts you might have on this would be appreciated. Thanks!

Also here's my current extension if that helps:

export const webComponent = {
  name: 'webComponent',
  level: 'block',
  start(src) {
    // Match opening web component tag with zero or more attribute/value pairs
    return src.match(/^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)/)?.index;
  },
  tokenizer(src, _) {
    // Regex for the complete token––match web component opening tag, child 
    // content (i.e. zero or more of any character or newlines), and closing tag)
    const rule = /^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)(.*|\n)*(<\/[a-z]+-[a-z]+>)$/;
    const match = rule.exec(src);
    if (match) {
      const token = {
        type: 'webComponent',
        raw: match[0],
        text: match[0].trim(),
        tokens: [],
      };
      return token;
    }
  },
  renderer(token) {
    // Render web component––unaltered from input
    return `${token.text}`;
  },
};

@hawkticehurst
Author

hawkticehurst commented Jun 28, 2022

Never mind... while my regex matches in the regex tester I was using, I just realized it's not actually matching when run in the custom extension 🤦🏻‍♂️😅 So the behavior I was seeing above was probably just the default.

Time to dive a bit deeper.
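
For readers following along, here is one possible explanation of the mismatch, offered as a sketch rather than a confirmed diagnosis. The opening-tag pattern in the extension above requires a whitespace character (`\s`) right after the tag name, so a tag without attributes, such as `<code-block>`, never matches. The tokenizer rule also ends in `$`, which only matches if the closing tag is the very last thing in the remaining input. The revised regexes and the name `webComponentSketch` are assumptions made for illustration; they do not handle nested elements of the same name or a `>` inside an attribute value.

```javascript
// Opening-tag pattern copied from the extension above.
const originalOpening = /^(<[a-z]+-[a-z]+\s(.+=["'].+["'])*>)/;
const noAttrs = originalOpening.test('<code-block>\nconst a = 1;\n</code-block>');
const withAttrs = originalOpening.test('<code-block lang="js">\nconst a = 1;\n</code-block>');
console.log(noAttrs, withAttrs); // the attribute-less tag is the one that fails

// Hypothetical revision (illustrative only, not from the thread).
const webComponentSketch = {
  name: 'webComponent',
  level: 'block',
  start(src) {
    // Not anchored: report where the next candidate tag begins.
    return src.match(/<[a-z][a-z0-9]*-/)?.index;
  },
  tokenizer(src) {
    // Anchored at the start of src, attributes optional, lazy body that may
    // contain blank lines, closing tag with the same name (\1), no trailing $.
    const rule = /^<([a-z][a-z0-9]*(?:-[a-z0-9]+)+)(?:\s[^>]*)?>[\s\S]*?<\/\1>/;
    const match = rule.exec(src);
    if (match) {
      return { type: 'webComponent', raw: match[0], text: match[0] };
    }
    return undefined;
  },
  renderer(token) {
    // Emit the element unaltered.
    return token.text;
  },
};

const input =
  '<code-block>\nconst greeting = "Hello world";\n\n' +
  'function sayHello() {\n  console.log(greeting);\n}\n</code-block>\n\nNext paragraph.';
const token = webComponentSketch.tokenizer(input);
console.log(JSON.stringify(input.slice(token.raw.length)));
```

An object of this shape would be registered with `marked.use({ extensions: [webComponentSketch] })`. The sketch only exercises the tokenizer directly and has not been run against marked v4.0.3.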

@hawkticehurst
Author

Actually, I do have one question about something in the docs that I think I may be misunderstanding. What does it mean when the docs say the following?

if using a Regular Expression to detect a token, it should be anchored to the string start (^).

@UziTech
Member

UziTech commented Jun 28, 2022

src is the whole rest of the markdown string, and marked works out what to remove before calling the next tokenizer from the length of token.raw. If the extension finds a token in the middle of src instead of at the beginning, marked will send the wrong src to the next tokenizer.
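
A toy model may make this concrete. The loop below is not marked's lexer; `lex` and the `[[name]]` syntax are hypothetical and exist only to show what happens when `raw` does not start at the beginning of `src`.

```javascript
// Toy consume loop: drop token.raw.length characters from the front of src
// after each token, as described above. `lex` is a hypothetical helper.
function lex(src, tokenizer) {
  const tokens = [];
  while (src) {
    const token = tokenizer(src);
    if (!token) break;
    tokens.push(token);
    // Assumes token.raw is a prefix of src.
    src = src.substring(token.raw.length);
  }
  return { tokens, rest: src };
}

// Anchored: only matches when the token sits at the very start of src.
const anchored = (src) => {
  const m = /^\[\[(\w+)\]\]/.exec(src);
  return m ? { raw: m[0], text: m[1] } : undefined;
};

// Unanchored: may match in the middle of src.
const unanchored = (src) => {
  const m = /\[\[(\w+)\]\]/.exec(src);
  return m ? { raw: m[0], text: m[1] } : undefined;
};

const good = lex('[[a]] tail', anchored);
const bad = lex('intro [[a]] tail', unanchored);
console.log(good.tokens.length, JSON.stringify(good.rest));
console.log(bad.tokens.length, JSON.stringify(bad.rest));
```

In the unanchored case the loop removes the five characters of `intro` instead of the matched text, so the same token is found a second time and the leftover input is cut mid-token.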

@calculuschild
Contributor

To add to @UziTech, in case the docs weren't clear, the way to make that happen is to use a caret symbol ^ at the beginning of your regex. You already seem to be doing this, but it's easy to miss.

@hawkticehurst
Author

Thank you both for the clarification!

I still haven't been able to figure out exactly where I'm going awry, but have decided to step away from this and instead just add a script/config that makes vanilla markdown code blocks work for my needs at the moment.

I'll go ahead and close this issue at this time and maybe come back and revisit it later.
