Non-alphanumeric with format is not properly parsed when connected to an alphanumeric string #773

Open
tomerlichtash opened this issue Jul 22, 2024 · 1 comment

Comments

@tomerlichtash

A string in which formatted non-alphanumeric content is immediately followed by an alphanumeric character is tokenized as plain text nodes instead of a series of format nodes, as would be expected.

The problem reproduces on the CommonMark online demo: paste **@**A there and compare the result with **@** A.

Example:
All of these samples are tokenized as expected:

**@**@ => formatted non-alphanumeric + non-alphanumeric
@**@** => non-alphanumeric + formatted non-alphanumeric
@**A** => non-alphanumeric + formatted alphanumeric
**A** @ => formatted alphanumeric + space + non-alphanumeric

This sample, however, is tokenized into plain text nodes and its formatting is not parsed: **@**A (formatted non-alphanumeric + alphanumeric)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>**</text>
    <text>@</text>
    <text>**</text>
    <text>A</text>
  </paragraph>
</document>

Add a space between the formatted non-alphanumeric and the alphanumeric, and compare the tokenization for the string **@** A:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <strong>
      <text>@</text>
    </strong>
    <text> A</text>
  </paragraph>
</document>
@jgm
Member

jgm commented Jul 28, 2024

Are you claiming that the parser doesn't properly implement the spec, or are you suggesting a change to the spec? If the latter, please examine the current rules and be specific about the change you'd recommend, recognizing that any change that "fixes" this case may break other things.

Unfortunately, the way commonmark / Markdown is designed, it is difficult to avoid some "blind spots" like this. See my essay Beyond Markdown, item 1. My project https://djot.net attempts to fix some of these issues.
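
The "current rules" in question appear to be the spec's left-flanking and right-flanking delimiter-run definitions. Below is a minimal Python sketch of them, not the reference implementation. The helper names (`is_ws`, `is_punct`, `flanking`, `star_runs`) are invented for illustration, `str.isspace()` and the Unicode `P*`/`S*` categories are used as approximations of the spec's whitespace and punctuation classes, and only `*` runs are handled.

```python
import unicodedata


def is_ws(ch):
    # Assumption: start/end of line (None) counts as whitespace,
    # and str.isspace() approximates the spec's Unicode whitespace.
    return ch is None or ch.isspace()


def is_punct(ch):
    # Assumption: punctuation means Unicode general category P* or S*.
    return ch is not None and unicodedata.category(ch)[0] in "PS"


def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the run text[start:end]."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not is_ws(after) and (
        not is_punct(after) or is_ws(before) or is_punct(before)
    )
    right = not is_ws(before) and (
        not is_punct(before) or is_ws(after) or is_punct(after)
    )
    return left, right


def star_runs(text):
    """List (start, end, can_open, can_close) for every run of '*'.

    For '*', a run can open emphasis iff it is left-flanking and
    can close emphasis iff it is right-flanking.
    """
    runs = []
    i = 0
    while i < len(text):
        if text[i] == "*":
            j = i
            while j < len(text) and text[j] == "*":
                j += 1
            can_open, can_close = flanking(text, i, j)
            runs.append((i, j, can_open, can_close))
            i = j
        else:
            i += 1
    return runs


if __name__ == "__main__":
    for sample in ["**@**A", "**@** A", "**@**@", "@**@**", "@**A**"]:
        print(repr(sample), star_runs(sample))
```

Under these assumptions, the second `**` in `**@**A` is left-flanking only: it is preceded by punctuation and followed by a letter, so it can open emphasis but cannot close it, which matches the all-text output reported in the issue. In `**@** A` the same run is followed by whitespace, so it is right-flanking and can close the strong span.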
