Non-alphanumeric with format is not properly parsed when connected to an alphanumeric string #773

tomerlichtash · 2024-07-22T12:41:56Z

A string whose formatted non-alphanumeric content is immediately followed by an alphanumeric character is tokenized as plain text nodes, instead of as a series of format nodes as expected.

The problem reproduces on the CommonMark online demo (to reproduce, just paste **@**A there and compare with **@** A).

Example:
While all these samples are tokenized as expected:

**@**@ => formatted non-alphanumeric + non-alphanumeric
@**@** => non-alphanumeric + formatted non-alphanumeric
@**A** => non-alphanumeric + formatted alphanumeric
**A** @ => formatted alphanumeric + space + non-alphanumeric

This sample is tokenized into plain text nodes and is not parsed as formatting: **@**A (formatted non-alphanumeric + alphanumeric)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <text>**</text>
    <text>@</text>
    <text>**</text>
    <text>A</text>
  </paragraph>
</document>

Add a space between the formatted non-alphanumeric and the alphanumeric character, and compare the tokenization for the string **@** A:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">

<document xmlns="http://commonmark.org/xml/1.0">
  <paragraph>
    <strong>
      <text>@</text>
    </strong>
    <text> A</text>
  </paragraph>
</document>


jgm · 2024-07-28T16:26:34Z

Are you claiming that the parser doesn't properly implement the spec, or are you suggesting a change to the spec? If the latter, please examine the current rules and be specific about the change you'd recommend, recognizing that any change that "fixes" this case may break other things.

Unfortunately, the way commonmark / Markdown is designed, it is difficult to avoid some "blind spots" like this. See my essay Beyond Markdown, item 1. My project https://djot.net attempts to fix some of these issues.
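
For anyone examining the current rules: the behavior appears to follow from the spec's definitions of left-flanking and right-flanking delimiter runs. The sketch below is only an illustration based on a reading of those definitions, not the code of any CommonMark implementation. The helper names (`is_whitespace`, `is_punctuation`, `star_runs`) are hypothetical, and Python's `str.isspace` is used as an approximation of the spec's Unicode whitespace. For `*`, a run can open emphasis if it is left-flanking and can close emphasis if it is right-flanking.

```python
import re
import unicodedata

def is_whitespace(ch):
    # Assumption: the start and end of the line count as whitespace;
    # str.isspace only approximates the spec's "Unicode whitespace".
    return ch is None or ch.isspace()

def is_punctuation(ch):
    # Assumption: punctuation means Unicode general category P* or S*.
    return ch is not None and unicodedata.category(ch)[0] in ("P", "S")

def star_runs(text):
    """Return (run, can_open, can_close) for each run of '*' in text."""
    results = []
    for m in re.finditer(r"\*+", text):
        before = text[m.start() - 1] if m.start() > 0 else None
        after = text[m.end()] if m.end() < len(text) else None
        # Left-flanking: not followed by whitespace, and either not followed
        # by punctuation, or preceded by whitespace or punctuation.
        left = not is_whitespace(after) and (
            not is_punctuation(after)
            or is_whitespace(before)
            or is_punctuation(before)
        )
        # Right-flanking: the mirror image of the rule above.
        right = not is_whitespace(before) and (
            not is_punctuation(before)
            or is_whitespace(after)
            or is_punctuation(after)
        )
        results.append((m.group(), left, right))
    return results

for sample in ["**@**A", "**@** A", "**@**@", "@**A**"]:
    print(sample, star_runs(sample))
```

Under these assumptions, the second `**` in `**@**A` is preceded by punctuation (`@`) and followed by an alphanumeric character, so it is left-flanking but not right-flanking and cannot close the strong emphasis. In `**@** A` the same run is followed by a space, which makes it right-flanking, so it can close.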
