Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Italic/bold is treated as a list, When Italic/bold is written on the next line of the list #1980

Closed
ghost opened this issue Mar 23, 2021 · 9 comments · Fixed by #2431
Closed
Labels
category: lists L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue released

Comments

@ghost
Copy link

ghost commented Mar 23, 2021

Marked version:

master

Describe the bug

To be exact, this problem is
After recognizing the list, determine whether the next line is the next list only by the symbol(without space) of the list.

In detail
The next line after the line recognized as a list
The next line is the symbol(* ,+ ,- ,1.) used in the list followed by a character without space.
In the above case, marked determines that the next line is also a list.

Therefore, "*" is used in the list symbol, so if there is an "*" in the next line, it is judged as a list.

To Reproduce

editor

Expected behavior

I applied the following self-made patch to this problem.

diff --git a/src/rules.js b/src/rules.js
index d3175e8..a83780d 100644
--- a/src/rules.js
+++ b/src/rules.js
@@ -43,7 +43,7 @@ block.def = edit(block.def)
   .getRegex();
 
 block.bullet = /(?:[*+-]|\d{1,9}[.)])/;
-block.item = /^( *)(bull) ?[^\n]*(?:\n(?! *bull ?)[^\n]*)*/;
+block.item = /^( *)(bull) +[^\n]*(?:\n(?! *bull +)[^\n]*)*/;
 block.item = edit(block.item, 'gm')
   .replace(/bull/g, block.bullet)
   .getRegex();

note

This sentence is through google translate.
Did you get the content?

@UziTech
Copy link
Member

UziTech commented Mar 23, 2021

You "editor" link doesn't have a * in it. I am confused by what the problem markdown is.

Can you provide the markdown that has a problem?

@UziTech
Copy link
Member

UziTech commented Mar 23, 2021

Thanks for the reproduction.

marked demo
commonmark demo

It looks like this may be unspecified behavior. Are you looking for something closer to the commonmark demo where the bold line is part of the previous list item or is the bold not supposed to be in the list at all?

@UziTech UziTech added category: lists L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue and removed need more info labels Mar 23, 2021
@ghost
Copy link
Author

ghost commented Mar 24, 2021

I think this issue is a list-only issue.
Because, as I wrote in In detail, marked doesn't care about the space needed in the list.
This is the main problem.

As a test, compare "marked" and "commonmark" with list problems (without Italic / bold)

marked demo

commonmark demo

For commonmark, each list has 2 lists.
For markd, each list has 3 lists.
The number of lists is different.

This is the cause of "Italic/bold is treated as a list"

@UziTech
Copy link
Member

UziTech commented Mar 24, 2021

If you would like to create a PR we could get your patch merged.

@calculuschild
Copy link
Contributor

Looks like this is mostly solved by #2112. There is still a difference of Marked creating <p> tags around the "not-list" items and Commomark leaving them as inline text, but at least Italics / Bold are no longer treated as list items.

@jangxyz
Copy link
Contributor

jangxyz commented Mar 28, 2022

I have came upon this issue as well, having problems like the demo above.

TLDR;

  1. Why marked is not parsing unspaced bullets correclty.
  2. What others are involved in the problem.
  3. I have a fix.

What is wrong

Here's what I have understood through reading the list block parser in Tokenzier (Tokenzier.js#165).

  1. while looping through lines,
  2. checks whether this line looks like an item with itemRegex pattern.
  3. check the next line if it looks like an item with nextBulletRegex. (Another while loop)
    If it is, that's it for this item. If it isn't, keep adding it up to the current item.

I've found that while itemRegex testing for an item bullet is okay, nextBulletRegex is incomplete, thus mistakenly stopping when meeting a bullet without a space.

var itemRegex = new RegExp(`^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))`);
// |^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))|
//   1. |^|          : start of sentence
//   2. | {0,3}|     : 0~3 spaces available
//   3. |${bull}|    : pre-computed bullet character
//   4. |( [^\\n]*)?|: a space, optionally followed by non-newline characters (optional)
//   5. |(\\n|$)|    : end of sentence, or new line

const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`);
//  |^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])|
//    1. |^ |                            : start of sentence, followed by a space
//    2. |{0,${Math.min(3, indent - 1)}}|: zero to some predefined indent number of spaces (string evaluation)
//    3. |([*+-]|\\d{1,9}[.)])|          : any item bullet, either ordered or unordered
//
//    NOTE does not check whether there is any whitespace after the bullet.

Say we have a smaller sample (demo):

- item1
- item2
-Not a list item(without space)
- item3

The current implementation of nextBulletRegex does not check whether there is a whitespace after the bullet, and breaks out the loop as it meets "-Not a list item(without space)".

marked/src/Tokenizer.js

Lines 241 to 243 in 4c5b974

if (nextBulletRegex.test(line)) {
break;
}

On next iteration it figures that "-Not a list item(without space)" is just a text and ends the list, and starts a new paragraph.

The fix (or, fixes)

I tried changing nextBulletRegex to check for an additional whitespace like itemRegex does.

//const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`);
const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])((?: [^\\n]*)?(?:\\n|$))`);

Indeed this will render the above sample correctly,
but unfortunately broke other tests, colliding with

thematic breaks (---):

Commonmark Example#57

- foo
***
- bar

Example#99

- foo
-----
setext headings (---):

Commonmark Example#300

- # Foo
- Bar
  ---
  baz

These all have the form of 'bullet-like' block syntax inside the list, mingled with various indent options and precedences over each other (setext heading has higher order than thematic breaks).

Maybe the reason nextBulletRegex did not have any whitespace in the first place was to make it compatible with these other syntaxes. However, it does not solve the problem in this issue, and at minimum is a misnomer. I think it should be fixed.

Additional Touch

I did manage to make all tests pass, with adding additional logic inside the second loop.

while (src) {
  rawLine = src.split('\n', 1)[0];
  prevLine = line;
  line = rawLine;

  // ...

  if (this.rules.block.hr.test(line)) {
    // make sure it is not a setext heading, which takes precedence
    const twoLines = [prevLine, line].join('\n');
    const matchLheadingPattern = this.rules.block.lheading.test(twoLines);
    const lineIdentIsLargerThanPrev = line.search(/[^ ]/) >= indent;
    const isNextLineSetextHeading = matchLheadingPattern && lineIdentIsLargerThanPrev;
    if (!isNextLineSetextHeading) {
      break;
    }
  }

  // End list item if found start of new bullet
  if (nextBulletRegex.test(line)) {
    break;
  }

  // ...
}

Adding this code solves the issue, and make all the tests pass as well.

I am a bit afraid that the list tokenizer is getting more and more complex, but honestly I don't know if there is any other way (Frankly speaking it seems that markdown syntax is complicated by nature). If there is a better way to do it in marked, I would like to hear about it. Or if this is fine to go on and make a PR?

Thanks.

jangxyz added a commit to jangxyz/marked that referenced this issue Mar 28, 2022
This makes marked render correctly with the following text:

```markdown
- item1
- item2
-Not a list item(without space)
- item3
```

Currently:
```html
<ul>
<li>item1</li>
<li>item2</li>
</ul>
<p>-Not a list item(without space)</p>
<ul>
<li>item3</li>
</ul>
```

Which should be:
```html
<ul>
<li>item1</li>
<li>item2
-Not a list item(without space)</li>
<li>item3</li>
</ul>
```

This is the rendered result for commonmark as well.

Changes:
- `nextBulletRegex` checks for whitespace after the bullet
- separately checks for hr or lheading
@UziTech
Copy link
Member

UziTech commented Mar 30, 2022

@jangxyz a PR would be appreciated 😁👍

jangxyz added a commit to jangxyz/marked that referenced this issue Apr 3, 2022
This makes marked render correctly with the following text:

```markdown
- item1
- item2
-Not a list item(without space)
- item3
```

Currently:
```html
<ul>
<li>item1</li>
<li>item2</li>
</ul>
<p>-Not a list item(without space)</p>
<ul>
<li>item3</li>
</ul>
```

Which should be:
```html
<ul>
<li>item1</li>
<li>item2
-Not a list item(without space)</li>
<li>item3</li>
</ul>
```

This is the rendered result for commonmark as well.

Changes:
- `nextBulletRegex` checks for whitespace after the bullet
- separately checks for hr or lheading
@github-actions
Copy link

github-actions bot commented May 2, 2022

🎉 This issue has been resolved in version 4.0.15 🎉

The release is available on:

Your semantic-release bot 📦🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: lists L2 - annoying Similar to L1 - broken but there is a known workaround available for the issue released
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants