Italic/bold is treated as a list when Italic/bold is written on the next line of the list #1980

ghost · 2021-03-23T13:24:26Z

Marked version:

master

Describe the bug

To be exact, the problem is this:
after recognizing a list item, marked decides whether the next line starts a new item based only on the list bullet, without checking for the space that must follow it.

In detail:
If the line after a list item starts with a list bullet ("*", "+", "-", "1.") immediately followed by a character other than a space, marked still treats that line as a new list item.

Because "*" is also a list bullet, a line that starts with "*" (such as italic or bold text) is therefore treated as a list item.

To Reproduce

editor

Expected behavior

I applied the following patch of my own to fix this problem.

diff --git a/src/rules.js b/src/rules.js
index d3175e8..a83780d 100644
--- a/src/rules.js
+++ b/src/rules.js
@@ -43,7 +43,7 @@ block.def = edit(block.def)
   .getRegex();
 
 block.bullet = /(?:[*+-]|\d{1,9}[.)])/;
-block.item = /^( *)(bull) ?[^\n]*(?:\n(?! *bull ?)[^\n]*)*/;
+block.item = /^( *)(bull) +[^\n]*(?:\n(?! *bull +)[^\n]*)*/;
 block.item = edit(block.item, 'gm')
   .replace(/bull/g, block.bullet)
   .getRegex();
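A minimal sketch of what the patch changes, in plain JavaScript. It assumes the `bull` placeholder can simply be replaced by the source of `block.bullet` (marked does this through its internal `edit(...).replace(/bull/g, ...)` helper, which is not reproduced here), keeps the `gm` flags from the diff, and counts how many items each version of `block.item` carves out of the three reproduction snippets:

```javascript
// Bullet pattern copied from src/rules.js (block.bullet).
const bullet = /(?:[*+-]|\d{1,9}[.)])/.source;

// Build block.item with "bull" substituted by hand (stand-in for marked's edit helper).
// sep is what must follow the bullet: " ?" before the patch, " +" after it.
function itemRegex(sep) {
  return new RegExp(
    `^( *)(${bullet})${sep}[^\\n]*(?:\\n(?! *${bullet}${sep})[^\\n]*)*`,
    'gm'
  );
}

const before = itemRegex(' ?');
const after = itemRegex(' +');

const samples = [
  '* list1\n*Italic/not list*\n* list2',
  '* list1\n**bold/not list**\n* list2',
  '1. list1\n1.not list\n1. list2',
];

for (const text of samples) {
  // String.prototype.match with a /g regex returns every match, or null if none.
  const countBefore = (text.match(before) || []).length;
  const countAfter = (text.match(after) || []).length;
  console.log(JSON.stringify(text), countBefore, countAfter);
}
```

With the `' ?'` version each snippet splits into 3 items; with `' +'` the unspaced line is kept as a continuation of the first item, giving 2 items, the same count commonmark reports.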

note

This text was translated with Google Translate.
Is the content clear?


UziTech · 2021-03-23T13:57:08Z

Your "editor" link doesn't have a * in it, so I'm not sure what the problematic markdown is.

Can you provide the markdown that has a problem?

ghost · 2021-03-23T14:33:27Z

Sorry,

To Reproduce

https://marked.js.org/demo/?text=*%20list1%0A*Italic%2Fnot%20list*%0A*%20list2%0A%0A%0A*%20list1%0A**bold%2Fnot%20list**%0A*%20list2%0A%0A%0A1.%20list1%0A1.not%20list%0A1.%20list2%0A%0A%0A%0A1.%20%5Bx%5D%20list1%0A1.%5Bx%5D%20not%20list%0A1.%20%5Bx%5D%20list2&options=%7B%0A%20%22baseUrl%22%3A%20null%2C%0A%20%22breaks%22%3A%20false%2C%0A%20%22gfm%22%3A%20true%2C%0A%20%22headerIds%22%3A%20true%2C%0A%20%22headerPrefix%22%3A%20%22%22%2C%0A%20%22highlight%22%3A%20null%2C%0A%20%22langPrefix%22%3A%20%22language-%22%2C%0A%20%22mangle%22%3A%20true%2C%0A%20%22pedantic%22%3A%20false%2C%0A%20%22sanitize%22%3A%20false%2C%0A%20%22sanitizer%22%3A%20null%2C%0A%20%22silent%22%3A%20false%2C%0A%20%22smartLists%22%3A%20false%2C%0A%20%22smartypants%22%3A%20false%2C%0A%20%22tokenizer%22%3A%20null%2C%0A%20%22walkTokens%22%3A%20null%2C%0A%20%22xhtml%22%3A%20false%0A%7D&version=master

UziTech · 2021-03-23T19:12:34Z

Thanks for the reproduction.

marked demo
commonmark demo

It looks like this may be unspecified behavior. Are you looking for something closer to the commonmark demo where the bold line is part of the previous list item or is the bold not supposed to be in the list at all?

ghost · 2021-03-24T03:12:37Z

I think this is purely a list issue.
As I wrote under "In detail", marked doesn't check for the space required after a list bullet.
This is the main problem.

As a test, compare "marked" and "commonmark" on the list cases alone (without italic/bold):

marked demo

commonmark demo

In commonmark, each example produces 2 list items.
In marked, each example produces 3 list items.
The number of list items differs.

This is the cause of "Italic/bold is treated as a list".

UziTech · 2021-03-24T13:03:31Z

If you would like to create a PR we could get your patch merged.

calculuschild · 2021-07-28T02:11:35Z

Looks like this is mostly solved by #2112. There is still a difference: Marked creates  tags around the "not-list" items, while Commonmark leaves them as inline text, but at least italics/bold are no longer treated as list items.

jangxyz · 2022-03-28T10:50:30Z

I have come across this issue as well, with problems like the demo above.

TL;DR:

Why marked does not parse bullets without a following space correctly.
What other syntax is involved in the problem.
I have a fix.

What is wrong

Here's what I understood from reading the list block parser in Tokenizer (Tokenizer.js#165).

While looping through the lines, it:
checks whether the line looks like an item, using the itemRegex pattern;
checks whether the next line looks like an item, using nextBulletRegex (in another while loop);
if it does, the current item ends there; if it doesn't, the line is appended to the current item.

I found that while itemRegex tests for an item bullet correctly, nextBulletRegex is incomplete, so the loop mistakenly stops when it meets a bullet without a space.

var itemRegex = new RegExp(`^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))`);
// |^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))|
//   1. |^|          : start of line
//   2. | {0,3}|     : 0 to 3 spaces
//   3. |${bull}|    : pre-computed bullet character
//   4. |( [^\\n]*)?|: a space, optionally followed by non-newline characters (optional)
//   5. |(\\n|$)|    : a newline, or end of input

const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`);
//  |^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])|
//    1. |^|                             : start of line
//    2. | {0,${Math.min(3, indent - 1)}}|: zero up to min(3, indent - 1) spaces (evaluated in the template string)
//    3. |([*+-]|\\d{1,9}[.)])|          : any item bullet, either ordered or unordered
//
//    NOTE does not check whether there is any whitespace after the bullet.
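To see the mismatch concretely, here is a small sketch that builds both patterns for the "- item" sample that follows. It assumes `bull` is the escaped bullet of the current list (`\\-` here) and that `indent` is 2 (the content column of "- item1"); in marked both values are computed earlier in the tokenizer and are hard-coded here only for illustration.

```javascript
// Assumed values for a "- item" list; marked computes these itself.
const bull = '\\-';
const indent = 2;

// Patterns as quoted above.
const itemRegex = new RegExp(`^( {0,3}${bull})((?: [^\\n]*)?(?:\\n|$))`);
const nextBulletRegex = new RegExp(
  `^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`
);

const line = '-Not a list item(without space)';

// itemRegex requires a space (or end of line) after the bullet, so it rejects the line...
console.log(itemRegex.test(line)); // false
// ...but nextBulletRegex only looks for the bullet, so it accepts the same line.
console.log(nextBulletRegex.test(line)); // true
```

The same line is "not an item" for the outer loop but "the start of the next item" for the inner loop, which is exactly the gap described here.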

Say we have a smaller sample (demo):

- item1
- item2
-Not a list item(without space)
- item3

The current nextBulletRegex does not check for whitespace after the bullet, so it breaks out of the loop when it reaches "-Not a list item(without space)".

marked/src/Tokenizer.js, lines 241 to 243 in 4c5b974:

if (nextBulletRegex.test(line)) {
  break;
}

On the next iteration, it determines that "-Not a list item(without space)" is just text, so it ends the list and starts a new paragraph.
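The effect on the whole list can be reproduced with a heavily simplified model of the two loops. `splitList` is a hypothetical function, not marked's tokenizer: it ignores continuation indentation, blank lines, nested blocks and the hr/setext checks, and it assumes `indent = 2` and treats any bullet followed by a space or end of line as an item start.

```javascript
// Simplified (hypothetical) item-start check: bullet followed by a space or end of line.
const itemRegex = /^ {0,3}(?:[*+-]|\d{1,9}[.)])(?: .*)?$/;
// nextBulletRegex as currently written, with indent = 2 assumed: bullet only, no space check.
const nextBulletRegex = /^ {0,1}(?:[*+-]|\d{1,9}[.)])/;

// Outer loop starts items; inner loop collects the lines that belong to each item.
function splitList(lines) {
  const items = [];
  let i = 0;
  while (i < lines.length && itemRegex.test(lines[i])) {
    const item = [lines[i++]];
    // End the item as soon as the next line "looks like" a bullet.
    while (i < lines.length && !nextBulletRegex.test(lines[i])) {
      item.push(lines[i++]);
    }
    items.push(item.join('\n'));
  }
  // Everything left over is no longer part of this list.
  return { items, rest: lines.slice(i) };
}

const result = splitList([
  '- item1',
  '- item2',
  '-Not a list item(without space)',
  '- item3',
]);
// result.items holds '- item1' and '- item2';
// result.rest starts with the unspaced line, which becomes a paragraph.
console.log(result);
```

Replacing `nextBulletRegex` with one that also requires a space after the bullet keeps the third line inside `item2` and lets `- item3` continue the same list.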

The fix (or, fixes)

I tried changing nextBulletRegex to also check for whitespace after the bullet, as itemRegex does.

//const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`);
const nextBulletRegex = new RegExp(`^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])((?: [^\\n]*)?(?:\\n|$))`);

This renders the sample above correctly, but unfortunately it broke other tests, colliding with:

thematic breaks (---):

Commonmark Example#57

- foo
***
- bar

Example#99

- foo
-----

setext headings (---):

Commonmark Example#300

- # Foo
- Bar
  ---
  baz

All of these have 'bullet-like' block syntax inside the list, combined with various indentation levels and precedence rules (a setext heading takes precedence over a thematic break).

Perhaps nextBulletRegex did not check for whitespace in the first place so that it would stay compatible with these other syntaxes. However, that does not solve the problem in this issue, and at the very least the name is a misnomer. I think it should be fixed.
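A quick way to see why the space check alone is not enough: with it, thematic-break lines such as `***` and `-----` no longer look like bullets, so the inner loop would keep them as continuation text of the current item. This sketch compares the original pattern with the attempted fix, both built as quoted earlier and assuming `indent = 2`:

```javascript
const indent = 2; // assumed: content column of "- foo"
const prefix = `^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])`;

// Original pattern: the bullet character alone ends the item.
const originalNextBullet = new RegExp(prefix);
// Attempted fix: the bullet must be followed by a space or end of line.
const fixedNextBullet = new RegExp(`${prefix}((?: [^\\n]*)?(?:\\n|$))`);

for (const line of ['***', '-----', '- bar']) {
  console.log(line, originalNextBullet.test(line), fixedNextBullet.test(line));
}
// "***" and "-----" end the item only with the original pattern;
// "- bar" ends it with both.
```

So once the whitespace check is added, thematic breaks (and, by the same token, setext underlines) need their own test in the loop.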

Additional Touch

I did manage to make all tests pass by adding extra logic inside the second loop:

while (src) {
  rawLine = src.split('\n', 1)[0];
  prevLine = line;
  line = rawLine;

  // ...

  if (this.rules.block.hr.test(line)) {
    // make sure it is not a setext heading, which takes precedence
    const twoLines = [prevLine, line].join('\n');
    const matchLheadingPattern = this.rules.block.lheading.test(twoLines);
    const lineIndentIsLargerThanPrev = line.search(/[^ ]/) >= indent;
    const isNextLineSetextHeading = matchLheadingPattern && lineIndentIsLargerThanPrev;
    if (!isNextLineSetextHeading) {
      break;
    }
  }

  // End list item if found start of new bullet
  if (nextBulletRegex.test(line)) {
    break;
  }

  // ...
}

Adding this code solves the issue and makes all the tests pass as well.
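The combined check can be sketched as a standalone predicate. `endsListItem` is a hypothetical name, and the thematic-break and setext patterns below are simplified stand-ins written for single-line input, not marked's actual `block.hr` and `block.lheading` rules; only the control flow follows the loop above.

```javascript
// Simplified thematic break: 3+ of the same -, _ or * with optional spaces/tabs.
const hrRegex = /^ {0,3}(?:(?:-[ \t]*){3,}|(?:_[ \t]*){3,}|(?:\*[ \t]*){3,})$/;
// Simplified setext heading: a non-empty line followed by an = or - underline.
const lheadingRegex = /^[^\n]+\n {0,3}(?:=+|-+) *$/;

// Returns true if `line` should end the current list item.
function endsListItem(line, prevLine, indent) {
  if (hrRegex.test(line)) {
    // A "---" indented into the item right under text is a setext underline, not a break.
    const isSetextHeading =
      lheadingRegex.test(`${prevLine}\n${line}`) && line.search(/[^ ]/) >= indent;
    if (!isSetextHeading) {
      return true;
    }
  }
  // Fixed nextBulletRegex: a bullet must be followed by a space or end of line.
  const nextBulletRegex = new RegExp(
    `^ {0,${Math.min(3, indent - 1)}}(?:[*+-]|\\d{1,9}[.)])(?: [^\\n]*)?$`
  );
  return nextBulletRegex.test(line);
}

console.log(endsListItem('-Not a list item(without space)', '- item2', 2)); // false
console.log(endsListItem('***', '- foo', 2)); // true (thematic break)
console.log(endsListItem('  ---', '- Bar', 2)); // false (setext underline inside the item)
```

The three logged cases mirror the issue's sample and CommonMark examples #57 and #300 quoted earlier.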

I'm a bit worried that the list tokenizer is getting more and more complex, but honestly I don't know of another way (frankly, Markdown syntax seems complicated by nature). If there is a better way to do this in marked, I'd like to hear about it. Otherwise, is it fine to go ahead and make a PR?

Thanks.

This makes marked render correctly with the following text:

```markdown
- item1
- item2
-Not a list item(without space)
- item3
```

Currently:

```html
<ul>
<li>item1</li>
<li>item2</li>
</ul>
-Not a list item(without space)
<ul>
<li>item3</li>
</ul>
```

Which should be:

```html
<ul>
<li>item1</li>
<li>item2
-Not a list item(without space)</li>
<li>item3</li>
</ul>
```

This is the rendered result for commonmark as well.

Changes:
- `nextBulletRegex` checks for whitespace after the bullet
- separately checks for hr or lheading

UziTech · 2022-03-30T05:58:15Z

@jangxyz a PR would be appreciated 😁👍


github-actions · 2022-05-02T06:15:25Z

🎉 This issue has been resolved in version 4.0.15 🎉

The release is available on:

Your semantic-release bot 📦🚀

UziTech added the need more info label Mar 23, 2021

UziTech added the category: lists and L2 - annoying labels and removed the need more info label Mar 23, 2021

jangxyz mentioned this issue Apr 3, 2022

fix: list item bullet without whitespace #2431

Merged


UziTech closed this as completed in #2431 May 2, 2022

github-actions bot added the released label May 2, 2022
