Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quadratic behavior when parsing inlines #373

Closed
nwellnhof opened this issue Dec 19, 2020 · 5 comments · Fixed by #378
Closed

Quadratic behavior when parsing inlines #373

nwellnhof opened this issue Dec 19, 2020 · 5 comments · Fixed by #378

Comments

@nwellnhof
Copy link
Contributor

Found by OSS-Fuzz: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=28805

Reduced test case:

python3 -c 'print("- "*7000+"s"+" -"*7000)' |build/src/cmark --smart >/dev/null
@jgm
Copy link
Member

jgm commented Dec 19, 2020

My tests:

N = time size of output
7000 0.715 168000
14000 2.63 336000
21000 6.32 504000

Confirming, then, that output size is linear with N, time is quadratic.

@jgm
Copy link
Member

jgm commented Dec 19, 2020

Interestingly, my Haskell commonmark library (which I had thought was implementing the same algorithm as cmark) is much faster for this case and has linear performance.

@jgm
Copy link
Member

jgm commented Dec 19, 2020

For reference, the pattern in the input is:

- - - - - - - s - - - - - - -

(for N = 7)

@jgm
Copy link
Member

jgm commented Dec 20, 2020

Puzzling. One would think this is due to nested list parsing, because if the pattern is changed by the addition of an s at the front, causing it to be a paragraph, parsing is nearly instant even for large N. But if the problem is list parsing, why on earth is --smart needed to trigger the slowdown?

nwellnhof added a commit to nwellnhof/cmark that referenced this issue Mar 9, 2021
The inline parsing code would call cmark_node_append_child to append
nodes. This public function has a sanity check which is linear in the
depth of the tree. Repeated calls could show quadratic behavior in
degenerate trees. Use a special function to append nodes without this
check.

Fixes commonmark#373. Found by OSS-Fuzz.
@nwellnhof
Copy link
Contributor Author

This can also be triggered without --smart. For example:

python3 -c 'print(">"*20000+"<"*20000)' |build/src/cmark >/dev/null

@jgm jgm closed this as completed in #378 Mar 19, 2021
jgm pushed a commit that referenced this issue Mar 19, 2021
The inline parsing code would call cmark_node_append_child to append
nodes. This public function has a sanity check which is linear in the
depth of the tree. Repeated calls could show quadratic behavior in
degenerate trees. Use a special function to append nodes without this
check.

Fixes #373. Found by OSS-Fuzz.
@nwellnhof nwellnhof changed the title Quadratic behavior with smart quotes Quadratic behavior when parsing inlines Jul 13, 2021
anticomputer pushed a commit to github/cmark-gfm that referenced this issue Jan 23, 2023
The inline parsing code would call cmark_node_append_child to append
nodes. This public function has a sanity check which is linear in the
depth of the tree. Repeated calls could show quadratic behavior in
degenerate trees. Use a special function to append nodes without this
check.

Fixes commonmark#373. Found by OSS-Fuzz.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants