Fixes catastrophic backtracking for long strings of spaces #629

Open: wants to merge 2 commits into base: main

@caliskanmehmet caliskanmehmet commented Nov 25, 2024

Solves #473

Given a very long run of spaces, the current CODE_BLOCK_R regex can split the input in a huge number of ways, so the regex engine takes an extremely long time to test it. This small change to the regex reduces the backtracking and prevents the catastrophic case when the input is a long string of spaces.

Tester for catastrophic backtracking: https://regex101.com/r/85LH2r/1
Reproduction: https://stackblitz.com/edit/markdown-to-jsx-reproduction-473?file=src%2FApp.tsx
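
To give a rough sense of why the number of combinations explodes, here is a minimal sketch. It assumes a simplified, hypothetical pattern (`AMBIGUOUS_R`, not necessarily the library's actual CODE_BLOCK_R) whose repeated group consumes four spaces plus at least one more non-newline character. On a run of spaces with no newline, a backtracking engine may try every way of cutting the run into such chunks before giving up. The helper `countSplits` is illustrative only and counts those cuts.

```typescript
// Hypothetical pattern with an ambiguous repeated group. It is an assumption
// for illustration, not necessarily the library's actual CODE_BLOCK_R.
const AMBIGUOUS_R = /^(?: {4}[^\n]+\n*)+(?:\n *)+\n/

// Counts the ways a run of `n` spaces can be cut into consecutive chunks of
// at least `minChunk` characters. Each chunk stands for one iteration of the
// repeated group above (4 spaces + 1 or more non-newline characters).
function countSplits(n: number, minChunk: number = 5): number {
  // ways[i] = number of ways to cut the first i spaces; the empty prefix
  // can be cut in exactly one way (no chunks at all).
  const ways: number[] = [1]
  for (let i = 1; i <= n; i++) {
    let total = 0
    // The last chunk has length k; the rest is a cut of the first i - k spaces.
    for (let k = minChunk; k <= i; k++) {
      total += ways[i - k]
    }
    ways.push(total)
  }
  return ways[n]
}

console.log(countSplits(10)) // 2
console.log(countSplits(20)) // 34
console.log(countSplits(100) > 1e9) // true
```

The count grows exponentially with the length of the run, which is consistent with the slowdown described above. Actual engine behavior depends on its optimizations, so treat this as an estimate of the search space rather than a measurement.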

changeset-bot bot commented Nov 25, 2024

⚠️ No Changeset found

Latest commit: 76bbc4a

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@caliskanmehmet caliskanmehmet marked this pull request as ready for review November 25, 2024 17:49
Owner

@quantizor quantizor left a comment

Thank you! Could you add a changeset please?

@caliskanmehmet
Author

caliskanmehmet commented Nov 29, 2024

Hey @quantizor, thanks for the review!

Unfortunately, I've discovered that the new regex doesn't conform to the CommonMark spec: an indented code block must not interrupt a paragraph, and the new regex fails the test case below.

it('should not interrupt paragraphs', () => {
  render(compiler('foo\n    bar'))

  expect(root.innerHTML).toMatchInlineSnapshot(`
    <p>
      foo
        bar
    </p>
  `)
})

- Snapshot  -  4
+ Received  + 10

- <p>
+ <div>
+   <p>
-   foo
+     foo
-     bar
- </p>
+   </p>
+   <pre>
+     <code>
+       bar
+     </code>
+   </pre>
+ </div>

I don't think changing the regex is the final solution here; the block parsing algorithm should be improved to handle indented code blocks.

It looks like we have two options:

  1. Merge the PR and allow indented code blocks to interrupt paragraphs until the block parsing algorithm is fixed.
  2. Don’t merge the PR, and risk catastrophic backtracking.

@quantizor
Owner

I'm pretty sure there is already code to handle indentation

@caliskanmehmet
Author

Sorry for the confusion. I meant using block parsing to identify indented code blocks, instead of using regex.
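
As a rough illustration of that idea, the following sketch classifies lines one at a time instead of matching a regex against the whole input. It is not the library's parser: `Block`, `INDENT`, and `parseBlocks` are hypothetical names, tabs are ignored, and only paragraphs and indented code blocks are handled. Trimming continuation lines follows my reading of CommonMark's paragraph rules and is an assumption about the desired output.

```typescript
// Hypothetical block types for this sketch only.
type Block =
  | { type: 'paragraph'; lines: string[] }
  | { type: 'codeBlock'; lines: string[] }

// Assumption: an indented code line starts with four spaces (tabs ignored).
const INDENT = '    '

function parseBlocks(source: string): Block[] {
  const blocks: Block[] = []
  let current: Block | null = null

  for (const line of source.split('\n')) {
    const isBlank = line.trim() === ''
    const isIndented = !isBlank && line.slice(0, INDENT.length) === INDENT

    if (isBlank) {
      if (current && current.type === 'codeBlock') {
        // Blank lines inside a code block are kept for now; trailing ones
        // are removed after the loop.
        current.lines.push('')
      } else {
        // A blank line closes an open paragraph.
        current = null
      }
      continue
    }

    if (current && current.type === 'paragraph') {
      // An indented line cannot interrupt a paragraph, so any non-blank
      // line is treated as a continuation of it.
      current.lines.push(line.trim())
      continue
    }

    if (isIndented) {
      if (!current) {
        current = { type: 'codeBlock', lines: [] }
        blocks.push(current)
      }
      current.lines.push(line.slice(INDENT.length))
      continue
    }

    // Non-indented text starts a new paragraph (and closes any code block).
    current = { type: 'paragraph', lines: [line.trim()] }
    blocks.push(current)
  }

  // Drop blank lines left at the end of code blocks.
  for (const block of blocks) {
    if (block.type === 'codeBlock') {
      while (block.lines.length > 0 && block.lines[block.lines.length - 1] === '') {
        block.lines.pop()
      }
    }
  }
  return blocks
}

// The failing test input from above yields a single paragraph:
console.log(JSON.stringify(parseBlocks('foo\n    bar')))
// [{"type":"paragraph","lines":["foo","bar"]}]
```

Because each line is inspected once, a long run of spaces is just one blank line in this sketch and involves no backtracking.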
