Investigate Applications of deeper RegExp analysis #525

Closed
bd82 opened this issue Jul 1, 2017 · 5 comments

bd82 commented Jul 1, 2017

There seem to be more and more issues that could be better resolved if Chevrotain could
perform deep analysis of RegExp patterns.

  • Identifying patterns that could contain line terminators automatically instead of forcing users
    to specify those explicitly using the line_breaks flag, see details.

  • Correctly identifying usage of (disallowed) start/end anchors in regExps, see details.

  • Possible performance improvements by compiling simple RegExps to state machines using
    charCodeAt.

    • Mainly relevant where content security policies do not prevent eval / the Function constructor.

  • Possible performance optimization by creating a large switch statement which tests the current
    charCode to reduce the number of possible RegExps to be checked.

    • The greater the number of token types, the greater the benefit.
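
To illustrate why the anchor check in the list above needs more than a plain substring search, here is a minimal sketch of a scanner that skips escaped characters and character classes. The name `findAnchors` is hypothetical and is not part of Chevrotain's API. The scanner is also far from a full RegExp parser: it does not, for example, distinguish anchors inside lookarounds or alternations.

```typescript
// Hypothetical helper for illustration only, not Chevrotain's implementation.
// Reports whether a pattern contains an unescaped "^" or "$" outside of a
// character class, which is where they act as start/end anchors.
function findAnchors(pattern: RegExp): { start: boolean; end: boolean } {
  const source = pattern.source;
  let inClass = false;
  let start = false;
  let end = false;

  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. the "$" in "\$"
      continue;
    }
    if (inClass) {
      // Inside [...] a "^" is negation or a literal, and "$" is a literal.
      if (ch === "]") inClass = false;
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "^") start = true;
    else if (ch === "$") end = true;
  }
  return { start, end };
}

// A plain substring search flags this pattern, although it has no anchors.
const tricky = /[^a]\$/;
const naiveHit = tricky.source.includes("^") || tricky.source.includes("$");
const scanned = findAnchors(tricky);
```

Here `naiveHit` is `true` while `scanned.start` and `scanned.end` are both `false`, which is the kind of false positive a deeper analysis would avoid.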

bd82 commented Jul 1, 2017

@RReverser wrote:

but you can just use some existing regexp parser like https://github.com/DmitrySoshnikov/regexp-tree to isolate changes, and perform analysis only on the interesting bits.

bd82 added a commit that referenced this issue Jul 1, 2017
to help resolve the false positive detection of RegExp anchors.

A "perfect" solution which would be transparent to the users
(no need for a workaround) could possibly come from
investigating the use of a RegExp parser (see #525).

Fixes #157

bd82 commented Apr 14, 2018

Performance optimizations using the first charCode to filter out irrelevant patterns
provided a large boost.

#682
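
The first-charCode filtering idea can be sketched roughly as follows. This is not the code from #682. The `TokenDef` shape, the hand-written `startChars` hint, and both function names are assumptions made for illustration; a real implementation would derive the possible first characters by analyzing each pattern. The patterns use the sticky (`y`) flag so that they only match at the given offset.

```typescript
// Hypothetical token definition; "startChars" lists every character the
// pattern may start with and is supplied by hand in this sketch.
interface TokenDef {
  name: string;
  pattern: RegExp; // expected to carry the sticky "y" flag
  startChars: string;
}

// Build a lookup from first char code to the patterns that could match there.
function buildFirstCharIndex(defs: TokenDef[]): Map<number, TokenDef[]> {
  const index = new Map<number, TokenDef[]>();
  for (const def of defs) {
    for (let i = 0; i < def.startChars.length; i++) {
      const code = def.startChars.charCodeAt(i);
      const bucket = index.get(code);
      if (bucket) bucket.push(def);
      else index.set(code, [def]);
    }
  }
  return index;
}

// Try only the candidate patterns for the char code at "offset".
function matchAt(
  text: string,
  offset: number,
  index: Map<number, TokenDef[]>
): { name: string; image: string } | null {
  const candidates = index.get(text.charCodeAt(offset)) || [];
  for (const def of candidates) {
    def.pattern.lastIndex = offset; // sticky patterns match exactly here
    const match = def.pattern.exec(text);
    if (match !== null) return { name: def.name, image: match[0] };
  }
  return null;
}

const defs: TokenDef[] = [
  { name: "Int", pattern: /\d+/y, startChars: "0123456789" },
  { name: "Plus", pattern: /\+/y, startChars: "+" },
];
const firstCharIndex = buildFirstCharIndex(defs);
const firstToken = matchAt("12+3", 0, firstCharIndex);
```

With many token types, each position in the input is tested against a small bucket of patterns instead of the full list, which is where the benefit would come from.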


bd82 commented May 10, 2018

#709


bd82 commented May 10, 2018

#710


bd82 commented May 10, 2018

Issues were opened for each subtask.

bd82 closed this as completed May 10, 2018