Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to pass #145

Merged
merged 2 commits into from
Apr 18, 2023

Conversation

nevans
Copy link
Collaborator

@nevans nevans commented Apr 16, 2023

BEG_REGEXP has been significantly changed to run in linear-time when running in ruby 3.2. All lookahead has been eliminated.

A correct regexp for ATOM is implemented but unused. ATOMISH describes the current behavior, which ignores "[" chars. The msg-att field labels require the ATOMISH definition, for now...

A regexp for TAG is implemented but also unused for now.

@nevans nevans force-pushed the regexp_linear_time branch from 353a210 to ae09bc5 Compare April 16, 2023 14:11
@nevans nevans requested review from hsbt and shugo April 16, 2023 14:12
@nevans
Copy link
Collaborator Author

nevans commented Apr 16, 2023

The tests I've added collect every Regexp const and every Regexp literal that is inside method bodies, for all of Net::IMAP... or at least they attempt too. Collecting the Regep from iseq will only work on CRuby.

@hsbt & @shugo: have you seen any tests like I've implemented here, elsewhere? It seems very useful for automatically detecting and preventing ReDoS vulnerabilities.

Do you know any way to detect whether a constant has been deprecated?

@nevans
Copy link
Collaborator Author

nevans commented Apr 16, 2023

@hsbt & @shugo Also, what do you think about the changes to BEG_REGEXP and #next_token? I wonder if there is a cleaner way to accomplish this. I could just grab a full atom in the regexp and then decide between number, nil, or "+" in #next_token, but I think this was faster... I need to benchmark it again.

I have some other updates planned for our lexer, for both simplification and performance. But those will come later. 🙂

@nevans nevans force-pushed the regexp_linear_time branch from ae09bc5 to b904e90 Compare April 16, 2023 22:05
@nevans nevans changed the title Add Regexp.linear_time? tests, & Update BEG_REGEXP to run in linear time Add Regexp.linear_time? tests, & update BEG_REGEXP to run in linear time Apr 17, 2023
@nevans nevans force-pushed the regexp_linear_time branch from b904e90 to 85832e8 Compare April 17, 2023 14:07
@nevans nevans changed the title Add Regexp.linear_time? tests, & update BEG_REGEXP to run in linear time 🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to run in linear time Apr 17, 2023
@nevans nevans changed the title 🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to run in linear time 🧪 Add Regexp.linear_time? tests; ✅ Update BEG_REGEXP to pass Apr 17, 2023
@nevans nevans force-pushed the regexp_linear_time branch from 85832e8 to b656450 Compare April 18, 2023 01:28
nevans added 2 commits April 18, 2023 01:38
`BEG_REGEXP` has been significantly changed to run in linear-time when
running in ruby 3.2.  All lookahead has been eliminated.

A correct regexp for `ATOM` is implemented but unused.  `ATOMISH`
describes the current behavior, which ignores "[" chars.  The `msg-att`
field labels require the `ATOMISH` definition, for now...

A regexp for `TAG` is implemented but also unused for now.
@nevans nevans force-pushed the regexp_linear_time branch from b656450 to 68fdef1 Compare April 18, 2023 05:39
@hsbt
Copy link
Member

hsbt commented Apr 18, 2023

@hsbt & @shugo: have you seen any tests like I've implemented here, elsewhere? It seems very useful for automatically detecting and preventing ReDoS vulnerabilities.

I've not seen that yet. We have an idea about it on rubocop rule when Ruby 3.2 released.

/cc @makenowjust

@nevans nevans merged commit 92db350 into master Apr 18, 2023
@nevans nevans deleted the regexp_linear_time branch April 18, 2023 22:38
@nevans
Copy link
Collaborator Author

nevans commented Apr 18, 2023

Yeah, another good approach would be to use parser to test all regexp literals. That should work well for rubocop, but it misses out on consts which are created with dynamically constructed regexps and dynamic method definitions created using eval with a string.

Another check would be to look at local vars on method bindings. That would work, right? It should catch dynamic definitions such as: re = /.../; define_method foo { re.match? _1 }. I'll add that in another commit.

But a parser-based approach could test regexp literals that use simple regexp-escaped interpolation, like /foo#{Regexp.escape bar}/. We could probably do that with an iseq-based approach, but it would be more complex than using a parser, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants