-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Regex::CompileOptions::MULTILINE_ONLY
#14870
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: The translation needs to be implemented in pcre_compile_options
as well.
Also would be nice to have some specs for testing the behaviour.
|
||
# Multiline matching. | ||
# | ||
# Equivalent to `MULTILINE | DOTALL` in PCRE and PCRE2. | ||
MULTILINE = 0x0000_0006 | ||
# Equivalent to `MULTILINE`in PCRE and PCRE2. | ||
MULTILINE_ONLY = 0x0000_0040 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: Theoretically, we're free to chose whatever values we want for this enum because they're specific to Crystal and get translated for any engine implementation. But since all other members are based on the libpcre1 values, we should follow this scheme:
MULTILINE_ONLY = 0x0000_0040 | |
MULTILINE_ONLY = 0x0000_0002 |
Also this should ensure that MULTILINE_ONLY | DOTALL == MULTILINE
which is a useful property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@straight-shoota - Note that DOTALL 2 lines below is already 0x0000_0002
, so if we change this value, do we have to change the DOTALL value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this should ensure that MULTILINE_ONLY | DOTALL == MULTILINE which is a useful property.
It's probably worth adding that as an explicit test and/or encoding that directly in the definition of MULTILINE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oooh the DOTALL = 2
is surprising. Guess that never worked correctly when it was mapped directly to libpcre1 values 🙈
I suppose it's probably best to add MULTILINE_ONLY = 4
then. This breaks value equivalence with libpcre1, but keeps the Crystal values consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using the MULTILINE without DOT_ALL value at first but it also breaks inspect (it gets confused and thinks MULTILINE_ONLY is MULTILINE)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should work (somehow). If it doesn't, there's something wrong. Worst case we override the implementation of CompileOptions#inspect
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, changing the values could break existing code, even if I doubt anyone is instantiating an Options[4] or whatever :-D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's unlikely. But better to keep this compatibility in case someone relies on the integer values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to 0x4 then. It'll be the MULTILINE and DOTALL values of PCRE1 swapped but still ensure MULTILINE = MULTILINE_ONLY | DOTALL
.
MULTILINE_ONLY = 0x0000_0040 | |
MULTILINE_ONLY = 0x0000_0004 |
Co-authored-by: Sijawusz Pur Rahnama <sija@sija.pl>
Regex::CompileOptions::MULTILINE_ONLY
Re-added the special handling for MULTILINE (setting MULTILINE and DOTALL) because removing it broke a test. The remaining test failure happens on main too (at spec/std/regex_spec.cr:19) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case for MULTILINE
is only necessary because the value of MULTILINE_ONLY
is not a composite of MULTILINE
. Change it to 0x0000_0004
and it should work.
And the test failure is in fact introduced by this change because the test uses 0x0000_0040
as an undefined option value (but is now defined).
|
||
# Multiline matching. | ||
# | ||
# Equivalent to `MULTILINE | DOTALL` in PCRE and PCRE2. | ||
MULTILINE = 0x0000_0006 | ||
# Equivalent to `MULTILINE`in PCRE and PCRE2. | ||
MULTILINE_ONLY = 0x0000_0040 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change this to 0x4 then. It'll be the MULTILINE and DOTALL values of PCRE1 swapped but still ensure MULTILINE = MULTILINE_ONLY | DOTALL
.
MULTILINE_ONLY = 0x0000_0040 | |
MULTILINE_ONLY = 0x0000_0004 |
Adds a new MULTILINE_ONLY flag to regexes.
I created an issue here:
#14869
As discussed here:
https://forum.crystal-lang.org/t/regex-that-is-multiline-but-not-dotall-how/7054/12