RegexSet and Regex give different results for the same pattern in 1.9 #1070
Comments
One interesting bit here is that while `is_match` returns false, `matches(..).matched_any()` returns true for the same set and haystack:
fn main() {
    env_logger::init();
    let pattern = r"(?m)^ *v [0-9]";
    let text = "v 0";
    let rs = regex::RegexSet::new([pattern]).unwrap();
    println!("rs is: {rs:?}");
    println!("{}", rs.is_match(text)); // false (incorrect)
    println!("{}", rs.matches(text).matched_any()); // true!
}
BurntSushi added a commit that referenced this issue Aug 26, 2023
This fixes a bug in how prefilters were applied for multi-regexes compiled with "all" semantics. It turns out that this corresponds to the regex crate's RegexSet API, but only its `is_match` routine. See the comment on the regression test added in this PR for an explanation of what happened. Basically, it came down to incorrectly using Aho-Corasick's "standard" semantics, which doesn't necessarily report leftmost matches. Since the regex crate is really all about leftmost matching, this can lead to skipping over parts of the haystack and thus lead to missing matches. Fixes #1070
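The difference between the two match semantics described in the commit message can be sketched with a toy model that uses only the standard library. Everything here is an assumption made for illustration: the literal set `[" ", "v 0"]` is a hypothetical stand-in for whatever prefixes the crate actually extracts, `earliest_ending` and `leftmost` are simplified models rather than Aho-Corasick itself, and `matches_at` is a hand-written matcher for `(?m)^ *v [0-9]`, not the regex engine.

```rust
/// Toy model of "standard" semantics: report the literal occurrence that
/// ends first, even if another occurrence starts earlier.
fn earliest_ending(haystack: &str, literals: &[&str]) -> Option<(usize, usize)> {
    literals
        .iter()
        .filter_map(|lit| haystack.find(*lit).map(|s| (s, s + lit.len())))
        .min_by_key(|&(start, end)| (end, start))
}

/// Toy model of "leftmost" semantics: report the occurrence that starts first.
fn leftmost(haystack: &str, literals: &[&str]) -> Option<(usize, usize)> {
    literals
        .iter()
        .filter_map(|lit| haystack.find(*lit).map(|s| (s, s + lit.len())))
        .min_by_key(|&(start, end)| (start, end))
}

/// Hand-written matcher for `(?m)^ *v [0-9]` beginning exactly at `start`.
fn matches_at(haystack: &str, start: usize) -> bool {
    let bytes = haystack.as_bytes();
    // (?m)^ only matches at offset 0 or immediately after a newline.
    if start != 0 && bytes[start - 1] != b'\n' {
        return false;
    }
    let mut i = start;
    while i < bytes.len() && bytes[i] == b' ' {
        i += 1;
    }
    bytes.get(i) == Some(&b'v')
        && bytes.get(i + 1) == Some(&b' ')
        && bytes.get(i + 2).map_or(false, |b| b.is_ascii_digit())
}

/// Search that only considers match starts at or after `from`, the way a
/// search resumes at the start of a prefilter candidate.
fn is_match_from(haystack: &str, from: usize) -> bool {
    (from..=haystack.len()).any(|s| matches_at(haystack, s))
}

fn main() {
    let haystack = "v 0";
    let literals = [" ", "v 0"]; // hypothetical prefix literals

    let early = earliest_ending(haystack, &literals).unwrap();
    let left = leftmost(haystack, &literals).unwrap();
    println!("earliest-ending candidate: {:?}", early);
    println!("leftmost candidate: {:?}", left);

    // Resuming at the earliest-ending candidate skips offset 0, where the
    // only real match begins.
    println!("from earliest-ending start: {}", is_match_from(haystack, early.0));
    println!("from leftmost start: {}", is_match_from(haystack, left.0));
}
```

In this model the earliest-ending candidate is the single space at offset 1, so a search resumed there never tries offset 0 and reports no match, while the leftmost candidate starts at offset 0 and the match is found.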
This is fixed in
obviously, killing it here. just wow. i don't have the words
What version of regex are you using?
1.9.3. The issue is present in regex 1.9.0 and later.

Describe the bug at a high level.
`RegexSet::new([r"(?m)^ *v [0-9]"]).unwrap().is_match("v 0")` incorrectly returns false in version 1.9.0 and later. It returns true in 1.8.4. It returns true if I use a `Regex` instead of a `RegexSet`.

What are the steps to reproduce the behavior?
(playground link)

What is the actual behavior?

What is the expected behavior?
The last line should be `true`.