literal: fix reverse suffix optimization
This commit fixes a bug in the reverse suffix literal optimization. It
skipped past parts of the input too eagerly, without verifying that no
match could occur there. We fix this by being more careful about where
we search: we keep track of the starting position of the last literal
match, and each subsequent literal search starts one byte after that
position rather than after the end of the previous literal match.
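
A minimal sketch of the two strategies, under simplifying assumptions: it hard-codes the regression test's pattern `[0-4][0-4][0-4]000` and its suffix literal `000`, and it uses a hypothetical `verify_reverse` helper as a stand-in for the reverse DFA (the crate itself runs `dfa::Fsm::reverse`). The `overlapping` flag switches between resuming the literal search at the end of the previous hit (the old behavior) and one byte after its start (the fix).

```rust
/// Suffix literal shared by every match of `[0-4][0-4][0-4]000`.
const SUFFIX: &[u8] = b"000";

/// Naive substring search (hypothetical helper, not the crate's `lcs.find`):
/// returns the offset of the first occurrence of `needle` in `haystack`.
/// `needle` must be non-empty.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Hypothetical stand-in for the reverse DFA: checks whether
/// `[0-4][0-4][0-4]000` matches ending exactly at `end`, and returns the
/// match start if so.
fn verify_reverse(text: &[u8], end: usize) -> Option<usize> {
    // Every match of this particular pattern is exactly 6 bytes long.
    let start = end.checked_sub(6)?;
    let prefix_ok = text[start..start + 3]
        .iter()
        .all(|b| (b'0'..=b'4').contains(b));
    if prefix_ok && &text[start + 3..end] == SUFFIX {
        Some(start)
    } else {
        None
    }
}

/// Scans for the suffix literal, then verifies each candidate in reverse.
/// With `overlapping == false`, the next literal search resumes at the end
/// of the previous hit (the buggy behavior); with `true`, it resumes one
/// byte after the previous hit's start (the fixed behavior).
fn reverse_suffix_search(text: &[u8], overlapping: bool) -> Option<(usize, usize)> {
    let mut at = 0;
    while let Some(i) = find(&text[at..], SUFFIX) {
        let literal_start = at + i;
        let end = literal_start + SUFFIX.len();
        if let Some(start) = verify_reverse(text, end) {
            return Some((start, end));
        }
        at = if overlapping { literal_start + 1 } else { end };
    }
    None
}
```

On the regression input `153.230000`, the first `000` starts at offset 6, but the three bytes before it (`.23`) rule out a match. Only the fixed variant goes on to try the literal at offset 7, which yields the match `230000` at 4..10; the old variant resumes at offset 9 and gives up.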

This is necessary in particular when the suffix literal can have
overlapping matches. For example, `000` can match in `0000` at either
position 0 or 1, but `abc` can only match in `abcd` at position 0.
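
To make the overlap concrete, here is a small sketch that collects every occurrence of a literal, restarting either one byte after the previous hit's start or after its end. The `find` and `literal_starts` helpers are hypothetical and use a naive byte-window search, not the literal searcher the crate uses.

```rust
/// Naive substring search returning the offset of the first occurrence of
/// `needle` in `haystack`. `needle` must be non-empty.
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    haystack.windows(needle.len()).position(|w| w == needle)
}

/// Collects the start offset of every occurrence of `needle`. When
/// `overlapping` is true, each search resumes one byte after the previous
/// hit's start; otherwise it resumes after the previous hit's end.
fn literal_starts(haystack: &[u8], needle: &[u8], overlapping: bool) -> Vec<usize> {
    // `slice::windows(0)` panics, so reject empty needles up front.
    assert!(!needle.is_empty(), "needle must be non-empty");
    let mut starts = Vec::new();
    let mut at = 0;
    while let Some(i) = find(&haystack[at..], needle) {
        let start = at + i;
        starts.push(start);
        at = if overlapping { start + 1 } else { start + needle.len() };
    }
    starts
}
```

With `overlapping` set, `000` in `0000` is found at offsets 0 and 1; resuming after the end of each hit finds only offset 0. For `abc` in `abcd` both modes find only offset 0, since that literal cannot overlap itself.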

This was initially reported as a bug against ripgrep:
BurntSushi/ripgrep#1203
BurntSushi committed Feb 27, 2019
1 parent 60d087a commit 661bf53
Showing 2 changed files with 12 additions and 4 deletions.
13 changes: 9 additions & 4 deletions src/exec.rs
@@ -745,12 +745,13 @@ impl<'c> ExecNoSync<'c> {
         debug_assert!(lcs.len() >= 1);
         let mut start = original_start;
         let mut end = start;
+        let mut last_literal_match = 0;
         while end <= text.len() {
-            start = end;
-            end += match lcs.find(&text[end..]) {
+            last_literal_match += match lcs.find(&text[last_literal_match..]) {
                 None => return Some(NoMatch(text.len())),
-                Some(start) => start + lcs.len(),
+                Some(i) => i,
             };
+            end = last_literal_match + lcs.len();
             match dfa::Fsm::reverse(
                 &self.ro.dfa_reverse,
                 self.cache,
@@ -760,7 +761,11 @@ impl<'c> ExecNoSync<'c> {
             ) {
                 Match(0) | NoMatch(0) => return None,
                 Match(s) => return Some(Match((s + start, end))),
-                NoMatch(_) => continue,
+                NoMatch(i) => {
+                    start = i;
+                    last_literal_match += 1;
+                    continue;
+                }
                 Quit => return Some(Quit),
             };
         }
3 changes: 3 additions & 0 deletions tests/regression.rs
@@ -82,6 +82,9 @@ mat!(wb_start_x, r"(?u:\b)^(?-u:X)", "X", Some((0, 1)));
 ismatch!(strange_anchor_non_complete_prefix, r"a^{2}", "", false);
 ismatch!(strange_anchor_non_complete_suffix, r"${2}a", "", false);
 
+// See: https://github.com/BurntSushi/ripgrep/issues/1203
+ismatch!(wat1, r"[0-4][0-4][0-4]000", "153.230000", true);
+
 // See: https://github.com/rust-lang/regex/issues/334
 mat!(captures_after_dfa_premature_end, r"a(b*(X|$))?", "abcbX",
      Some((0, 1)), None, None);