Fix #186.
This enables RegexSets to short-circuit when:

1. All patterns are anchored to the beginning of the input.
2. All patterns have either matched or will never match.

We make this happen by checking, whenever a DFA match is observed, whether
all NFA states in the current DFA state are match states. If they are, then
because match states are final states, the current set of matches can never
change. Since we don't care about reporting location information, we can
quit.

N.B. If no matches can be found, then the DFA will short-circuit using its
normal mechanism.
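
The check described above can be sketched in isolation. This is a minimal illustration, not the crate's actual code: `Inst` here is a simplified stand-in with only two variants, and `only_match_states` is a hypothetical free function standing in for the inline check added to `src/dfa.rs`.

```rust
/// Simplified, hypothetical stand-in for the instruction type in src/prog.rs.
#[allow(dead_code)]
#[derive(Clone, Debug)]
enum Inst {
    /// A match instruction; the payload is the index of the matching pattern.
    Match(usize),
    /// Every other kind of instruction, collapsed into one variant here.
    Other,
}

impl Inst {
    /// Returns true if and only if this is a match instruction.
    fn is_match(&self) -> bool {
        match *self {
            Inst::Match(_) => true,
            _ => false,
        }
    }
}

/// Returns true when every NFA instruction pointer in a DFA state refers to
/// a match instruction. `state_insts` plays the role of `state.insts`.
#[allow(dead_code)]
fn only_match_states(prog: &[Inst], state_insts: &[u32]) -> bool {
    state_insts.iter().all(|&ip| prog[ip as usize].is_match())
}
```

With a program such as `[Other, Match(0), Match(1)]`, a DFA state holding instruction pointers `[1, 2]` passes the check, so the search could stop; a state holding `[0, 1]` does not, because instruction 0 could still lead to further matches. Note that `all` returns true for an empty slice, so this sketch relies on the caller only running the check after a match has been observed, as the commit does.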
BurntSushi committed May 1, 2016
1 parent 090655b commit 445c834
Showing 2 changed files with 23 additions and 0 deletions.
13 changes: 13 additions & 0 deletions src/dfa.rs
@@ -581,6 +581,19 @@ impl<'a> Fsm<'a> {
self.last_match_si = next_si;
prev_si = next_si;

// This permits short-circuiting when matching a regex set.
// In particular, if this DFA state contains only match states,
// then it's impossible to extend the set of matches since
// match states are final. Therefore, we can quit.
if self.prog.matches.len() > 1 {
    let state = self.state(next_si);
    let just_matches = state.insts.iter()
        .all(|&ip| self.prog[ip as usize].is_match());
    if just_matches {
        return result;
    }
}

// Another inner loop! If the DFA stays in this particular
// match state, then we can rip through all of the input
// very quickly, and only recording the match location once
10 changes: 10 additions & 0 deletions src/prog.rs
@@ -289,6 +289,16 @@ pub enum Inst {
    Bytes(InstBytes),
}

impl Inst {
    /// Returns true if and only if this is a match instruction.
    pub fn is_match(&self) -> bool {
        match *self {
            Inst::Match(_) => true,
            _ => false,
        }
    }
}

/// Representation of the Save instruction.
#[derive(Clone, Debug)]
pub struct InstSave {