Minor perf improvements and code touch ups. #222

Merged
merged 3 commits into from
May 1, 2016
BurntSushi
Member

  1. Use Unicode instructions in a bytes::Regex more aggressively. We could do more on this front. Unicode instructions are quite a bit faster for the NFA engines to execute.
  2. Remove the DFA's StateKey type, since it is exactly equivalent to State.

BurntSushi added 3 commits May 1, 2016 12:45
Previously, a byte-based regex would always use byte-based instructions.
This isn't actually necessary. If the regex has the Unicode flag enabled,
for example, then Unicode instructions can and should be used.

We can't mix and match, though. If any part of the regex could match invalid
UTF-8, then we need to fall back to byte-based matching.
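The all-or-nothing rule above can be sketched as a single pass over a simplified AST. The `Node` and `InstMode` types and the `only_utf8`/`choose_mode` functions are hypothetical and are not the regex crate's real compiler types; the sketch only shows that one sub-expression capable of matching invalid UTF-8 forces the whole program into byte mode.

```rust
// Hypothetical, simplified AST: not the regex crate's real types.
#[derive(Debug)]
enum Node {
    Literal(char),     // always matches valid UTF-8
    AnyChar,           // matches one Unicode scalar value
    ByteRange(u8, u8), // may match invalid UTF-8 if the range reaches 0x80+
    Concat(Vec<Node>),
    Alternate(Vec<Node>),
}

#[derive(Debug, PartialEq)]
enum InstMode {
    Unicode,
    Bytes,
}

/// Returns true if every match of `node` is guaranteed to be valid UTF-8.
fn only_utf8(node: &Node) -> bool {
    match node {
        Node::Literal(_) | Node::AnyChar => true,
        // A lone byte is valid UTF-8 only if it is ASCII (conservative check).
        Node::ByteRange(_, hi) => *hi <= 0x7F,
        Node::Concat(xs) | Node::Alternate(xs) => xs.iter().all(only_utf8),
    }
}

/// All-or-nothing: no mixing of Unicode and byte instructions in one program.
fn choose_mode(re: &Node) -> InstMode {
    if only_utf8(re) {
        InstMode::Unicode
    } else {
        InstMode::Bytes
    }
}

fn main() {
    let unicode = Node::Concat(vec![Node::Literal('a'), Node::AnyChar]);
    let mixed = Node::Concat(vec![Node::Literal('a'), Node::ByteRange(0x80, 0xFF)]);
    println!("{:?}", choose_mode(&unicode));
    println!("{:?}", choose_mode(&mixed));
}
```

The byte-range check is deliberately conservative: a sequence of non-ASCII bytes can still form valid UTF-8, but proving that requires looking at more than one node at a time.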

We could be more aggressive in the parser. In particular, we could check
whether arbitrary character classes match only valid UTF-8, and if so,
represent them as Unicode classes instead of byte classes.
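One possible form of that parser check, as a minimal sketch: a byte class whose ranges all lie within ASCII can only match complete one-byte UTF-8 sequences, so it can be rewritten as an equivalent Unicode class. The `Class` enum and `try_unicode` function are hypothetical names for illustration, and the ASCII-only rule is an assumed conservative criterion, not necessarily what the parser would adopt.

```rust
/// Hypothetical class representations: not the regex crate's real types.
#[derive(Debug, PartialEq)]
enum Class {
    Bytes(Vec<(u8, u8)>),
    Unicode(Vec<(char, char)>),
}

/// Rewrites a byte class as a Unicode class when it can only match valid UTF-8.
/// Assumed rule: every range must lie entirely within ASCII (0x00..=0x7F),
/// because each matched byte is then a complete one-byte UTF-8 sequence.
fn try_unicode(ranges: Vec<(u8, u8)>) -> Class {
    if ranges.iter().all(|&(_, hi)| hi <= 0x7F) {
        // u8 -> char is a lossless cast for ASCII bytes.
        Class::Unicode(ranges.into_iter().map(|(lo, hi)| (lo as char, hi as char)).collect())
    } else {
        Class::Bytes(ranges)
    }
}

fn main() {
    // [a-z0-9] written as a byte class: ASCII only, so it becomes Unicode.
    println!("{:?}", try_unicode(vec![(b'a', b'z'), (b'0', b'9')]));
    // [\x80-\xFF]: these bytes are not valid UTF-8 on their own, so it stays bytes.
    println!("{:?}", try_unicode(vec![(0x80, 0xFF)]));
}
```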

This optimization is only applicable to the NFA engines, since the DFA
operates exclusively on bytes. The NFA engines, in turn, are much faster
when executing Unicode instructions.
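A rough illustration of why the NFA engines benefit: under the simplifying assumption that a literal compiles to one instruction per character in Unicode mode and one per UTF-8 byte in byte mode, non-ASCII text needs more instructions (and more NFA steps) in byte mode. The `inst_counts` function is hypothetical and only counts instructions; it does not model the engines themselves.

```rust
/// Illustrative only: instructions needed to match a literal string, assuming
/// one instruction per char (Unicode mode) vs one per UTF-8 byte (byte mode).
fn inst_counts(lit: &str) -> (usize, usize) {
    let unicode = lit.chars().count();
    // Sum of each char's encoded length; equal to lit.len().
    let bytes: usize = lit.chars().map(char::len_utf8).sum();
    (unicode, bytes)
}

fn main() {
    for lit in ["abc", "héllo", "日本"] {
        let (u, b) = inst_counts(lit);
        println!("{:?}: {} Unicode vs {} byte instructions", lit, u, b);
    }
}
```

For character classes the gap is typically wider than for literals, since a single range over Unicode scalar values can expand into several alternations of byte-range sequences once encoded as UTF-8.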

StateKey has exactly the same information as State and was therefore purely
redundant. We now use states themselves as keys in the compiled map.
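A minimal sketch of using the state itself as the map key. The `State` fields, the `Cache` type, and the `add` method are hypothetical simplifications, not the DFA's actual definitions; the point is that deriving `Hash` and `Eq` on the state makes a separate key type holding the same fields unnecessary.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified DFA state: NFA instruction pointers plus a flag.
/// Deriving Hash and Eq lets the state serve as its own map key.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct State {
    insts: Vec<usize>,
    is_match: bool,
}

/// Hypothetical cache mapping each distinct state to its index in `states`.
struct Cache {
    compiled: HashMap<State, usize>,
    states: Vec<State>,
}

impl Cache {
    fn new() -> Self {
        Cache { compiled: HashMap::new(), states: Vec::new() }
    }

    /// Returns the index of `state`, adding it if it has not been seen before.
    fn add(&mut self, state: State) -> usize {
        if let Some(&si) = self.compiled.get(&state) {
            return si;
        }
        let si = self.states.len();
        self.states.push(state.clone());
        self.compiled.insert(state, si);
        si
    }
}

fn main() {
    let mut cache = Cache::new();
    let a = cache.add(State { insts: vec![1, 3], is_match: false });
    let b = cache.add(State { insts: vec![1, 3], is_match: false });
    let c = cache.add(State { insts: vec![2], is_match: true });
    println!("{} {} {}", a, b, c); // prints "0 0 1"
}
```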
BurntSushi merged commit 090655b into master May 1, 2016
BurntSushi deleted the misc-perf branch May 1, 2016 18:05