-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Using of sparse and dense is backwards #18
Comments
Strange that I made this error! You are of course right. Next time I do a semver bump I'll correct this. Thanks for the report! |
I presume the error came from the thought that sparse data has a lot of "zeroes" (not necessarily literal zeroes), and this crate's version of the sparse data-structure also explicitly contains those "zeroes". At least that makes sense to me. |
This commit introduces a ground-up rewrite of the entire crate. Most or all use cases served by `aho-corasick 0.6` should be served by this rewrite as well. Pretty much everything has been improved. The API is simpler, and much more flexible with many new configuration knobs for controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In particular, there are several tunable optimizations for controlling space usage such as state ID representation and byte classes. The API is simpler in that there is now just one type that encapsulates everything: `AhoCorasick`. Support for streams has been improved quite a bit, with new APIs for stream search & replace. Test and benchmark coverage has increased quite a bit. This also fixes a subtle but important bug: empty patterns are now handled correctly. Previously, they could never match, but now they can match at any position. Finally, I believe this is now the only Aho-Corasick implementation to support leftmost-first and leftmost-longest semantics by using what I think is a novel alteration to the Aho-Corasick construction algorithm. I surveyed some other implementations, and there are a few Java libraries that support leftmost-longest match semantics, but they implement it by adding a sliding queue at search time. I also looked into Perl's regex implementation which has an Aho-Corasick optimization for `foo|bar|baz|...|quux` style regexes, and therefore must somehow implement leftmost-first semantics. The code is a bit hard to grok, but it looks like this is being handled at search time as opposed to baking it into the automaton. Fixes #18, Fixes #19, Fixes #26, Closes #34
This commit introduces a ground-up rewrite of the entire crate. Most or all use cases served by `aho-corasick 0.6` should be served by this rewrite as well. Pretty much everything has been improved. The API is simpler, and much more flexible with many new configuration knobs for controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In particular, there are several tunable optimizations for controlling space usage such as state ID representation and byte classes. The API is simpler in that there is now just one type that encapsulates everything: `AhoCorasick`. Support for streams has been improved quite a bit, with new APIs for stream search & replace. Test and benchmark coverage has increased quite a bit. This also fixes a subtle but important bug: empty patterns are now handled correctly. Previously, they could never match, but now they can match at any position. Finally, I believe this is now the only Aho-Corasick implementation to support leftmost-first and leftmost-longest semantics by using what I think is a novel alteration to the Aho-Corasick construction algorithm. I surveyed some other implementations, and there are a few Java libraries that support leftmost-longest match semantics, but they implement it by adding a sliding queue at search time. I also looked into Perl's regex implementation which has an Aho-Corasick optimization for `foo|bar|baz|...|quux` style regexes, and therefore must somehow implement leftmost-first semantics. The code is a bit hard to grok, but it looks like this is being handled at search time as opposed to baking it into the automaton. Fixes #18, Fixes #19, Fixes #26, Closes #34
This commit introduces a ground-up rewrite of the entire crate. Most or all use cases served by `aho-corasick 0.6` should be served by this rewrite as well. Pretty much everything has been improved. The API is simpler, and much more flexible with many new configuration knobs for controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In particular, there are several tunable optimizations for controlling space usage such as state ID representation and byte classes. The API is simpler in that there is now just one type that encapsulates everything: `AhoCorasick`. Support for streams has been improved quite a bit, with new APIs for stream search & replace. Test and benchmark coverage has increased quite a bit. This also fixes a subtle but important bug: empty patterns are now handled correctly. Previously, they could never match, but now they can match at any position. Finally, I believe this is now the only Aho-Corasick implementation to support leftmost-first and leftmost-longest semantics by using what I think is a novel alteration to the Aho-Corasick construction algorithm. I surveyed some other implementations, and there are a few Java libraries that support leftmost-longest match semantics, but they implement it by adding a sliding queue at search time. I also looked into Perl's regex implementation which has an Aho-Corasick optimization for `foo|bar|baz|...|quux` style regexes, and therefore must somehow implement leftmost-first semantics. The code is a bit hard to grok, but it looks like this is being handled at search time as opposed to baking it into the automaton. Fixes #18, Fixes #19, Fixes #26, Closes #34
This commit introduces a ground-up rewrite of the entire crate. Most or all use cases served by `aho-corasick 0.6` should be served by this rewrite as well. Pretty much everything has been improved. The API is simpler, and much more flexible with many new configuration knobs for controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In particular, there are several tunable optimizations for controlling space usage such as state ID representation and byte classes. The API is simpler in that there is now just one type that encapsulates everything: `AhoCorasick`. Support for streams has been improved quite a bit, with new APIs for stream search & replace. Test and benchmark coverage has increased quite a bit. This also fixes a subtle but important bug: empty patterns are now handled correctly. Previously, they could never match, but now they can match at any position. Finally, I believe this is now the only Aho-Corasick implementation to support leftmost-first and leftmost-longest semantics by using what I think is a novel alteration to the Aho-Corasick construction algorithm. I surveyed some other implementations, and there are a few Java libraries that support leftmost-longest match semantics, but they implement it by adding a sliding queue at search time. I also looked into Perl's regex implementation which has an Aho-Corasick optimization for `foo|bar|baz|...|quux` style regexes, and therefore must somehow implement leftmost-first semantics. The code is a bit hard to grok, but it looks like this is being handled at search time as opposed to baking it into the automaton. Fixes #18, Fixes #19, Fixes #26, Closes #34
Normally in math and computing, 'dense' regarding storage means that memory is allocated for all elements, even if not used or zero.
Sparse
means that only used or non-zero elements occupy space. See dense matrix vs. sparse matrix, dense hash map vs sparse hash map, etc. You usage is other way around, which is confusing.The text was updated successfully, but these errors were encountered: