Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using of sparse and dense is backwards #18

Closed
ilyapopov opened this issue Nov 16, 2017 · 2 comments · Fixed by #40
Closed

Using of sparse and dense is backwards #18

ilyapopov opened this issue Nov 16, 2017 · 2 comments · Fixed by #40

Comments

@ilyapopov
Copy link

Normally in math and computing, 'dense' regarding storage means that memory is allocated for all elements, even if not used or zero. Sparse means that only used or non-zero elements occupy space. See dense matrix vs. sparse matrix, dense hash map vs sparse hash map, etc. You usage is other way around, which is confusing.

@BurntSushi
Copy link
Owner

Strange that I made this error! You are of course right. Next time I do a semver bump I'll correct this. Thanks for the report!

@Apanatshka
Copy link
Contributor

I presume the error came from the thought that sparse data has a lot of "zeroes" (not necessarily literal zeroes), and this crate's version of the sparse data-structure also explicitly contains those "zeroes". At least that makes sense to me.
It may be better to use other terminology altogether to avoid confusion. I like compressed for data-structures that suppress "zeroes".

BurntSushi added a commit that referenced this issue Mar 28, 2019
This commit introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler, and much more flexible with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.

The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.

Support for streams has been improved quite a bit, with new APIs for
stream search & replace.

Test and benchmark coverage has increased quite a bit.

This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.

Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes, and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to baking
it into the automaton.

Fixes #18, Fixes #19, Fixes #26, Closes #34
BurntSushi added a commit that referenced this issue Mar 28, 2019
This commit introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler, and much more flexible with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.

The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.

Support for streams has been improved quite a bit, with new APIs for
stream search & replace.

Test and benchmark coverage has increased quite a bit.

This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.

Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes, and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to baking
it into the automaton.

Fixes #18, Fixes #19, Fixes #26, Closes #34
BurntSushi added a commit that referenced this issue Mar 28, 2019
This commit introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler, and much more flexible with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.

The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.

Support for streams has been improved quite a bit, with new APIs for
stream search & replace.

Test and benchmark coverage has increased quite a bit.

This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.

Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes, and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to baking
it into the automaton.

Fixes #18, Fixes #19, Fixes #26, Closes #34
BurntSushi added a commit that referenced this issue Mar 28, 2019
This commit introduces a ground-up rewrite of the entire crate. Most or
all use cases served by `aho-corasick 0.6` should be served by this
rewrite as well. Pretty much everything has been improved. The API is
simpler, and much more flexible with many new configuration knobs for
controlling the space-vs-time tradeoffs of Aho-Corasick automatons. In
particular, there are several tunable optimizations for controlling
space usage such as state ID representation and byte classes.

The API is simpler in that there is now just one type that encapsulates
everything: `AhoCorasick`.

Support for streams has been improved quite a bit, with new APIs for
stream search & replace.

Test and benchmark coverage has increased quite a bit.

This also fixes a subtle but important bug: empty patterns are now
handled correctly. Previously, they could never match, but now they can
match at any position.

Finally, I believe this is now the only Aho-Corasick implementation to
support leftmost-first and leftmost-longest semantics by using what I
think is a novel alteration to the Aho-Corasick construction algorithm.
I surveyed some other implementations, and there are a few Java
libraries that support leftmost-longest match semantics, but they
implement it by adding a sliding queue at search time. I also looked
into Perl's regex implementation which has an Aho-Corasick optimization
for `foo|bar|baz|...|quux` style regexes, and therefore must somehow
implement leftmost-first semantics. The code is a bit hard to grok, but
it looks like this is being handled at search time as opposed to baking
it into the automaton.

Fixes #18, Fixes #19, Fixes #26, Closes #34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants