Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add a HLL implementation #1223

Merged
merged 2 commits into from
Oct 31, 2020
Merged

[MRG] Add a HLL implementation #1223

merged 2 commits into from
Oct 31, 2020

Conversation

luizirber
Copy link
Member

@luizirber luizirber commented Oct 28, 2020

Implement a HyperLogLog sketch based on the khmer implementation but using the estimator from "New cardinality estimation algorithms for HyperLogLog sketches" (also implemented in dashing).

This PR also moves add_sequence and add_protein to SigsTrait, closing #1057.

The encoding data and methods (hp, dayhoff, aa and HashFunctions) was in the MinHash source file, and since it is more general-purpose it was moved to a new module encodings, which is then used by SigsTrait.

(these changes are both spun off #1201)

Checklist

  • Is it mergeable?
  • make test Did it pass the tests?
  • make coverage Is the new code covered?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Was a spellchecker run on the source code and documentation after
    changes were made?

@codecov
Copy link

codecov bot commented Oct 28, 2020

Codecov Report

Merging #1223 into latest will decrease coverage by 0.87%.
The diff coverage is 68.54%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1223      +/-   ##
==========================================
- Coverage   84.20%   83.33%   -0.88%     
==========================================
  Files          99      103       +4     
  Lines        9233     9609     +376     
==========================================
+ Hits         7775     8008     +233     
- Misses       1458     1601     +143     
Flag Coverage Δ
rusttests 67.54% <68.73%> (-0.63%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/core/src/cmd.rs 86.88% <ø> (ø)
src/core/src/errors.rs 0.00% <0.00%> (ø)
src/core/src/ffi/hyperloglog.rs 0.00% <0.00%> (ø)
src/core/src/ffi/minhash.rs 0.00% <ø> (ø)
src/core/src/ffi/mod.rs 0.00% <ø> (ø)
src/core/src/ffi/signature.rs 0.00% <ø> (ø)
src/core/src/ffi/utils.rs 0.00% <0.00%> (ø)
src/core/src/from.rs 100.00% <ø> (ø)
src/core/src/lib.rs 100.00% <ø> (ø)
src/core/src/wasm.rs 0.00% <0.00%> (ø)
... and 15 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1e94bde...6b60f5c. Read the comment docs.

@luizirber luizirber changed the title [MRG] Add a HLL implementation [WIP] Add a HLL implementation Oct 28, 2020
@ctb
Copy link
Contributor

ctb commented Oct 28, 2020

need this reviewed?

@luizirber
Copy link
Member Author

need this reviewed?

it would be good to take a look, I tried to reproduce the Nodegraph API too when possible

@luizirber luizirber changed the title [WIP] Add a HLL implementation [MRG] Add a HLL implementation Oct 28, 2020
@luizirber luizirber force-pushed the hll_impl branch 2 times, most recently from e518833 to 9b0b7c7 Compare October 28, 2020 19:59
Copy link
Contributor

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, although I have not reviewed the Rust code in detail 👀

It looks like you do some encodings refactoring in the Rust code that is not mentioned in the PR. Maybe update the PR description a bit with that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants