[MRG] Add a HLL implementation #1223
Codecov Report
@@ Coverage Diff @@
## latest #1223 +/- ##
==========================================
- Coverage 84.20% 83.33% -0.88%
==========================================
Files 99 103 +4
Lines 9233 9609 +376
==========================================
+ Hits 7775 8008 +233
- Misses 1458 1601 +143
Need this reviewed?
It would be good to take a look. I tried to reproduce the Nodegraph API too when possible.
e518833 to 9b0b7c7
This looks great, although I have not reviewed the Rust code in detail 👀
It looks like you do some encodings refactoring in the Rust code that is not mentioned in the PR. Maybe update the PR description a bit with that?
Implement a HyperLogLog sketch based on the `khmer` implementation but using the estimator from "New cardinality estimation algorithms for HyperLogLog sketches" (also implemented in `dashing`).

This PR also moves `add_sequence` and `add_protein` to `SigsTrait`, closing #1057.

The encoding data and methods (`hp`, `dayhoff`, `aa` and `HashFunctions`) were in the MinHash source file. Since they are more general-purpose, they were moved to a new module, `encodings`, which is then used by `SigsTrait`.

(these changes are both spun off #1201)
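For readers unfamiliar with the data structure, here is a minimal Python sketch of how a HyperLogLog counter estimates the number of distinct items. It is not the code in this PR: the PR's implementation is in Rust and uses the estimator from the paper cited above, while this toy version uses the classic HyperLogLog estimator with a linear-counting correction for small counts. The class name `HyperLogLog`, the parameter `p`, the helper `add_kmers`, and the choice of BLAKE2b as the hash are all assumptions made for illustration, not sourmash API.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (classic estimator, hypothetical API)."""

    def __init__(self, p=12):
        if not 4 <= p <= 16:
            raise ValueError("p must be between 4 and 16")
        self.p = p
        self.m = 1 << p                  # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(item):
        # Assumption: any well-mixed 64-bit hash works; BLAKE2b is used here
        # only because it ships with the standard library.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        h = self._hash64(item)
        width = 64 - self.p
        index = h >> width               # top p bits select a register
        rest = h & ((1 << width) - 1)    # remaining bits feed the rank
        # Rank is the 1-based position of the leftmost 1-bit in `rest`
        # (width + 1 when `rest` is zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def add_kmers(self, sequence, ksize):
        # Hypothetical helper: add every forward k-mer of a sequence.
        for i in range(len(sequence) - ksize + 1):
            self.add(sequence[i:i + ksize])

    def cardinality(self):
        m = self.m
        # Bias-correction constant from the original HyperLogLog paper.
        if m == 16:
            alpha = 0.673
        elif m == 32:
            alpha = 0.697
        elif m == 64:
            alpha = 0.709
        else:
            alpha = 0.7213 / (1 + 1.079 / m)
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog(p=12)
    hll.add_kmers("ACGTACGTTGCAACGTAGCTAGCTAGGCTA", ksize=5)
    print(round(hll.cardinality()))
```

Because each register only keeps a maximum, adding the same item again never changes the sketch, which is what lets a fixed amount of memory summarize an arbitrarily long stream.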
Checklist
make test: Did it pass the tests?
make coverage: Is the new code covered?
without a major version increment. Changing file formats also requires a major version number increment.
changes were made?