[MRG] rename covered_bp property to better reflect function #2050

Merged 15 commits on May 13, 2022

Conversation

@bluegenes bluegenes commented May 13, 2022

This PR renames the covered_bp property to unique_dataset_hashes to reflect the fact that it does not take abundances into account and does not include the +(k-1) needed for bp conversion. The value is currently estimated as scaled * len(self.hashes), but this could be replaced by a more accurate count or estimate in the future (e.g. HLL, ref #2030).
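
As a rough illustration of the estimate described above, the sketch below computes scaled * len(hashes) for a FracMinHash-style sketch held as a plain dict of hash to abundance. The function names `estimate_unique_dataset_hashes` and `approx_bp_from_hashes` are hypothetical and are not sourmash's API. The second function assumes a single contiguous sequence and only shows what adding k-1 for bp conversion would look like.

```python
def estimate_unique_dataset_hashes(hashes, scaled):
    """Estimate the number of unique hashes (k-mers) in the original dataset.

    `hashes` maps each retained hash value to its abundance. Only the number
    of distinct hashes is used, so abundances are deliberately ignored.
    """
    return len(hashes) * scaled


def approx_bp_from_hashes(n_unique_hashes, ksize):
    """Hypothetical bp conversion: a single sequence of L bp has L - k + 1 k-mers."""
    return n_unique_hashes + (ksize - 1)


# Toy sketch: three retained hashes with very different abundances.
sketch = {101: 5, 202: 1, 303: 12}
n_hashes = estimate_unique_dataset_hashes(sketch, scaled=1000)
```

Changing the abundances in `sketch` leaves `n_hashes` unchanged, which is the reason the property name talks about unique hashes rather than covered bp.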

I also renamed the intersect_bp property of FracMinHashComparison to total_unique_intersect_hashes to make it clearer that we're working in hashes, and added a comment noting that this is approximately equal to intersect_bp.
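
A minimal sketch of the intersection estimate, assuming both sketches were built with the same scaled value (otherwise one would need to be downsampled first). The function name `estimate_total_unique_intersect_hashes` is hypothetical and stands in for whatever FracMinHashComparison does internally.

```python
def estimate_total_unique_intersect_hashes(hashes_a, hashes_b, scaled):
    """Estimate the number of unique hashes shared by two datasets.

    Assumes both hash collections come from sketches with the same `scaled`.
    Abundances, if present as dict values, are ignored: only keys are compared.
    """
    shared = set(hashes_a) & set(hashes_b)
    return len(shared) * scaled


query = {11, 22, 33}
match = {22, 33, 44}
shared_estimate = estimate_total_unique_intersect_hashes(query, match, scaled=1000)
```

The result is a count of hashes, not base pairs, which is why it is only approximately equal to intersect_bp.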

Open to better names, ofc -- please suggest if you have them.

This is the simplified version of #2027.

codecov bot commented May 13, 2022

Codecov Report

Merging #2050 (83a0d83) into latest (7826fbc) will increase coverage by 7.46%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           latest    #2050      +/-   ##
==========================================
+ Coverage   84.27%   91.73%   +7.46%     
==========================================
  Files         130       99      -31     
  Lines       15240    10959    -4281     
  Branches     2151     2151              
==========================================
- Hits        12843    10053    -2790     
+ Misses       2098      607    -1491     
  Partials      299      299              
| Flag | Coverage Δ |
|---|---|
| python | 91.73% <100.00%> (-0.01%) ⬇️ |
| rust | ? |

| Impacted Files | Coverage Δ |
|---|---|
| src/sourmash/minhash.py | 94.16% <100.00%> (-0.02%) ⬇️ |
| src/sourmash/search.py | 97.89% <100.00%> (ø) |
| src/sourmash/sketchcomparison.py | 95.23% <100.00%> (ø) |
| src/core/src/sketch/hyperloglog/estimators.rs | |
| src/core/src/index/revindex.rs | |
| src/core/src/ffi/mod.rs | |
| src/core/src/index/sbt/mhbt.rs | |
| src/core/src/sketch/minhash.rs | |
| src/core/src/index/linear.rs | |
| src/core/tests/test.rs | |

... and 24 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bluegenes changed the title from "[WIP] rename covered_bp property to better reflect function" to "[MRG] rename covered_bp property to better reflect function" on May 13, 2022

@ctb ctb left a comment

Assuming tests pass, LGTM!

And just to triple confirm, there is no change in the CSV output headers here, right?

@bluegenes

Thanks!

Yep - no change to output, just more accurate/specific/intuitive(?) internal names.
