Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Add FrozenSourmashSignature #1610

Merged
merged 30 commits into from
Aug 29, 2022
Merged

[MRG] Add FrozenSourmashSignature #1610

merged 30 commits into from
Aug 29, 2022

Conversation

ctb
Copy link
Contributor

@ctb ctb commented Jun 19, 2021

Adds a FrozenSourmashSignature class, and provides sensible to_mutable() and to_frozen() methods on SourmashSignature and FrozenSourmashSignature. Provides an update() context manager that wraps changes so that a FrozenSourmashSignature is left at the end.

Complement to #1508, which did this with MinHash.

As with #1508, very few changes are required to make all the tests pass. The ease of doing this suggests to me that this is a Good Change.

Also as with #1508, I really like this updated code; it's so nice and clear!

I am hoping that these changes make it easier for @luizirber in Rust/Python interaction.

On the other hand, this could break some 3rd party code (as #1508 did :). Admittedly we are the owners of most of that code, but 🤷 not sure if we should delay this 'til 5.0. On the other hand, could fit nicely with a 4.5.0 release.

TODO:

  • update places that unnecessarily call flatten and downsample (b/c signatures are already flat and/or downsampled)
  • check that SourmashSignature by default owns a frozen MinHash (I think it does, but confirm)
  • check that SourmashSignature.to_frozen() also freezes the MinHash object.
  • maybe make a new method that does the ss = ss.to_mutable(); ss.minhash = xyz; ss = ss.to_frozen() dance;
  • OR, use a context manager? with sig.mutate() as sig:?
  • revisit into_frozen() and into_mutable()
  • add docstrings :)
  • consider making error messages clearer & providing guidance around using to_mutable()

@ctb ctb changed the title Add FrozenSourmashSignature [EXP] Add FrozenSourmashSignature Jun 19, 2021
@codecov-commenter
Copy link

codecov-commenter commented Jun 19, 2021

Codecov Report

Merging #1610 (bee45ad) into latest (b1b4ef7) will increase coverage by 7.40%.
The diff coverage is 91.34%.

@@            Coverage Diff             @@
##           latest    #1610      +/-   ##
==========================================
+ Coverage   84.70%   92.10%   +7.40%     
==========================================
  Files         131      100      -31     
  Lines       15544    11343    -4201     
  Branches     2219     2236      +17     
==========================================
- Hits        13166    10448    -2718     
+ Misses       2085      601    -1484     
- Partials      293      294       +1     
Flag Coverage Δ
python 92.10% <91.34%> (+0.04%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/signature.py 93.86% <84.09%> (+2.05%) ⬆️
src/sourmash/minhash.py 93.76% <90.90%> (-0.27%) ⬇️
src/sourmash/commands.py 90.78% <96.87%> (+0.06%) ⬆️
src/sourmash/index/__init__.py 96.71% <100.00%> (ø)
src/sourmash/lca/command_classify.py 81.48% <100.00%> (+0.71%) ⬆️
src/sourmash/lca/lca_db.py 93.29% <100.00%> (+0.01%) ⬆️
src/sourmash/search.py 97.94% <100.00%> (+<0.01%) ⬆️
src/sourmash/sig/__main__.py 94.02% <100.00%> (+0.02%) ⬆️
src/core/src/cmd.rs
... and 31 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@ctb
Copy link
Contributor Author

ctb commented Jun 20, 2021

@luizirber opinions?

@ctb ctb changed the title [EXP] Add FrozenSourmashSignature [MRG] Add FrozenSourmashSignature Aug 3, 2022
@ctb
Copy link
Contributor Author

ctb commented Aug 3, 2022

Ready for review & merge @sourmash-bio/devs !

@ctb
Copy link
Contributor Author

ctb commented Aug 26, 2022

hi @luizirber I think this PR is best reviewed by you, if you have time. It's a quick read, I promise ;). And you might have a suggestion for how to improve, too - I'm not thrilled with the context manager syntax I use.

@luizirber
Copy link
Member

I have a bunch of commits in #2230 that flip the MinHash to default to large ones (b-tree based) and expose the Frozen ones (vec-based), should I add the commits here or make a new PR merging into this one?

(I can also just merge them together in #2230, since that will be a breaking change version bump on the Rust side and I would like to avoid doing two in a row 😬 )

@ctb
Copy link
Contributor Author

ctb commented Aug 28, 2022

I have a bunch of commits in #2230 that flip the MinHash to default to large ones (b-tree based) and expose the Frozen ones (vec-based), should I add the commits here or make a new PR merging into this one?

(I can also just merge them together in #2230, since that will be a breaking change version bump on the Rust side and I would like to avoid doing two in a row 😬 )

your call - if last option is easiest, go for it!

@luizirber
Copy link
Member

I tested out with the pile of changes coming in #2230 and it works. In the future might want to avoid all the copying, but as you pointed in the comment for the context manager it is hard to track bugs otherwise...

Another future optimization is being more strict on the Mutable/Frozen on the Rust side, right now they map to

  • Mutable: KmerMinHashBTree
  • Frozen: KmerMinHash
    but they are both mutable (it's just that the Frozen one is much slower to mutate with large sketches).

@ctb
Copy link
Contributor Author

ctb commented Aug 28, 2022

awesome, thanks!! should I merge, then?

@luizirber
Copy link
Member

awesome, thanks!! should I merge, then?

Yup, go for it!

@ctb ctb merged commit bc5ae03 into latest Aug 29, 2022
@ctb ctb deleted the add/frozen_signature branch August 29, 2022 01:03
@ctb
Copy link
Contributor Author

ctb commented Aug 29, 2022

🎉

@ctb
Copy link
Contributor Author

ctb commented Aug 29, 2022

In the future might want to avoid all the copying, but as you pointed in the comment for the context manager it is hard to track bugs otherwise...

I would have put more effort in to this, but note that SourmashSignature really only contains name and filename. The .minhash property is a FrozenMinHash that is copied around by reference. So all the copying doesn't really matter for efficiency, or at least that's what I decided after reading through the code a bunch of times :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants