MRG: add skipmer capacity to sourmash python layer via ffi #3446
Make skipmers robust, but keep #3395 functional in the meantime.

This PR:
- enables a second skipmer type, so we have m1n3 in addition to m2n3
- switches to a reading frame approach for both translation + skipmers, which means we first build the reading frame, then kmerize, rather than building kmers + translating/skipping on the fly
- avoids the "extended length" needed for skipping on the fly

Since this changes the `SeqToHashes` strategy a bit, there's one python test where we now see a different error.

Future thoughts:
- with the new structure, it would be straightforward to add validation to exclude protein k-mers with invalid amino acids (`X`). I guess I'm not entirely sure what happens to those atm...
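To illustrate the "build the reading frame first, then kmerize" idea described above, here is a minimal Python sketch. It is not the sourmash implementation (which lives in the Rust `SeqToHashes` code); the function names `skipmer_frame`, `kmerize`, and `skipmers` are hypothetical, and it assumes a skipmer with parameters (m, n) keeps the first m bases of every n-base cycle and that one frame is built per starting offset 0..n-1.

```python
def skipmer_frame(seq: str, m: int, n: int, start: int = 0) -> str:
    """Build one skipmer reading frame (illustrative helper, not sourmash API).

    Beginning at offset `start`, keep the first `m` bases out of every
    `n`-base cycle and drop the rest.
    """
    return "".join(base for i, base in enumerate(seq[start:]) if i % n < m)


def kmerize(frame: str, k: int) -> list[str]:
    """Return all contiguous k-mers of an already-built frame."""
    return [frame[i:i + k] for i in range(len(frame) - k + 1)]


def skipmers(seq: str, k: int, m: int, n: int) -> list[str]:
    """Collect k-mers from every frame offset (assumed here to be 0..n-1)."""
    out = []
    for start in range(n):
        # Build the whole frame first, then kmerize it, so no "extended
        # length" bookkeeping is needed while walking the raw sequence.
        out.extend(kmerize(skipmer_frame(seq, m, n, start), k))
    return out


if __name__ == "__main__":
    seq = "ACGTACGTA"
    print(skipmer_frame(seq, 2, 3))  # m2n3 frame at offset 0
    print(skipmer_frame(seq, 1, 3))  # m1n3 frame at offset 0
    print(skipmers(seq, 3, 2, 3))
```

Because each frame is an ordinary string by the time it is kmerized, a later validation step (such as rejecting protein k-mers containing `X`) could be applied to the frame's k-mers without touching the skipping logic.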
Codecov Report

Additional details and impacted files:

@@ Coverage Diff @@
## latest #3446 +/- ##
==========================================
- Coverage 86.37% 86.26% -0.11%
==========================================
Files 137 137
Lines 16196 16226 +30
Branches 2219 2225 +6
==========================================
+ Hits 13989 13998 +9
- Misses 1900 1915 +15
- Partials 307 313 +6
@@ -102,7 +102,7 @@ def _set_num_scaled(mh, num, scaled):
    # Number of hashes is 0th parameter
    mh_params[0] = num
    # Scale is 8th parameter
Suggested change:
- # Scale is 8th parameter
+ # Scale is 10th parameter
😅
## [0.18.0] - 2024-12-20

MSRV: 1.66

Changes/additions:

* add skipmer capacity to sourmash python layer via ffi (#3446)
* add skipmers; switch to reading frame approach for translation, skipmers (#3395)
* Refactor: Use to_writer/from_reader across the codebase (#3443)
* adjust `Signature::name()` to return `Option<String>` instead of `filename()` and `md5sum()` (#3434)
* propagate zipfile errors (#3431)

Updates:

* Bump proptest from 1.5.0 to 1.6.0 (#3437)
* Bump roaring from 0.10.8 to 0.10.9 (#3438)
* Bump serde from 1.0.215 to 1.0.216 (#3436)
* Bump statrs from 0.17.1 to 0.18.0 (#3426)
* Bump roaring from 0.10.7 to 0.10.8 (#3423)
* Bump needletail from 0.6.0 to 0.6.1 (#3427)
* Bump web-sys from 0.3.72 to 0.3.74 (#3411)
* Bump js-sys from 0.3.72 to 0.3.74 (#3412)
* Bump roaring from 0.10.6 to 0.10.7 (#3413)
* Bump serde_json from 1.0.132 to 1.0.133 (#3402)
* Bump serde from 1.0.214 to 1.0.215 (#3403)
See #3446 (review). Fixes the comment in the *code* changed in #3446.
This PR updates the FFI and python layer to allow the skipmer moltypes (`skipm1n3`, `skipm2n3`). We will keep this as an undocumented experimental feature for now. There are no guarantees at the moment that skipmers will work with all sourmash commands, as there are no explicit tests in place. There are tests for using skipmers with branchwater, so all skipmer searching is best done in the plugin for now. This PR enables critical handy utilities, though: `sig cat`, `sig summarize`, `sig describe`, etc.

Documentation and additional tests should be added prior to release (#3449).