[MRG] add column 3 to kreport #2306
My only thought here is that while sourmash assigns k-mers to genomes, the actual level of our resolution depends on alphabet and k-mer size. We don't do taxonomic assignments any differently, but I wouldn't necessarily recommend using, e.g. protein k7, to get species-level assignments. Perhaps I should add a sentence about this in the doc? e.g.
I'm not really sure though, because (1) this is really just general sourmash info/recommendation, and (2) there are lots of reasons to try other ksizes.
Codecov Report
@@            Coverage Diff             @@
##           latest    #2306      +/-   ##
==========================================
+ Coverage   84.84%   92.14%    +7.30%
==========================================
  Files         131      100       -31
  Lines       15687    11416     -4271
  Branches     2189     2190        +1
==========================================
- Hits        13309    10519     -2790
+ Misses       2083      602     -1481
  Partials      295      295
ok @ctb @sourmash-bio/devs ready for review
I like the idea of adding this text, actually. We're seeing more and more evidence that people are reading, sometimes scouring, the text for details - let's give it to 'em! The one addition I would make is to say "sourmash gather makes all assignments to genomes, and then sourmash tax integrates taxonomy information...".
Fixes #2305
Updates kreport to include information in column 3, which is the bp "assigned" to a particular taxon. Since we make all assignments at the genome level and then apply taxonomy later, this column just contains the species-level assignments from column 2. I think this best mimics the intention of the output format and the outputs from other tools.
Updated the documentation with this and the recommendation to generate sketches with abundance.
This is basically a bug fix, since this is closer to what the format should have been to begin with.
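To make the column 3 behavior concrete, here is a minimal Python sketch of the idea described above. It is not the sourmash implementation: the function name `kreport_columns`, the input layout, and the example bp values are all hypothetical, and the choice to report 0 in column 3 for ranks above species is an assumption made for this illustration.

```python
from collections import defaultdict


def kreport_columns(genome_hits):
    """Summarize genome-level matches into kreport-like rows.

    genome_hits: list of (lineage, bp) pairs (hypothetical layout), where
    lineage is a tuple of (rank, name) pairs ordered from highest rank down
    to species, and bp is the estimated base pairs matched to that genome.
    """
    # Column 2: bp contained in each taxon, summed over all genomes below it.
    clade_bp = defaultdict(int)
    for lineage, bp in genome_hits:
        for depth in range(1, len(lineage) + 1):
            clade_bp[lineage[:depth]] += bp

    rows = []
    for prefix, bp in sorted(clade_bp.items()):
        rank, name = prefix[-1]
        # Column 3: bp "assigned" to this taxon itself. Assignments are made
        # at the genome level, so only species rows carry a value, copied
        # from column 2. Using 0 elsewhere is an assumption of this sketch.
        assigned = bp if rank == "species" else 0
        rows.append({"rank": rank, "name": name, "col2_bp": bp, "col3_bp": assigned})
    return rows


# Made-up example: two genomes from different species in the same genus.
hits = [
    ((("genus", "Escherichia"), ("species", "Escherichia coli")), 1000),
    ((("genus", "Escherichia"), ("species", "Escherichia albertii")), 500),
]
for row in kreport_columns(hits):
    print(row["col2_bp"], row["col3_bp"], row["rank"], row["name"], sep="\t")
```

In this sketch the genus row sums both genomes in column 2, while each species row repeats its own column 2 value in column 3.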