gather_at_rank handling ties in taxonomic assignment #174

Open
taylorreiter opened this issue May 21, 2021 · 0 comments


@taylorreiter
Member

#171 updated charcoal to sourmash>=4.1.0, including switching from sourmash search to sourmash prefetch.
The taxonomy output for one contig in the test file LoombaR_2017__SID1050_bax__bin.11.fa.gz changed. As recorded in that issue:

jq . < tests/test-data/loomba/LoombaR_2017__SID1050_bax__bin.11.fa.gz.contigs-tax.json > out.old   # output from before #171
jq . < tests/test-data/loomba/LoombaR_2017__SID1050_bax__bin.11.fa.gz.contigs-tax.json > out.new   # same file, regenerated after #171

diff out.old out.new
2629c2629
<             "f__Acutalibacteraceae"
---
>             "f__Oscillospiraceae"
2633c2633
<             "g__Anaeromassilibacillus"
---
>             "g__Flavonifractor"

@ctb surmised:

This is likely because gather doesn't report ties, per sourmash-bio/sourmash#1366 and sourmash-bio/sourmash#278. It is slightly surprising in this case that the tie here is above the family level (!!) but these things happen.
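To make the tie problem concrete, here is a toy sketch of a single greedy gather step over plain Python sets of hashes. It does not use the sourmash API; `best_matches` and the reference names are hypothetical. Instead of keeping one best match, it returns every reference whose overlap with the query equals the maximum, so a tie is visible rather than silently broken.

```python
from typing import Dict, List, Set


def best_matches(query: Set[int], refs: Dict[str, Set[int]]) -> List[str]:
    """Return all references tied for the largest overlap with the query.

    A greedy step that keeps only one maximum would hide the others;
    returning every tied name lets the caller detect the tie.
    """
    overlaps = {name: len(query & hashes) for name, hashes in refs.items()}
    best = max(overlaps.values(), default=0)
    if best == 0:
        # No reference shares any hash with the query.
        return []
    return sorted(name for name, ov in overlaps.items() if ov == best)


# Two references each share three hashes with the query: a tie.
print(best_matches({1, 2, 3, 4}, {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {1}}))
```

Here both `A` and `B` cover three of the four query hashes, so the call reports `["A", "B"]`; a step that simply took the maximum would report only one of them, and which one would depend on iteration order.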

It's probably a good idea for gather_at_rank to detect and handle/report such ties, and probably pull the taxonomic assignment up to the level above the tie.
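One way to "pull the taxonomic assignment up to the level above the tie" is to take the lowest common ancestor of the tied matches, i.e. the longest rank-by-rank prefix their lineages share. The sketch below assumes lineages are tuples of rank-prefixed names; `lca_lineage`, `assign_with_ties`, and the reference names are hypothetical, not charcoal's actual code. The family and genus come from the diff above; the upper ranks in the example are assumed GTDB-style values used only for illustration.

```python
from typing import Dict, Sequence, Tuple

Lineage = Tuple[str, ...]


def lca_lineage(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest shared prefix (lowest common ancestor) of the lineages."""
    if not lineages:
        return ()
    common = []
    # zip walks the ranks in parallel and stops at the shortest lineage.
    for names in zip(*lineages):
        if all(n == names[0] for n in names):
            common.append(names[0])
        else:
            break  # first rank where the tied matches disagree
    return tuple(common)


def assign_with_ties(tied: Sequence[str], taxonomy: Dict[str, Lineage]) -> Tuple[Lineage, bool]:
    """Assign a lineage to a set of tied matches, flagging disagreement."""
    lineages = [taxonomy[name] for name in tied]
    disagree = len(set(lineages)) > 1
    return lca_lineage(lineages), disagree


# Assumed upper ranks (illustrative only); family/genus from the diff above.
shared = ("d__Bacteria", "p__Firmicutes_A", "c__Clostridia", "o__Oscillospirales")
taxonomy = {
    "ref_A": shared + ("f__Acutalibacteraceae", "g__Anaeromassilibacillus"),
    "ref_B": shared + ("f__Oscillospiraceae", "g__Flavonifractor"),
}
lineage, tie = assign_with_ties(["ref_A", "ref_B"], taxonomy)
print(lineage[-1], tie)
```

With the two tied matches above, the assignment stops at the order level (`o__Oscillospirales`) and the tie flag is `True`, so the contig would be reported without a family or genus rather than with whichever family the search happened to pick.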

@bluegenes
