Inconsistency between sourmash lca classify and sourmash lca gather #707

Closed
ctSkennerton opened this issue Aug 2, 2019 · 3 comments
@ctSkennerton
Contributor

I created a signature, and sourmash lca classify assigns it to a particular genus, Eubacterium. But when I run sourmash lca gather with the same signature and database, the top match is not a genome from Eubacterium. I was wondering whether the "1 equal matches" reported for the top match from sourmash lca gather was a Eubacterium, and whether there is any way I could find out?

sourmash lca classify --query scaffolds_min1000.fasta.a.sig --db ~/Downloads/genbank-k31.lca.json.gz 
== This is sourmash version 2.0.1. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 1 LCA databases. ksize=31, scaled=10000
finding query signatures...
outputting classifications to stdout
ID,status,superkingdom,phylum,class,order,family,genus,species,strain
scaffolds_min1000.fasta.a,disagree,Bacteria,Firmicutes,Clostridia,Clostridiales,Eubacteriaceae,Eubacterium,,
sourmash lca gather scaffolds_min1000.fasta.a.sig ~/Downloads/genbank-k31.lca.json.gz
== This is sourmash version 2.0.1. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

loaded 1 LCA databases. ksize=31, scaled=10000
loaded query: /Users/connor.skennerton/Downl... (k=31)

overlap     p_query p_match 
---------   ------- --------
3.6 Mbp      77.8%   79.6%      Clostridium [Butyribacterium] methylotrophicum (** 1 equal matches)
320.0 kbp     6.9%    6.9%      Eubacterium callanderi
130.0 kbp     2.8%    2.8%      Eubacterium limosum
@ctb
Contributor

ctb commented Aug 23, 2019

hi @ctSkennerton, good question! There is no easy command-line way to figure this out. The slightly hacky way to do it today would be to use sourmash search to pull out all the matches, and then run gather against just those matches. We could also add --save-matches to lca gather, and/or include the equal matches in the --csv output.
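
To make the idea concrete, here is a toy sketch of what "equal matches" amounts to, assuming a tie means several database entries share exactly the same number of hashes with the query. It does not use the sourmash API; the function name `equal_top_matches`, the genome names, and the hash sets are all made up for illustration.

```python
# Toy illustration only: the hash sets and genome names below are invented,
# and this is not how sourmash itself stores or compares signatures.

def equal_top_matches(query, candidates):
    """Return (best_overlap, names) for all candidates tied for the largest overlap."""
    overlaps = {name: len(query & hashes) for name, hashes in candidates.items()}
    if not overlaps:
        return 0, []
    best = max(overlaps.values())
    tied = sorted(name for name, count in overlaps.items() if count == best)
    return best, tied


query = set(range(100))
candidates = {
    "genome_A": set(range(0, 80)),          # shares 80 hashes with the query
    "genome_B": set(range(0, 80)) | {500},  # shares the same 80 hashes
    "genome_C": set(range(90, 120)),        # shares only 10 hashes
}

best, tied = equal_top_matches(query, candidates)
print(best, tied)
```

Here `genome_A` and `genome_B` tie, so a greedy report that prints only one of them would hide the other; listing every tied name is what pulling out all the matches and re-checking them achieves by hand.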

@ctb
Contributor

ctb commented Mar 4, 2021

#1310 and #1226 will make this much easier.

@ctb
Contributor

ctb commented Jun 25, 2021

sourmash prefetch (#1370, released in sourmash 4.1.0) makes this straightforward!
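
As a rough sketch of how one might answer the original question from a table of all overlapping matches, the snippet below reads CSV text and lists the rows tied for the largest overlap. The column names `intersect_bp` and `match_name` are assumptions about the CSV layout (check the header of your own file), and the sample rows are hypothetical stand-ins, not real results.

```python
import csv
import io

# Hypothetical stand-in for a CSV of matches; a real file would have more
# columns and real names. The column names here are assumptions.
SAMPLE = """intersect_bp,match_name
3600000,genome_A
3600000,genome_B
320000,genome_C
"""


def top_ties(csv_text, overlap_col="intersect_bp", name_col="match_name"):
    """Return the names of all rows whose overlap equals the largest overlap."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    overlaps = [int(float(row[overlap_col])) for row in rows]
    best = max(overlaps)
    return [row[name_col] for row, bp in zip(rows, overlaps) if bp == best]


ties = top_ties(SAMPLE)
print(ties)
```

If more than one name comes back, those are the candidates for the "equal matches" that the gather report summarizes on a single line.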

ctb closed this as completed Jun 25, 2021