add support for identifying equal matches in lca gather #714

Closed
ctb opened this issue Aug 23, 2019 · 3 comments

ctb commented Aug 23, 2019

Right now there's no way to figure out what the "equal matches" are with sourmash lca gather.

We could / should support output of equal matches to --csv and --save-matches. We should probably add --save-matches to lca gather, too.

see #707 (comment).


nmb85 commented Jul 29, 2020

"We could / should support output of equal matches to --csv and --save-matches. We should probably add --save-matches to lca gather, too." <-- This would be suuuuper cool.

Also, v3.4 is telling me not to use "sourmash lca gather" anymore because it's deprecated, but when I use "sourmash gather" with an lca db, I get the regular gather function giving me specific signature matches instead of the taxonomic output with the lineage. Has the taxonomic function of "sourmash lca gather" not been fully merged into "sourmash gather", or am I missing an option? Do you have an example command line for using "sourmash gather" to give taxonomic lineages?

One more thing: is it possible/advisable to fill out the lineage fields in the csv as far as the algo is confident for queries with multiple equal matches? That is, if all equal matches are from the Staphylococcus genus, to report the lineage down to the genus?


ctb commented Jul 31, 2020

Hi @nmb85, re sourmash lca gather, see this comment, #1011 (comment), and the larger issue. Note that there's a script in #1011 that does the taxonomic assignment.

Re removing lca gather, I forgot why and had to go hunting for why we made that decision :). 'twas here, #728. While it is not well explained in that issue, the logic was that sourmash lca gather uses an inferior and slightly hacked version of the underlying gather algorithm, and the correct way to move forward was to first do a regular gather and then do the taxonomy stuff. This has since been supplemented by a realization that traditional LCA approaches are a bit suboptimal (described briefly in the #1011 above).

We are thinking about a sourmash classify (see #1099) that would use gather underneath and then take in a taxonomy to do assignments.
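
To make the "gather first, then taxonomy" idea concrete, here is a minimal sketch of summing gather matches by lineage at a chosen rank. It is not the script from #1011 and not the design of sourmash classify; the row keys (`name`, `fraction`), the `taxonomy` mapping, and the function name are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Rank order assumed for this sketch (index 0 = superkingdom).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def summarize_by_rank(gather_rows, taxonomy, rank):
    """Sum the matched fraction of each gather row under its lineage at `rank`.

    gather_rows: list of dicts with hypothetical keys "name" and "fraction".
    taxonomy:    dict mapping a match name to a tuple of names in RANKS order.
    """
    depth = RANKS.index(rank) + 1
    totals = defaultdict(float)
    for row in gather_rows:
        lineage = taxonomy.get(row["name"])
        if lineage is None or len(lineage) < depth:
            # No lineage, or lineage not resolved down to the requested rank.
            key = ("unassigned",)
        else:
            key = tuple(lineage[:depth])
        totals[key] += row["fraction"]
    return dict(totals)


# Made-up example data: two Staphylococcus genomes and one match without taxonomy.
staph = ("Bacteria", "Firmicutes", "Bacilli", "Bacillales",
         "Staphylococcaceae", "Staphylococcus")
taxonomy = {
    "genomeA": staph + ("Staphylococcus aureus",),
    "genomeB": staph + ("Staphylococcus epidermidis",),
}
gather_rows = [
    {"name": "genomeA", "fraction": 0.5},
    {"name": "genomeB", "fraction": 0.25},
    {"name": "genomeC", "fraction": 0.125},
]

genus_summary = summarize_by_rank(gather_rows, taxonomy, "genus")
for lineage, fraction in genus_summary.items():
    print(";".join(lineage), fraction)
```

Because gather assigns each part of the query to a single match, summing fractions per lineage like this does not double count, which is one reason to do the taxonomy step after gather rather than inside it.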

> One more thing: is it possible/advisable to fill out the lineage fields in the csv as far as the algo is confident for queries with multiple equal matches? That is, if all equal matches are from the Staphylococcus genus, to report the lineage down to the genus?

@bluegenes is actively working on this now. The answer seems to be that it's quite robust to do what you say (but we are working on putting error bars on it).
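
As a rough illustration of the question above (reporting the lineage only as far as all equal matches agree), the following sketch takes the longest shared prefix of the matches' lineages. This is an assumption about how one might do it, not sourmash code; `shared_lineage` and the example lineages are hypothetical.

```python
# Rank order assumed for this sketch.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]


def shared_lineage(lineages):
    """Return the longest prefix of names shared by every lineage.

    Each lineage is a tuple of names in RANKS order.
    """
    shared = []
    # zip(*lineages) walks the lineages rank by rank and stops at the
    # shortest one; an empty input yields an empty result.
    for names_at_rank in zip(*lineages):
        first = names_at_rank[0]
        if all(name == first for name in names_at_rank):
            shared.append(first)
        else:
            break
    return shared


# Two equal matches that differ only at the species level.
equal_matches = [
    ("Bacteria", "Firmicutes", "Bacilli", "Bacillales",
     "Staphylococcaceae", "Staphylococcus", "Staphylococcus aureus"),
    ("Bacteria", "Firmicutes", "Bacilli", "Bacillales",
     "Staphylococcaceae", "Staphylococcus", "Staphylococcus epidermidis"),
]

consensus = shared_lineage(equal_matches)
for rank, name in zip(RANKS, consensus):
    print(rank, name)
```

For the Staphylococcus case in the question, the consensus stops at genus and the species field would be left empty.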

ctb mentioned this issue Nov 18, 2020

ctb commented May 8, 2021

since lca gather is now gone, linking to similar issues #278 #707 #1366 and closing here.

also see #1099 for our thinking about a sourmash classify command
