building large LCA databases for genbank subsets #1264

Closed
ctb opened this issue Dec 28, 2020 · 1 comment

@ctb
Contributor

ctb commented Dec 28, 2020

from #462 (comment), @J-I-P says:

OK, I try to build a LCA database ( only bacteria in NCBI) now. I want to use this database to sourmash lca summarize or sourmash lca classify my zymo data.

and I wanted to continue that conversation in a new issue. here we are! my response below:

cool - we have various LCA databases already available here, but started running into scalability challenges as the number of microbial genomes in genbank increased to over 500,000!

we've been working on resolving those scalability issues as well as changing the way we handle taxonomy to better support various approaches.

some links you might find interesting -

@ctb
Contributor Author

ctb commented Mar 30, 2022

This is now easy to do with picklists and taxonomies; closing for now.
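
For readers landing here later, a rough sketch of the picklist half of that workflow, not the maintainers' documented recipe. It assumes a sourmash-style lineages CSV with an `ident` column and a `superkingdom` column whose value is `Bacteria` (or `d__Bacteria` in GTDB-style files); the function name and file names are hypothetical, so check the column names in your own taxonomy file first.

```python
import csv


def write_bacteria_picklist(lineages_csv, picklist_csv,
                            rank_column="superkingdom",
                            wanted=("Bacteria", "d__Bacteria")):
    """Write a one-column picklist CSV holding the ident of every bacterial row.

    Assumes (hypothetically) that the lineages file has 'ident' and
    'superkingdom' columns; adjust rank_column/wanted for other layouts.
    Returns the number of identifiers written.
    """
    kept = 0
    with open(lineages_csv, newline="") as src, \
            open(picklist_csv, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.writer(dst)
        writer.writerow(["ident"])  # header naming the picklist column
        for row in reader:
            if row[rank_column] in wanted:
                writer.writerow([row["ident"]])
                kept += 1
    return kept
```

The resulting file could then be handed to a sourmash command that accepts picklists, for example `sourmash sig extract --picklist bacteria.picklist.csv:ident:ident`, to pull out only the bacterial signatures before building a database. Whether a given subcommand supports `--picklist` depends on your sourmash version, so treat that invocation as an assumption to verify against `--help`.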

@ctb ctb closed this as completed Mar 30, 2022