translating between taxonomies - maybe a sourmash tax translate? #2201

Open
ctb opened this issue Aug 13, 2022 · 2 comments

ctb commented Aug 13, 2022

there is some interest in translating between taxonomies (GTDB, NCBI, and maybe LINS), and this is something that we should be able to do somewhat straightforwardly in sourmash.

relevant issues -

a few brainstorming notes and thoughts -

GTDB only provides taxonomy for bacteria and archaea, not euks or viruses; same for LINS.

here I'm mostly thinking about using sourmash to translate between taxonomic annotations that have been made elsewhere (with or without sourmash);

note that GTDB genomes are drawn from GenBank, so for ~300,000 genomes there is already a 1:1 mapping between the two taxonomies.

my basic idea is to build mapping tables for NCBI lineages into GTDB lineages, by using sourmash gather and sourmash tax genome on NCBI genomes, and then... publish them!
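a rough sketch of what building such a table could look like, assuming we already have, for each genome, a lineage in both taxonomies (either from the existing 1:1 overlap or from classifying NCBI genomes against GTDB). it uses plain Python only, no sourmash calls; the function name, the column names, and the toy lineages are all made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

def build_mapping_table(from_tax, to_tax):
    """Count which to_tax lineages each from_tax lineage co-occurs with.

    Both arguments are dicts of genome identifier -> lineage string.
    Genomes missing from to_tax are skipped.
    """
    counts = defaultdict(Counter)
    for ident, from_lineage in from_tax.items():
        to_lineage = to_tax.get(ident)
        if to_lineage is not None:
            counts[from_lineage][to_lineage] += 1

    rows = []
    for from_lineage, targets in counts.items():
        total = sum(targets.values())
        for to_lineage, n in targets.most_common():
            # fraction = share of genomes with this from_lineage that
            # landed on this to_lineage
            rows.append((from_lineage, to_lineage, n, n / total))
    return rows

# toy data: placeholder identifiers and lineages, not real taxa
ncbi = {
    "g1": "Bacteria;PhylumX;GenusA",
    "g2": "Bacteria;PhylumX;GenusA",
    "g3": "Bacteria;PhylumX;GenusB",
    "g4": "Bacteria;PhylumZ;GenusC",  # no GTDB entry, so it is skipped
}
gtdb = {
    "g1": "d__Bacteria;p__PhylumY;g__GenusA",
    "g2": "d__Bacteria;p__PhylumY;g__GenusA",
    "g3": "d__Bacteria;p__PhylumY;g__GenusA",
}

table = build_mapping_table(ncbi, gtdb)

# write the table out as CSV, ready to "publish"
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["from_lineage", "to_lineage", "n_genomes", "fraction"])
writer.writerows(table)
print(out.getvalue())
```

keeping the genome counts and fractions in the table (rather than only the top hit) makes it possible to see later where a mapping is ambiguous.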

this would need a new command, maybe sourmash tax translate, that would take two taxonomy spreadsheets (--from-tax and --to-tax, maybe?) and do the translation for ya. the spreadsheets could be in a variety of formats: those currently accepted, and perhaps also biom (#2199) and semicolon-separated (#2185).
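to make the idea concrete, here is a minimal sketch of the translation step itself, in plain Python. nothing here is an existing sourmash command or API: `--from-tax` and `--to-tax` are only proposed above, and the two-column `ident,lineage` CSV layout, the `load_tax` and `translate` names, and the toy lineages are assumptions for illustration. the two spreadsheets are joined on genome identifier, and a lineage is translated to whichever target lineage most of its genomes carry.

```python
import csv
import io
from collections import Counter

def load_tax(fp):
    """Read an (assumed) two-column CSV with 'ident' and 'lineage' headers."""
    return {row["ident"]: row["lineage"] for row in csv.DictReader(fp)}

def translate(lineage, from_tax, to_tax):
    """Translate one lineage by majority vote over shared genome identifiers.

    Returns None when no genome with this lineage appears in to_tax.
    """
    votes = Counter(
        to_tax[ident]
        for ident, lin in from_tax.items()
        if lin == lineage and ident in to_tax
    )
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# toy stand-ins for the --from-tax and --to-tax spreadsheets
from_csv = io.StringIO(
    "ident,lineage\n"
    "g1,Bacteria;PhylumX;GenusA\n"
    "g2,Bacteria;PhylumX;GenusA\n"
    "g3,Bacteria;PhylumZ;GenusC\n"
)
to_csv = io.StringIO(
    "ident,lineage\n"
    "g1,d__Bacteria;p__PhylumY;g__GenusA\n"
    "g2,d__Bacteria;p__PhylumY;g__GenusA\n"
)

from_tax = load_tax(from_csv)
to_tax = load_tax(to_csv)

translated = translate("Bacteria;PhylumX;GenusA", from_tax, to_tax)
missing = translate("Bacteria;PhylumZ;GenusC", from_tax, to_tax)
print(translated)
print(missing)
```

returning None for lineages with no shared genomes, instead of guessing, keeps untranslatable results visible to the user.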

in the fullness of time this could become a way for people with results in one taxonomy to do a mapping to another; this should maybe be discouraged in situations where you have genomes (just use sourmash gather on those genomes!) but could be useful for people who are using other tax classification programs.

ctb added the taxonomy label Aug 13, 2022

ctb commented Aug 15, 2022

it would be good to identify places where translation does not round-trip (A->B->A), i.e. where translating a lineage into the other taxonomy and back gives a different lineage.
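a small sketch of such a check, assuming the two directions are available as simple dicts (lineage in A -> lineage in B, and lineage in B -> lineage in A). the function name and the placeholder labels are hypothetical.

```python
def non_round_trip(forward, backward):
    """Return lineages whose A -> B -> A translation does not come back to A.

    The result maps each failing A lineage to (its B translation,
    what B translates back to, or None if B has no reverse entry).
    """
    failures = {}
    for a, b in forward.items():
        back = backward.get(b)
        if back != a:
            failures[a] = (b, back)
    return failures

# placeholder labels standing in for full lineages
a_to_b = {"A1": "B1", "A2": "B1", "A3": "B3"}
b_to_a = {"B1": "A1", "B3": "A3"}

failures = non_round_trip(a_to_b, b_to_a)
print(failures)
```

here A1 and A2 both map to B1, so only one of them can survive the trip back; this kind of many-to-one merge is the sort of case the check is meant to surface.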


ctb commented Aug 18, 2022

see discussion in "LINgroups as a Principled Approach to Compare and Integrate Multiple Bacterial Taxonomies" paper!
