there is some interest in translating between taxonomies (GTDB, NCBI, and maybe LINS), and this is something that we should be able to do somewhat straightforwardly in sourmash.
relevant issues -
sourmash tax ? #1603
a few brainstorming notes and thoughts -
GTDB only provides taxonomy for bacteria and archaea, not euks or viruses; same for LINS.
here I'm mostly thinking about using sourmash to translate between taxonomic annotations that have been made elsewhere (with or without sourmash);
note GTDB is included within Genbank, so for ~300,000 genomes there is already a 1:1 mapping.
my basic idea is to build mapping tables for NCBI lineages into GTDB lineages, by using sourmash gather and sourmash tax genome on NCBI genomes, and then... publish them!
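for the genomes that already sit in both taxonomies (the 1:1 case noted above), a mapping table could be sketched as a plain join on genome identifier. this is only an illustration with the standard library, not existing sourmash code: the function names `read_lineages` and `build_mapping` are hypothetical, the `ident` column name and exact-match identifiers are assumptions, the toy lineages are placeholders, and picking the most common target lineage is just one possible way to settle cases where one source lineage maps to several target lineages.

```python
import csv
import io
from collections import Counter, defaultdict


def read_lineages(fp):
    """Read a taxonomy CSV into {ident: 'rank1;rank2;...'}.

    Assumes an 'ident' column; every other column is treated as a rank,
    in the order it appears in the header.
    """
    reader = csv.DictReader(fp)
    ranks = [col for col in reader.fieldnames if col != "ident"]
    return {row["ident"]: ";".join(row[r] for r in ranks) for row in reader}


def build_mapping(from_tax, to_tax):
    """Map each source lineage to a target lineage via shared identifiers.

    When genomes sharing one source lineage land in several target
    lineages, keep the most common target (an assumption, not a rule).
    """
    votes = defaultdict(Counter)
    for ident, from_lineage in from_tax.items():
        if ident in to_tax:  # only genomes present in both taxonomies
            votes[from_lineage][to_tax[ident]] += 1
    return {src: counts.most_common(1)[0][0] for src, counts in votes.items()}


# toy placeholder data, not real lineages
ncbi_csv = io.StringIO(
    "ident,superkingdom,phylum\n"
    "g1,A,B\n"
    "g2,A,B\n"
    "g3,A,C\n"
)
gtdb_csv = io.StringIO(
    "ident,superkingdom,phylum\n"
    "g1,X,Y\n"
    "g2,X,Y\n"
    "g3,X,Z\n"
    "g4,X,W\n"
)

mapping = build_mapping(read_lineages(ncbi_csv), read_lineages(gtdb_csv))
for src, dst in sorted(mapping.items()):
    print(f"{src} -> {dst}")
```

genomes without a shared identifier (like `g4` here) are simply skipped, which is where the gather-based approach would have to fill in.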
this would need a new command, maybe sourmash tax translate, that would take two taxonomy spreadsheets (--from-tax and --to-tax maybe?) in a variety of formats (currently accepted, as well as biom #2199 and semicolon separated #2185?) and do the translation for ya.
in the fullness of time this could become a way for people with results in one taxonomy to do a mapping to another; this should maybe be discouraged in situations where you have genomes (just use sourmash gather on those genomes!) but could be useful for people who are using other tax classification programs.
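a rough sketch of what the translation step might do with semicolon-separated lineages and a published mapping table. nothing here is an existing sourmash command or API: the two-column mapping layout (`from_lineage`, `to_lineage`) and the function names `load_mapping`, `normalize`, and `translate` are all hypothetical, and the toy lineages are placeholders. lineages with no counterpart (for example euks or viruses when translating into GTDB) are reported as unmapped rather than guessed.

```python
import csv
import io


def normalize(lineage):
    """Strip whitespace around each rank and drop trailing empty ranks."""
    parts = [p.strip() for p in lineage.split(";")]
    while parts and not parts[-1]:
        parts.pop()
    return ";".join(parts)


def load_mapping(fp):
    """Read a mapping CSV with 'from_lineage' and 'to_lineage' columns
    (a hypothetical layout) into a dict keyed by normalized lineage."""
    return {
        normalize(row["from_lineage"]): row["to_lineage"]
        for row in csv.DictReader(fp)
    }


def translate(lineages, mapping):
    """Translate lineages; return (translated dict, list of unmapped)."""
    translated, unmapped = {}, []
    for lineage in lineages:
        key = normalize(lineage)
        if key in mapping:
            translated[lineage] = mapping[key]
        else:
            unmapped.append(lineage)  # no counterpart in the target taxonomy
    return translated, unmapped


# toy placeholder data, not real lineages
mapping_csv = io.StringIO(
    "from_lineage,to_lineage\n"
    "A;B,X;Y\n"
)
queries = ["A; B;", "E;F"]

translated, unmapped = translate(queries, load_mapping(mapping_csv))
print("translated:", translated)
print("unmapped:", unmapped)
```

keeping the unmapped lineages in a separate list makes it easy to report what could not be translated instead of silently dropping it.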