allow more ranks for viral taxonomy #1

Open
bluegenes opened this issue Jun 25, 2023 · 0 comments

I'm doing some virus work and running into a limitation of our use of only 7 ranks (superkingdom --> species).

For cowpox genome (GCA_000839185.1; random example):

The taxonomic information listed on NCBI is:

Viruses; Varidnaviria; Bamfordvirae; Nucleocytoviricota; Pokkesviricetes; \
Chitovirales; Poxviridae; Chordopoxvirinae; Orthopoxvirus; Cowpox virus

Using this script, we get the following:

GCA_000839185.1,10243,Viruses,Nucleocytoviricota,Pokkesviricetes, \
Chitovirales,Poxviridae,Orthopoxvirus,Cowpox virus,

This is missing the realm/clade Varidnaviria, the kingdom Bamfordvirae, and the subfamily Chordopoxvirinae.

At ICTV, Varidnaviria has rank realm, while on NCBI it has rank clade.
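
To make the loss concrete, here is a minimal sketch of how a 7-rank filter drops those three names. It is not the code from the script: the `ncbi_lineage` pairs are hard-coded from the ranks described in this issue, and `dropped_names` is a hypothetical helper used only for illustration.

```python
# NCBI lineage for GCA_000839185.1 as (rank, name) pairs.
# Assumption: ranks are taken from the issue text, not fetched from NCBI.
ncbi_lineage = [
    ("superkingdom", "Viruses"),
    ("clade", "Varidnaviria"),
    ("kingdom", "Bamfordvirae"),
    ("phylum", "Nucleocytoviricota"),
    ("class", "Pokkesviricetes"),
    ("order", "Chitovirales"),
    ("family", "Poxviridae"),
    ("subfamily", "Chordopoxvirinae"),
    ("genus", "Orthopoxvirus"),
    ("species", "Cowpox virus"),
]

# The 7 ranks currently kept (superkingdom --> species).
seven_ranks = ["superkingdom", "phylum", "class", "order",
               "family", "genus", "species"]


def dropped_names(lineage, wanted_ranks):
    """Return the (rank, name) pairs whose rank is not in wanted_ranks."""
    wanted = set(wanted_ranks)
    return [(rank, name) for rank, name in lineage if rank not in wanted]


dropped = dropped_names(ncbi_lineage, seven_ranks)
for rank, name in dropped:
    print(f"{rank}: {name}")
```

Any name whose NCBI rank is not one of the seven is silently discarded, which is why the clade, kingdom and subfamily entries vanish from the output.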

The full taxonomy, if we include the empty ICTV ranks, is:

GCF_000839185.1,Varidnaviria,,Bamfordvirae,,Nucleocytoviricota,, \
Pokkesviricetes,,Chitovirales,,Poxviridae,Chordopoxvirinae,Orthopoxvirus,,Cowpox virus,

Here Varidnaviria is the realm; we would need to add 'Viruses' at the front if we want it.

Might be worth enabling use of all ICTV ranks, which I'm working on for sourmash tax here: sourmash-bio/sourmash#2608.

This doesn't necessarily need to be fixed here, but it seemed a decent spot to file this issue, since this was used to build our last GenBank tax files.

quick fix:

in make-lineage-csv.py, use the following for viruses:

want_taxonomy = ['superkingdom', 'clade', 'subrealm', 'kingdom', 'subkingdom', 'phylum', 'subphylum', 'class', 'subclass', 'order', 'suborder', 'family', 'subfamily', 'genus', 'subgenus', 'species']

Here superkingdom adds 'Viruses', while clade (NCBI's rank for the ICTV realm Varidnaviria) adds Varidnaviria.

This produces the following lineage:

GCA_000839185.1,10243,Viruses,Varidnaviria,,Bamfordvirae,,Nucleocytoviricota,, \
Pokkesviricetes,,Chitovirales,,Poxviridae,Chordopoxvirinae,Orthopoxvirus,,Cowpox virus
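
For reference, a rough sketch of how the extended rank list turns into that row, with an empty field for each rank the lineage lacks. This is not the actual logic in make-lineage-csv.py: `lineage_row` and the hard-coded `ncbi_lineage` are hypothetical, and the sketch assumes each rank occurs at most once per lineage, which may not hold for 'clade' in other parts of the NCBI taxonomy.

```python
want_taxonomy = ['superkingdom', 'clade', 'subrealm', 'kingdom', 'subkingdom',
                 'phylum', 'subphylum', 'class', 'subclass', 'order',
                 'suborder', 'family', 'subfamily', 'genus', 'subgenus',
                 'species']

# Assumption: (rank, name) pairs hard-coded from the issue text.
ncbi_lineage = [
    ("superkingdom", "Viruses"),
    ("clade", "Varidnaviria"),
    ("kingdom", "Bamfordvirae"),
    ("phylum", "Nucleocytoviricota"),
    ("class", "Pokkesviricetes"),
    ("order", "Chitovirales"),
    ("family", "Poxviridae"),
    ("subfamily", "Chordopoxvirinae"),
    ("genus", "Orthopoxvirus"),
    ("species", "Cowpox virus"),
]


def lineage_row(ident, taxid, lineage, wanted_ranks):
    """Build one CSV row: ident, taxid, then one name (or '') per wanted rank."""
    # Assumes each rank occurs at most once; a repeated rank keeps the last name.
    by_rank = dict(lineage)
    names = [by_rank.get(rank, "") for rank in wanted_ranks]
    return ",".join([ident, str(taxid), *names])


row = lineage_row("GCA_000839185.1", 10243, ncbi_lineage, want_taxonomy)
print(row)
```

Keeping the empty fields means every row has the same number of columns, so a downstream reader can map columns to ranks by position.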

I'll make a PR just to have it somewhere.
