augur curate format-host #1586

Open
joverlee521 opened this issue Aug 20, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@joverlee521
Contributor

Inspired by @kimandrews's work in nextstrain/rabies#8.

Copying my comment as an official proposal for a new augur curate format-host command:

The host rules can be formatted like <group>/<family>/<genus>/<species>\t<new_host_label>
Picking a couple examples from your script:

host_hierarchy                new_host_label
odd-toed ungulates/*/*/*      Other Ungulate
*/Mephitidae/*/*              Skunk
*/Canidae/Vulpes/*            Fox (Vulpes sp.)
*/Procyonidae/Procyon/*       Raccoon
*/*/*/Canis lupus familiaris  Domestic Dog

Then the generalized script would match starting from group down to species.
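
The matching described above could look roughly like the sketch below. It is only an illustration, not an existing Augur implementation: the function names `parse_rules` and `format_host`, the dict-shaped host record, and the "first matching rule in file order wins" precedence are all assumptions, since the proposal does not specify them. Each rule is compared field by field, from group down to species, with `*` matching anything.

```python
import csv
import io

# Taxonomic levels in the order they appear in a rule pattern.
LEVELS = ("group", "family", "genus", "species")


def parse_rules(tsv_text):
    """Parse tab-separated rule lines into (pattern_fields, label) pairs."""
    rules = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and the header row.
        if not row or row[0] == "host_hierarchy":
            continue
        pattern, label = row
        fields = pattern.split("/")
        if len(fields) != len(LEVELS):
            raise ValueError(f"Expected {len(LEVELS)} levels in {pattern!r}")
        rules.append((fields, label))
    return rules


def format_host(host, rules, default=None):
    """Return the label of the first rule matching the host record.

    Assumption: rules are tried in file order and the first match wins.
    Fields are compared from group down to species; '*' matches anything.
    """
    for fields, label in rules:
        if all(
            want == "*" or want.lower() == host[level].lower()
            for want, level in zip(fields, LEVELS)
        ):
            return label
    return default


# Rules taken from the examples in this issue.
RULES = "\n".join([
    "host_hierarchy\tnew_host_label",
    "odd-toed ungulates/*/*/*\tOther Ungulate",
    "*/Mephitidae/*/*\tSkunk",
    "*/Canidae/Vulpes/*\tFox (Vulpes sp.)",
    "*/Procyonidae/Procyon/*\tRaccoon",
    "*/*/*/Canis lupus familiaris\tDomestic Dog",
])

rules = parse_rules(RULES)
# Hypothetical host record; the "carnivores" group value is made up here.
fox = {"group": "carnivores", "family": "Canidae",
       "genus": "Vulpes", "species": "Vulpes vulpes"}
print(format_host(fox, rules))
```

A host that matches no rule falls through to `default`, which a real command would probably want to make configurable (e.g., keep the original value or error out).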

I don't think Augur would have any default host rules since the useful groupings will vary widely by pathogen.

@kimandrews

This is great! For other pathogens, we may also want criteria based on higher Linnaean taxonomic ranks (e.g., Class or Order). But I also see downsides to adding more /*/* levels, because the patterns can get confusing.

@joverlee521
Contributor Author

Noting that a lot of the host map in avian-flu handles the common names that are included in the SRA/BioSample records. It seems the host only gets converted to the scientific name when pulling the data directly from NCBI Virus/NCBI Datasets.

For example, NCBI Virus shows "Bos taurus" for PQ468542.1:

Screenshot 2025-01-14 at 2 20 27 PM

The linked GenBank record shows "cattle":

Screenshot 2025-01-14 at 2 24 19 PM

The linked BioSample record shows "CATTLE":

Screenshot 2025-01-14 at 2 26 08 PM

The linked SRA record shows "CATTLE":

Screenshot 2025-01-14 at 2 27 21 PM
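
To make the discrepancy concrete, here is a minimal sketch of a case-insensitive common-name lookup that would fold "cattle" and "CATTLE" into the scientific name that NCBI Virus reports. The `HOST_MAP` contents and the `normalize_host` name are hypothetical and are not taken from the avian-flu host map; only the cattle / Bos taurus pairing comes from the records above.

```python
# Hypothetical map from common names (lowercased) to scientific names.
HOST_MAP = {
    "cattle": "Bos taurus",
}


def normalize_host(raw, host_map=HOST_MAP):
    """Map a common host name to its scientific name, ignoring case.

    Values without a mapping (including names that are already
    scientific) are returned unchanged apart from whitespace trimming.
    """
    cleaned = raw.strip()
    return host_map.get(cleaned.casefold(), cleaned)


# The three spellings seen for PQ468542.1 across NCBI sources.
for value in ("Bos taurus", "cattle", "CATTLE"):
    print(value, "->", normalize_host(value))
```

A step like this would presumably need to run before any hierarchy-based rules, since those rules expect scientific names.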
