what should our signature names look like for databases? #9

Closed
ctb opened this issue Jul 18, 2020 · 2 comments

Comments

@ctb
Contributor

ctb commented Jul 18, 2020

for genbank, we do --name-from-first, so we get output like this:

CP001941.1 Aciduliprofundum boonei T4...

for gtdb, we do trickier name setting, so we get output like this:

GCF_000025665 s__Aciduliprofundum boonei

with the main difference here being that the GCF_ identifier points to the identifier for the whole genome, not just the first sequence. That seems better.

We could add an optional identifier string to signatures. Hrm. Ref sourmash-bio/sourmash#268 for more such questions.

ref #7

@luizirber
Member

For wort I generated a name closer to GTDB:
GCF_000246355.1 Leptospira kirschneri serovar Mozdok str. 'B 81/7 type 3/Tsaratsovo' strain=B 81/7 type 3/Tsaratsovo, CLC_glsol0

I generate it from assembly_summary.txt, using assembly_accession, organism_name, infraspecific_name (if there is one) and finally a comma and asm_name.
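The recipe above can be sketched in Python roughly as follows. The four column names are the ones listed in the comment. The function name `build_signature_name`, the dict-based row, the way the example name is split back into columns, and treating an empty string or `na` as "no infraspecific name" are all assumptions for illustration, not the code actually used in wort.

```python
def build_signature_name(row):
    """Build '<accession> <organism> [<infraspecific>], <asm_name>' from one row.

    `row` is assumed to be a dict keyed by assembly_summary.txt column names.
    """
    parts = [row["assembly_accession"], row["organism_name"]]

    # Assumption: a missing infraspecific name shows up as "" or "na".
    infraspecific = row.get("infraspecific_name", "").strip()
    if infraspecific and infraspecific != "na":
        parts.append(infraspecific)

    # The assembly name goes last, after a comma.
    return " ".join(parts) + ", " + row["asm_name"]


# Values reconstructed from the example name in this thread (assumed split).
example_row = {
    "assembly_accession": "GCF_000246355.1",
    "organism_name": "Leptospira kirschneri serovar Mozdok str. 'B 81/7 type 3/Tsaratsovo'",
    "infraspecific_name": "strain=B 81/7 type 3/Tsaratsovo",
    "asm_name": "CLC_glsol0",
}

print(build_signature_name(example_row))
```

With these inputs the function reproduces the example name quoted above, with the accession in the first position.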

This example is pretty much the worst case I found: long name, with ' in the middle (so I need to escape properly in the shell). But the crucial point is using GCF_000246355.1 in the first position, because --name-from-first in NCBI assemblies is a mess for our use cases.
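One way to handle the embedded `'` when the name has to pass through a shell is Python's standard `shlex.quote`. This is only a sketch of the escaping step: `echo` is a stand-in for whatever command actually receives the name, and nothing here is taken from the wort code.

```python
import shlex

name = (
    "GCF_000246355.1 Leptospira kirschneri serovar Mozdok str. "
    "'B 81/7 type 3/Tsaratsovo' strain=B 81/7 type 3/Tsaratsovo, CLC_glsol0"
)

# shlex.quote wraps the string so that a POSIX shell treats it as a single
# argument, including the embedded single quotes.
quoted = shlex.quote(name)

# `echo` is a placeholder for the real command (an assumption).
command_line = "echo " + quoted

# Round trip: splitting the command line with POSIX shell rules gives back
# the original name, unchanged, as one argument.
assert shlex.split(command_line) == ["echo", name]
```

When the command is launched from Python rather than from a shell script, passing the arguments as a list to `subprocess.run` avoids the quoting problem entirely, since no shell parses the string.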

@ctb
Contributor Author

ctb commented Apr 2, 2022

we've standardized over the last two years on putting the identifier first, as above, and we use this for pretty much everything (including sourmash taxonomy, and picklists). Everything seems to work fine ;). Closing as resolved 🎉 !
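The practical benefit of the identifier-first convention is that the identifier can be recovered from any name by splitting on the first space. A minimal sketch, with hypothetical helper names `split_name` and `ident_prefix` (this illustrates the convention, it is not sourmash's own code):

```python
def split_name(name):
    """Split 'IDENT rest of the description' into (ident, description)."""
    ident, _, description = name.partition(" ")
    return ident, description


def ident_prefix(ident):
    """Drop a trailing version suffix such as '.1', if there is one."""
    return ident.partition(".")[0]


# Names taken from the examples earlier in this thread.
names = [
    "GCF_000025665 s__Aciduliprofundum boonei",
    "GCF_000246355.1 Leptospira kirschneri serovar Mozdok str. "
    "'B 81/7 type 3/Tsaratsovo' strain=B 81/7 type 3/Tsaratsovo, CLC_glsol0",
]

for name in names:
    ident, description = split_name(name)
    print(ident, ident_prefix(ident), sep="\t")
```

Because only the first space matters, the rest of the name can contain spaces, quotes, and commas without affecting identifier lookups.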

@ctb ctb closed this as completed Apr 2, 2022