[Feature request] name2taxid optional warnings with duplicates, name2taxid go 'up' taxonomy if no taxonomy ID #103
Comments
> Optional warnings with duplicates

I'll add a warning for that.

> name2taxid go 'up' taxonomy if no taxonomy ID

It seems hard to do that. There is a fuzzy mode, which I've tried, but it's slow. It looks like there's a species named "Unio crassus"
shenwei356 added a commit that referenced this issue on Sep 25, 2024
The warning added.
You can also just count the names and filter duplicated ones.
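Counting the names could look roughly like the sketch below. It assumes the `name2taxid` output is tab-separated with the name in the first column and the TaxId in the second, one line per match; `find_ambiguous_names` is a hypothetical helper, not part of taxonkit, and the TaxIds in the usage example are placeholders rather than real NCBI IDs.

```python
from collections import defaultdict


def find_ambiguous_names(lines):
    """Return {name: [taxids]} for names that map to more than one distinct TaxId.

    Assumes each line is "name<TAB>taxid"; lines with an empty or missing
    TaxId (no match) are ignored.
    """
    taxids = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        name = fields[0]
        taxid = fields[1] if len(fields) > 1 else ""
        # Record each distinct TaxId once per name.
        if taxid and taxid not in taxids[name]:
            taxids[name].append(taxid)
    return {name: ids for name, ids in taxids.items() if len(ids) > 1}


# Placeholder TaxIds, for illustration only.
rows = [
    "Centropogon australis\t111\n",
    "Centropogon australis\t222\n",
    "Unio crassus\t333\n",
    "Unio cf. crassus\t\n",
]
ambiguous = find_ambiguous_names(rows)
```

The names in `ambiguous` are the ones to resolve by hand before joining the TaxIds back onto the original list.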
Wow, super fast. Thank you very much!!! Love your work!
Prerequisites
taxonkit version
Describe your issue
Optional warnings with duplicates
When having several options, name2taxid returns all. As it says in the help:
which is totally fine. However, there are a few cases on NCBI where two different kingdoms share a species name! Here's an example:
That's a fish and a plant.
With large lists of fish species this happens once in a while, and all my lists are suddenly off by one or two rows. It would be very useful to have an optional flag that reports these cases on STDERR, e.g. 'WARNING: Found duplicate IDs for 'Centropogon australis', 'other species', etc.'. That would let me filter these cases manually (deciding on the kingdom I'd prefer).
name2taxid go 'up' taxonomy if no taxonomy ID
There are many fish in BOLD and other non-NCBI databases that have no NCBI taxonomy ID. Often these are also weird 'sp.' or 'cf.' species. An example is BOLD:AAF5083, Unio cf. crassus.
I usually replace these manually with the genus-level name:
Would it be possible to add an optional flag that moves 'up' the taxonomic levels until it finds something, with some logging to STDERR? As in, 'Replaced Unio cf. crassus by Unio' or something like that.
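To make the request concrete, here is a rough sketch of the behaviour I have in mind, done outside taxonkit. Everything in it is an assumption for illustration: `resolve_with_fallback` and the `lookup` dict (name to TaxId) are hypothetical, the TaxId is a placeholder, and 'moving up' is limited to retrying with the first word of the name as the genus.

```python
import sys


def resolve_with_fallback(name, lookup):
    """Return (matched_name, taxid), trying the full name first, then the genus.

    `lookup` is a plain dict mapping names to TaxIds (hypothetical; it could be
    built from name2taxid output). Returns (None, None) if nothing matches.
    """
    words = name.split()
    candidates = [name]
    # Treat the first word as the genus and retry with it.
    if words and words[0] != name:
        candidates.append(words[0])
    for candidate in candidates:
        taxid = lookup.get(candidate)
        if taxid is not None:
            if candidate != name:
                # Log every replacement so it can be reviewed later.
                print(f"Replaced {name} by {candidate}", file=sys.stderr)
            return candidate, taxid
    return None, None


# Placeholder TaxId, for illustration only.
lookup = {"Unio": "12345"}
result = resolve_with_fallback("Unio cf. crassus", lookup)
```

Here `result` holds the genus name and its TaxId, and the replacement is logged to STDERR.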
Thank you for all of your hard work on this amazing tool!