id2taxid mapping file format #766
-
Hi. I build custom databases from a bunch of proteomes, mostly from UniProt but also random stuff from individual websites. Since I can't just download the NCBI prot.accession2taxid file, it would be helpful to learn more about the format of the file given to diamond makedb --taxonmap.
I got hints from reports like "xxx|" in the log file, but any more info would be helpful. I'm happy to munge FASTA or taxonmap files as needed, but I need the taxon mappings to work. Thanks.
-
I forgot to add that using --no-parse-seqids didn't help, and in fact it stopped the one species that was mapping (because it had simpler IDs with no period or pipe characters) from mapping.
-
Only accession.version
It will always ignore everything after the last dot unless you use --no-parse-seqids.
Probably not, for your use case it should be easier with --no-parse-seqids.
It should work, can you make a simple test case where it fails and send it to me?
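For anyone generating a custom mapping file, here is a minimal Python sketch. It assumes the four-column, tab-separated layout of NCBI's prot.accession2taxid (accession, accession.version, taxid, gi) with a header line; the thread itself only confirms that the accession.version column is the one that is used. The function names and the example IDs and taxids are hypothetical, and strip_after_last_dot only illustrates the "ignore everything after the last dot" behaviour described above, it is not DIAMOND's own code.

```python
import csv
import io

# Assumed column layout, taken from NCBI's prot.accession2taxid file.
HEADER = ["accession", "accession.version", "taxid", "gi"]


def strip_after_last_dot(seq_id):
    """Drop everything after the last dot (illustrates the default behaviour
    described in the reply; IDs without a dot are returned unchanged)."""
    return seq_id.rsplit(".", 1)[0]


def build_taxonmap(id_to_taxid):
    """Return the text of a tab-separated mapping table, header included."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    for seq_id, taxid in id_to_taxid.items():
        # The gi column is a placeholder (0); only accession.version is
        # said to matter in this thread.
        writer.writerow([strip_after_last_dot(seq_id), seq_id, taxid, 0])
    return out.getvalue()


# Illustrative IDs and taxids, not taken from a real database build.
example = {"P12345.2": 9986, "my_protein_001": 4932}
table = build_taxonmap(example)
print(table, end="")
```

The resulting text can be written to a file and passed to diamond makedb --taxonmap. Whether the FASTA IDs should keep their ".N" suffix depends on whether --no-parse-seqids is used, as discussed above.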