Mismatches in taxonomic ranks with Sintax #573

Closed
ashleyp1 opened this issue Sep 13, 2024 · 6 comments

@ashleyp1

I encountered some confusing results while testing sintax on my data. I'm running v2.28.1 on near-full-length 16S amplicons against a custom database. For some of my samples (mostly ones without high confidence values), I get mixed taxonomies that seem to jump between clades, like below.

0faf4970-8f6a-4a6c-9d55-26f7c80d50fc d:Bacteria(1.00),p:Firmicutes(1.00),c:Bacilli(1.00),o:Bacillales(0.83),g:Exiguobacterium(0.48),s:Exiguobacterium_acetylicum(0.24)
5f9d0909-fe7d-409d-9da8-26c2749bb0cc d:Bacteria(1.00),p:Firmicutes(1.00),c:Bacilli(1.00),o:Bacillales(1.00),g:Exiguobacterium(1.00),s:Exiguobacterium_acetylicum(0.74)
37270a98-6e0c-4130-ae8d-8c47399abcdd d:Bacteria(1.00),p:Firmicutes(1.00),c:Bacilli(1.00),o:Bacillales(0.99),f:Listeriaceae(0.60),g:Listeria(0.60),s:Exiguobacterium_acetylicum(0.25)
0aa23c22-ff54-4b20-8663-ef25a6338227 d:Bacteria(1.00),p:Proteobacteria(0.59),c:Gammaproteobacteria(0.58),o:Enterobacterales(0.57),f:Enterobacteriaceae(0.52),g:Exiguobacterium(0.36),s:Salmonella_enterica(0.29)

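The first two show the lineage I would expect for Exiguobacterium, but the next two jump between clades: in the third, the genus is Listeria but the species is Exiguobacterium_acetylicum, and in the fourth, the family is Enterobacteriaceae, the genus is Exiguobacterium and the species is Salmonella_enterica. How can that happen?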
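To spot such jumps automatically, each reported rank can be compared with the rank directly above it. A minimal sketch, assuming the second tab-separated column of `--tabbedout` holds terms like `g:Listeria(0.60)` and that a parent map from the reference database is available; `parse_prediction`, `find_jumps` and the toy `PARENT_OF` map are hypothetical, not part of vsearch. The toy map places Exiguobacterium directly under Bacillales, mirroring the first two output lines, which report no family.

```python
import re

# One SINTAX term in the tabbed output, e.g. "g:Listeria(0.60)".
TERM = re.compile(r"^([a-z]):(.+)\(([0-9.]+)\)$")


def parse_prediction(line: str) -> list[tuple[str, float]]:
    """Return [("rank:name", confidence), ...] from one --tabbedout line."""
    prediction = line.rstrip("\n").split("\t")[1]
    terms = []
    for part in prediction.split(","):
        match = TERM.match(part)
        if match is None:
            raise ValueError(f"unexpected SINTAX term: {part!r}")
        rank, name, confidence = match.groups()
        terms.append((f"{rank}:{name}", float(confidence)))
    return terms


def find_jumps(terms, parent_of):
    """List (taxon, reported_parent, expected_parent) wherever the lineage
    leaves the clade reported at the rank above."""
    jumps = []
    for (upper, _), (lower, _) in zip(terms, terms[1:]):
        expected = parent_of.get(lower)
        if expected is not None and expected != upper:
            jumps.append((lower, upper, expected))
    return jumps


# Toy parent map (hypothetical); in practice build it from the database headers.
PARENT_OF = {
    "p:Firmicutes": "d:Bacteria",
    "c:Bacilli": "p:Firmicutes",
    "o:Bacillales": "c:Bacilli",
    "f:Listeriaceae": "o:Bacillales",
    "g:Listeria": "f:Listeriaceae",
    "g:Exiguobacterium": "o:Bacillales",
    "s:Exiguobacterium_acetylicum": "g:Exiguobacterium",
}

line = ("query3\td:Bacteria(1.00),p:Firmicutes(1.00),c:Bacilli(1.00),"
        "o:Bacillales(0.99),f:Listeriaceae(0.60),g:Listeria(0.60),"
        "s:Exiguobacterium_acetylicum(0.25)")
print(find_jumps(parse_prediction(line), PARENT_OF))
# prints [('s:Exiguobacterium_acetylicum', 'g:Listeria', 'g:Exiguobacterium')]
```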

I first thought it was an error in my database, but I checked and confirmed that the lineages are all correct and properly formatted. At this point, I assume this is most likely a gap in my understanding of how sintax works. I know the bootstrap values for those two are low enough that I probably won't use them, but I'd still like to understand how this is happening.
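One way to double-check a SINTAX-formatted database is to confirm that every taxon has exactly one parent across all headers. A minimal sketch, assuming headers of the form `>id;tax=d:Bacteria,p:Firmicutes,...;`; `headers_from_fasta` and `parent_conflicts` are hypothetical helper names, and the third toy header is deliberately inconsistent. Taxa whose lineages skip a rank in some records but not in others are reported too, which may or may not be intended.

```python
import re
from collections import defaultdict

# Captures the comma-separated lineage after "tax=" up to the next ";".
TAX = re.compile(r"tax=([^;]+)")


def headers_from_fasta(path):
    """Yield the header lines of a FASTA file."""
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                yield line.strip()


def parent_conflicts(headers):
    """Map each taxon that has more than one distinct parent to those parents."""
    parents = defaultdict(set)
    for header in headers:
        match = TAX.search(header)
        if match is None:
            continue  # header without a tax= annotation
        lineage = match.group(1).split(",")
        for upper, lower in zip(lineage, lineage[1:]):
            parents[lower].add(upper)
    return {taxon: ups for taxon, ups in parents.items() if len(ups) > 1}


headers = [
    ">ref1;tax=d:Bacteria,p:Firmicutes,c:Bacilli,o:Bacillales,g:Exiguobacterium;",
    ">ref2;tax=d:Bacteria,p:Firmicutes,c:Bacilli,o:Bacillales,f:Listeriaceae,g:Listeria;",
    ">ref3;tax=d:Bacteria,p:Proteobacteria,c:Gammaproteobacteria,"
    "o:Enterobacterales,f:Enterobacteriaceae,g:Exiguobacterium;",
]
print(parent_conflicts(headers))
```

On a real file this would be called as `parent_conflicts(headers_from_fasta("sintax_db.fasta"))`; an empty dictionary means no taxon is placed under two different parents.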

Thanks!

@torognes
Owner

Hi, thank you for reporting this issue!

This does not look right.

Although ranks with low confidence (e.g. values below 0.8) should not be trusted, the classification should not jump between different clades of the tree as you go down towards the species level.

I'll look deeper into the issue as soon as possible.

Could you please send me the exact command you ran?

Would it be possible to send me (a subset of) the queries and the database used? Or is it confidential?

@ashleyp1
Author

ashleyp1 commented Sep 16, 2024

Here is the command I used. I sent you an invite to a Dropbox folder with my database and the sample where I first found the issue. Thanks for looking into this!

vsearch --sintax \
    1-filt-trimmed-HL068_FW.fastq.gz \
    --db sintax_db.fasta \
    --tabbedout 1-68_sintax.tsv \
    --sintax_cutoff 0.7 --strand both --notrunclabels
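With `--sintax_cutoff 0.7`, vsearch also reports a shortened prediction that leaves out low-confidence ranks. The sketch below mimics that filtering on one prediction string, assuming (not verified against the vsearch source) that ranks are kept from the top down while their confidence is at least the cutoff; `truncate_at_cutoff` is a hypothetical name.

```python
import re

# Matches terms such as "o:Bacillales(0.99)", capturing "o:Bacillales" and "0.99".
TERM = re.compile(r"([a-z]:[^,(]+)\(([0-9.]+)\)")


def truncate_at_cutoff(prediction: str, cutoff: float) -> str:
    """Keep leading ranks whose confidence is >= cutoff (assumed behaviour)."""
    kept = []
    for taxon, confidence in TERM.findall(prediction):
        if float(confidence) < cutoff:
            break  # stop at the first rank that falls below the cutoff
        kept.append(taxon)
    return ",".join(kept)


prediction = ("d:Bacteria(1.00),p:Firmicutes(1.00),c:Bacilli(1.00),"
              "o:Bacillales(0.99),f:Listeriaceae(0.60),g:Listeria(0.60),"
              "s:Exiguobacterium_acetylicum(0.25)")
print(truncate_at_cutoff(prediction, 0.7))
# prints d:Bacteria,p:Firmicutes,c:Bacilli,o:Bacillales
```

Under this reading, the third query would only be reported down to order at a 0.7 cutoff, so the Listeria/Exiguobacterium mix-up stays confined to ranks the cutoff already discards.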

@torognes
Owner

Thank you, I'll look into it. Got the data.

@torognes self-assigned this Sep 17, 2024
@torognes added the bug label Sep 17, 2024
torognes added a commit that referenced this issue Sep 19, 2024
@torognes
Owner

There was a logical bug in the selection of the best lineages. It should be fixed now in commit aa94d1c. I think it should only appear when the confidence is below 0.5, so it shouldn't matter much in most cases, although it was confusing.
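Without claiming this is what the vsearch code did, here is one way a lineage can jump between clades when bootstrap votes are split: picking the most common taxon at each rank independently, versus picking ranks top-down and letting only the hits that agree with the ranks already chosen vote at the next rank. The ten bootstrap hits below are toy data, and both function names are hypothetical.

```python
from collections import Counter

# Each bootstrap hit contributes one lineage; every lineage lists the same ranks
# in the same order (a simplifying assumption for this sketch).
HITS = (
    [("f:Enterobacteriaceae", "g:Salmonella")] * 3
    + [("f:Enterobacteriaceae", "g:Escherichia")] * 2
    + [("f:Enterobacteriaceae", "g:Klebsiella")] * 1
    + [("f:Listeriaceae", "g:Listeria")] * 4
)


def independent_plurality(hits):
    """Most common taxon at each rank, each rank chosen on its own."""
    result = []
    for level in range(len(hits[0])):
        taxon, votes = Counter(h[level] for h in hits).most_common(1)[0]
        result.append((taxon, votes / len(hits)))
    return result


def top_down(hits):
    """Most common taxon at each rank among hits agreeing with the ranks above."""
    result, support = [], list(hits)
    for level in range(len(hits[0])):
        taxon, votes = Counter(h[level] for h in support).most_common(1)[0]
        result.append((taxon, votes / len(hits)))
        support = [h for h in support if h[level] == taxon]
    return result


print(independent_plurality(HITS))  # [('f:Enterobacteriaceae', 0.6), ('g:Listeria', 0.4)]
print(top_down(HITS))               # [('f:Enterobacteriaceae', 0.6), ('g:Salmonella', 0.3)]
```

The independent choice puts Listeria under Enterobacteriaceae, the kind of jump reported above. Such a jump needs a split vote: with a consistent database, if a lower-rank taxon wins more than half of the hits, its parent receives at least as many votes and therefore also wins at the rank above. A jump between two adjacent ranks thus requires the lower rank's confidence to be at most 0.5, which fits the remark that the bug only shows up at low confidence.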

I will make a new release soon with this fix.

Sorry for the bug and thank you very much for reporting this issue!

@torognes
Owner

BTW, I recommend using the --sintax_random option to avoid length bias in the taxonomic classification.
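Assuming, as the comment above suggests, that the length bias comes from how ties between equally good reference hits are broken, the difference between a fixed and a random tie-break can be sketched as follows. This is a toy illustration, not the vsearch implementation; `pick_hit`, the reference names and the k-mer counts are hypothetical.

```python
import random


def pick_hit(scores, rng=None):
    """Return the reference with the most shared k-mers.

    scores: list of (reference_id, shared_kmers) in database order.
    Without an rng, ties go to the first reference in that order; with one,
    a tied reference is chosen uniformly at random.
    """
    best = max(count for _, count in scores)
    tied = [ref for ref, count in scores if count == best]
    return tied[0] if rng is None else rng.choice(tied)


scores = [("short_ref", 12), ("long_ref", 12), ("other_ref", 9)]
print(pick_hit(scores))  # prints short_ref
rng = random.Random(42)
picks = {pick_hit(scores, rng) for _ in range(100)}
print(sorted(picks))
```

If the database order happens to correlate with sequence length, the fixed rule always sends tied votes to the same kind of reference, whereas random tie-breaking spreads them across all equally good references over the bootstrap iterations.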

@torognes
Owner

The fixes are available now in release 2.29.0:

https://github.com/torognes/vsearch/releases/tag/v2.29.0
