Is the bug primarily related to salmon (bulk mode) or alevin (single-cell mode)?
bulk
Describe the bug
I am working with UMI-tagged Lexogen QuantSeq data. Since salmon does not (yet?) support handling UMIs with bulk RNA-seq directly (see #306), I am using umi_tools + STAR to generate a deduplicated transcriptome BAM file and run Salmon in alignment mode as implemented in the nf-core/rnaseq pipeline.
Unfortunately, Salmon does not seem to handle the deduplicated BAM well. Many genes that clearly have reads end up with zero counts.
For instance, for ENSMUSG00000029657 I get the following results (the last column is the read count in all cases):
# Salmon on transcriptome BAM, without umi_tools dedup: quant.genes.no_umi.sf
ENSMUSG00000029657.15 3803.74 3650.23 17.3078 438
# Salmon on deduplicated transcriptome BAM: quant.genes.sf
ENSMUSG00000029657.15 1947.36 1614.62 0 0
# Feature counts on genome BAM, without umi_tools dedup:
ENSMUSG00000029657.15 [...] 7266 415
# Feature counts on deduplicated genome BAM:
ENSMUSG00000029657.15 [...] 7266 289
Here's a scatterplot of log1p(counts) of the salmon quant results for a single sample with and without umi_tools dedup
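The same comparison can also be made numerically. Below is a minimal sketch that lists genes with reads before deduplication but none afterwards. It assumes the usual tab-separated `quant.genes.sf` layout with a `Name ... NumReads` header; the helper names and the `min_reads` cutoff are illustrative only, and the two rows are the ones quoted above.

```python
import csv
import io


def read_num_reads(handle):
    """Map gene name -> NumReads from a quant.genes.sf-style table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["Name"]: float(row["NumReads"]) for row in reader}


def dropped_genes(before, after, min_reads=10.0):
    """Genes with at least min_reads before dedup but zero after.

    min_reads is an arbitrary cutoff chosen for illustration.
    """
    return sorted(
        gene
        for gene, reads in before.items()
        if reads >= min_reads and after.get(gene, 0.0) == 0.0
    )


# Assumed header of quant.genes.sf; the data rows are taken from this issue.
HEADER = "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
no_dedup = io.StringIO(
    HEADER + "ENSMUSG00000029657.15\t3803.74\t3650.23\t17.3078\t438\n"
)
dedup = io.StringIO(
    HEADER + "ENSMUSG00000029657.15\t1947.36\t1614.62\t0\t0\n"
)

before = read_num_reads(no_dedup)
after = read_num_reads(dedup)
suspicious = dropped_genes(before, after)
print(suspicious)
```

With real data, the two `StringIO` objects would be replaced by the opened `quant.genes.no_umi.sf` and `quant.genes.sf` files.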
To Reproduce
Run Salmon quant on the aligned transcriptome BAM file. I provide subsampled versions of both the deduplicated and non-deduplicated BAM files. If you need the full BAM files, LMK and we can arrange a transfer.
Closing this as it's not an issue with Salmon.
The main problem was that the nf-core/rnaseq pipeline didn't call umi_tools with the --paired flag.
Some more fine-tuning of the UMI-tools output might be necessary to make sure unpaired reads are properly counted by Salmon. For this, see CGATOxford/UMI-tools#465.
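For anyone hitting the same problem with paired-end data, the fix amounts to passing `--paired` to `umi_tools dedup`. Here is a rough sketch of how the call could be assembled. The helper name and the BAM file names are hypothetical, and this is not the pipeline's actual code; `-I`, `-S` and `--paired` are the input, output and paired-end options of `umi_tools dedup`.

```python
import shlex


def umi_dedup_command(in_bam, out_bam, paired=True):
    """Build a umi_tools dedup command line (hypothetical helper)."""
    cmd = ["umi_tools", "dedup", "-I", in_bam, "-S", out_bam]
    if paired:
        # Without this flag, paired-end reads are not deduplicated as pairs.
        cmd.append("--paired")
    return cmd


# Hypothetical file names, for illustration only.
cmd = umi_dedup_command("sample.transcriptome.bam", "sample.dedup.bam")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once `umi_tools` is installed.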
Expected behavior
Salmon should quantify the deduplicated BAM correctly.
Desktop (please complete the following information):
3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Additional context
There's already an issue with RSEM described in the UMI-tools repository (CGATOxford/UMI-tools#465); it may be related.
CC @chripla