Option --no-sort-output with dedup #669

Rayan21100 · 2024-11-15T23:47:38Z

Hi everyone!

Thank you for this amazing and user-friendly tool!

I'm doing bulk RNAseq analysis:

I used fastp for trimming and QC analysis
STAR for alignment
UMI Tools for deduplication
Salmon for quantification

In the Salmon documentation there is a note:

Read / alignment order

Salmon, like eXpress 1, uses a streaming inference method to perform transcript-level quantification. One of the fundamental assumptions of such inference methods is that observations (i.e. reads or alignments) are made “at random”. This means, for example, that alignments should not be sorted by target or position. If your reads or alignments do not appear in a random order with respect to the target transcripts, please randomize / shuffle them before performing quantification with Salmon.

I know my BAM files are not sorted after STAR, as I didn't use the sorted output option. I saw that UMI Tools has the --no-sort-output option, but I couldn't find how the sorting is done: by name? By genomic position? Do you think I should specify --no-sort-output to use the output of dedup with salmon?

Thanks in advance!

IanSudbery · 2024-11-17T09:50:49Z

Without --no-sort-output, the reads coming out of dedup will be sorted by genome position. Definitely not right for salmon. However, even with it, the order still won't be fully random. Also, the pairing information isn't quite what salmon would expect (i.e. exactly one read 2 alignment per read 1 alignment, with the pairing information pointing at each other). I suggest that you add the following steps to your protocol:

fastp for trimming and QC analysis
STAR for alignment
UMI Tools for deduplication
samtools sort -n to sort by read name
UMI Tools prepare-for-rsem to make the pairing info more salmon-friendly
Salmon for quantification

Remember you need to align to the transcriptome if you are going to use salmon for quantification.
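The post-alignment steps of the protocol above can be sketched as a short command builder. This is only an illustration under stated assumptions: all file names are hypothetical, --paired assumes paired-end data, and salmon is assumed to be run in alignment-based mode against the same transcript FASTA used for alignment. The function only assembles the commands; it does not run them.

```python
import shlex


def build_pipeline(dedup_in, transcripts_fa, out_dir,
                   use_collate=False, no_sort_output=True):
    """Return the post-alignment commands as argument lists (nothing is executed).

    All intermediate file names below are hypothetical placeholders.
    """
    dedup_bam = "dedup.bam"
    grouped_bam = "dedup.namegrouped.bam"
    prepared_bam = "dedup.prepared.bam"

    # Assumption: paired-end reads, hence --paired.
    dedup_cmd = ["umi_tools", "dedup", "--paired", "-I", dedup_in, "-S", dedup_bam]
    if no_sort_output:
        # Skips dedup's final position sort; the next step reorders the reads anyway.
        dedup_cmd.insert(3, "--no-sort-output")

    # Either command brings all alignments with the same read name together.
    if use_collate:
        group_cmd = ["samtools", "collate", "-o", grouped_bam, dedup_bam]
    else:
        group_cmd = ["samtools", "sort", "-n", "-o", grouped_bam, dedup_bam]

    return [
        dedup_cmd,
        group_cmd,
        ["umi_tools", "prepare-for-rsem", "-I", grouped_bam, "-S", prepared_bam],
        # Alignment-based salmon mode; "-l A" asks salmon to infer the library type.
        ["salmon", "quant", "-t", transcripts_fa, "-l", "A",
         "-a", prepared_bam, "-o", out_dir],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("transcriptome_aligned.bam", "transcripts.fa", "salmon_out"):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them against each tool's own help output before running anything.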

Rayan21100 · 2024-11-17T11:03:36Z

Thank you for your answer! So would you say that it's better to use --no-sort-output?
I was planning to use samtools collate to shuffle the reads while keeping pairs together, but I do indeed get pair-related errors with salmon, so I might use prepare-for-rsem. Do you know if the reads will be randomized in that case? I imagine that samtools sort -n will randomize them anyway (or maybe I could use samtools collate instead?), but I was wondering how the order is kept by prepare-for-rsem.

Thanks in advance!

Rayan21100 · 2024-11-17T11:50:43Z

Update: I tried both sort and collate before prepare-for-rsem and then quantified with salmon. I haven't looked closely at the salmon output yet, but at least it now runs without error.
However, I get a lot of warnings during prepare-for-rsem in both cases:
2024-11-17 13:32:15,404 WARNING Alignment VH01309:279:AACMHJMHV:2:2506:64491:25895:UMI_ATTTTTTA 419 ENST00000619423 2053 has no mate -- skipped
I have 164 reads with no mates. Do you think I should remove them with samtools view?
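To check how many alignments were skipped and which reads they belong to, the warnings can be tallied from the log. A minimal sketch, assuming every such warning follows the format of the single line quoted above (the regular expression is derived from that one example only, and the function name is illustrative):

```python
import re

# Pattern inferred from the one warning line shown in this thread (an assumption).
SKIP_RE = re.compile(r"WARNING Alignment (\S+) .* has no mate -- skipped$")


def skipped_read_names(log_lines):
    """Return the read name from each 'has no mate -- skipped' warning line."""
    names = []
    for line in log_lines:
        match = SKIP_RE.search(line.rstrip())
        if match:
            names.append(match.group(1))
    return names


if __name__ == "__main__":
    example = [
        "2024-11-17 13:32:15,404 WARNING Alignment "
        "VH01309:279:AACMHJMHV:2:2506:64491:25895:UMI_ATTTTTTA 419 "
        "ENST00000619423 2053 has no mate -- skipped",
    ]
    names = skipped_read_names(example)
    print(len(names), "skipped alignment(s);", len(set(names)), "distinct read name(s)")
```

Comparing the number of warnings with the number of distinct read names shows whether a few multi-mapping reads account for most of the skipped alignments.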

IanSudbery · 2024-11-17T11:56:40Z

No, prepare-for-rsem should do that for you. Collate should be fine: we just need all reads with the same name to be together. Note that this will be more than just the two reads from a pair: if you are aligning to a transcriptome, reads will map multiple times, to the different transcripts of the same gene.
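Why name grouping matters can be shown with a toy model. This is not how prepare-for-rsem is implemented; it only illustrates that a tool reading the file in one pass sees all alignments of a read together when records sharing a name are adjacent. The read names and transcript IDs are made up.

```python
from itertools import groupby


def group_adjacent_by_name(records):
    """Group consecutive (read_name, transcript_id) records that share a read name."""
    return [
        (name, [transcript for _, transcript in group])
        for name, group in groupby(records, key=lambda rec: rec[0])
    ]


# Name-grouped order (as after samtools collate or sort -n): each read's
# alignments, including multi-mappings to several transcripts, are adjacent.
collated = [
    ("readA", "ENST1"), ("readA", "ENST1"),
    ("readA", "ENST2"), ("readA", "ENST2"),
    ("readB", "ENST3"), ("readB", "ENST3"),
]

# Position-like order: alignments of the same read are interleaved with others.
interleaved = [
    ("readA", "ENST1"), ("readB", "ENST3"), ("readA", "ENST2"),
    ("readB", "ENST3"), ("readA", "ENST1"), ("readA", "ENST2"),
]

if __name__ == "__main__":
    print(len(group_adjacent_by_name(collated)), "groups when name-grouped")
    print(len(group_adjacent_by_name(interleaved)), "groups when interleaved")
```

In the name-grouped input, readA forms a single group of four alignments (two transcripts, two mates each), which is the "more than just two reads" case described above. In the interleaved input the same read is split across several groups, so a one-pass tool could not match all mates.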

Rayan21100 changed the title ~~Option --no-sort-output with deduce~~ Option --no-sort-output with dedup Nov 15, 2024
