Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange behaviour of the sR_Bowtie output #47

Open
slecrom opened this issue Nov 8, 2023 · 3 comments
Open

Strange behaviour of the sR_Bowtie output #47

slecrom opened this issue Nov 8, 2023 · 3 comments

Comments

@slecrom
Copy link

slecrom commented Nov 8, 2023

I want to characterise piRNAs as in the paper whose data is here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.n2z34tn1p

I cleaned up the small RNAs against rRNAs, tRNAs, miRNAs and miscRNAs sequences and I made a selection of sizes between 23 and 29 nt.

I now want to retrieve unique reads from the Drosophila genome. So I used the sR_bowtie tool for small RNA short reads (Galaxy Version 2.3.0) with the option "Match unique mappers on DNA reference index" with 0 number of mismatches allowed. I asked for output in bam and also in fastq of the aligned files. I then ran the FastQC tool on these two output files.

I'll take the example of the ALBA25 sample, for which I have 16,518,042 as input. If I look at the output of sR_Bowtie I get 14,168,836 aligned reads.
# reads processed: 16518042
# reads with at least one alignment: 14168836 (85.7

But if I count the reads in the output files with the FastQC output I get 6,091,998 in the BAM, whereas I should get 14,168,836 reads and 3,742,792 in the Fastq aligned files.
Total Sequences 6091998 (BAM)
Total Sequences 3742792 (Fastq)

I used the "Map with Bowtie for Illumina tool (Galaxy Version 1.2.0)" to reproduce the alignment step with the same -v 0 and -m 1 parameters. I get the same output than using the sR_Bowtie tool.
# reads processed: 16518042
# reads with at least one reported alignment: 14168836 (85.78%)

If I use the "Samtools view" tool to keep only the aligned reads and run a FastQC, I get the expected number of sequences.
Total Sequences 14168836

Have you ever encountered this problem with the size of sR_Bowtie outputs?

@drosofff
Copy link
Member

drosofff commented Nov 8, 2023

I am away at a meeting and I will look at this tonight @slecrom

However... are you sure that you are not mixing "read" and "alignment". When idea of "unique mappers" comes in, it is often the source of issues.

@slecrom
Copy link
Author

slecrom commented Nov 9, 2023

I made a mistake using the -m option with the "Map with Bowtie" tool. So I launched it once again with the correct options and here are the two outputs from both tools:

Using sR_Bowtie
Setting the index via positional argument will be deprecated in a future release. Please use -x option instead.
# reads processed: 16518042
# reads with at least one alignment: 14168836 (85.78%)
# reads that failed to align: 2349206 (14.22%)
# reads with alignments suppressed due to -m: 10426044 (63.12%)
Reported 3742792 alignments

Using Map with Bowtie
# reads processed: 16518042
# reads with at least one reported alignment: 3742792 (22.66%)
# reads that failed to align: 2349206 (14.22%)
# reads with alignments suppressed due to -m: 10426044 (63.12%)
Reported 3742792 alignments to 1 output stream(s)

If the number of reads uniquely aligned are the same, however the second line "# reads with at least one…" is different. In addition, if I use the "BAM/SAM Mapping Stats reads mapping statistics for a provided BAM or SAM file" tool to compare the output of both BAM files, the total number of reads are different. Does the Bowtie output should retrieve all the reads even the none aligned ones?

sR_Bowtie

Total records: 6091998

QC failed: 0
Optical/PCR duplicate: 0
Non primary hits 0
Unmapped reads: 2349206
mapq < mapq_cut (non-unique): 0

mapq >= mapq_cut (unique): 3742792
Read-1: 0
Read-2: 0
Reads map to '+': 2129005
Reads map to '-': 1613787
Non-splice reads: 3742792
Splice reads: 0
Reads mapped in proper pairs: 0
Proper-paired reads map to different chrom:0

Map with Bowtie

Total records: 16518042

QC failed: 0
Optical/PCR duplicate: 0
Non primary hits 0
Unmapped reads: 12775250
mapq < mapq_cut (non-unique): 0

mapq >= mapq_cut (unique): 3742792
Read-1: 0
Read-2: 0
Reads map to '+': 2129005
Reads map to '-': 1613787
Non-splice reads: 3742792
Splice reads: 0
Reads mapped in proper pairs: 0
Proper-paired reads map to different chrom:0

@drosofff
Copy link
Member

drosofff commented Nov 9, 2023

Did not yet had time to look in details. But believe me, sRbowtie is my oldest tools and I spent hundreds of hours on it look at its behavior. So, it is not "un argument d'autorité" but 1 thing is 100% sure:

SRbowtie is a bowtie wrapper. This implies that if you do not have the same output with both tools, then it implies that different parameters have been passed to the bowtie.
By memory, the option that you are using with sRbowie is not -m but rather -M. To be checked...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants