-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Strange behaviour of the sR_Bowtie output #47
Comments
I am away at a meeting and I will look at this tonight @slecrom However... are you sure that you are not mixing "read" and "alignment". When idea of "unique mappers" comes in, it is often the source of issues. |
I made a mistake using the -m option with the "Map with Bowtie" tool. So I launched it once again with the correct options and here are the two outputs from both tools: Using sR_Bowtie Using Map with Bowtie If the number of reads uniquely aligned are the same, however the second line "# reads with at least one…" is different. In addition, if I use the "BAM/SAM Mapping Stats reads mapping statistics for a provided BAM or SAM file" tool to compare the output of both BAM files, the total number of reads are different. Does the Bowtie output should retrieve all the reads even the none aligned ones? sR_Bowtie Total records: 6091998 QC failed: 0 mapq >= mapq_cut (unique): 3742792 Map with Bowtie Total records: 16518042 QC failed: 0 mapq >= mapq_cut (unique): 3742792 |
Did not yet had time to look in details. But believe me, sRbowtie is my oldest tools and I spent hundreds of hours on it look at its behavior. So, it is not "un argument d'autorité" but 1 thing is 100% sure: SRbowtie is a bowtie wrapper. This implies that if you do not have the same output with both tools, then it implies that different parameters have been passed to the bowtie. |
I want to characterise piRNAs as in the paper whose data is here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.n2z34tn1p
I cleaned up the small RNAs against rRNAs, tRNAs, miRNAs and miscRNAs sequences and I made a selection of sizes between 23 and 29 nt.
I now want to retrieve unique reads from the Drosophila genome. So I used the sR_bowtie tool for small RNA short reads (Galaxy Version 2.3.0) with the option "Match unique mappers on DNA reference index" with 0 number of mismatches allowed. I asked for output in bam and also in fastq of the aligned files. I then ran the FastQC tool on these two output files.
I'll take the example of the ALBA25 sample, for which I have 16,518,042 as input. If I look at the output of sR_Bowtie I get 14,168,836 aligned reads.
# reads processed: 16518042
# reads with at least one alignment: 14168836 (85.7
But if I count the reads in the output files with the FastQC output I get 6,091,998 in the BAM, whereas I should get 14,168,836 reads and 3,742,792 in the Fastq aligned files.
Total Sequences 6091998 (BAM)
Total Sequences 3742792 (Fastq)
I used the "Map with Bowtie for Illumina tool (Galaxy Version 1.2.0)" to reproduce the alignment step with the same -v 0 and -m 1 parameters. I get the same output than using the sR_Bowtie tool.
# reads processed: 16518042
# reads with at least one reported alignment: 14168836 (85.78%)
If I use the "Samtools view" tool to keep only the aligned reads and run a FastQC, I get the expected number of sequences.
Total Sequences 14168836
Have you ever encountered this problem with the size of sR_Bowtie outputs?
The text was updated successfully, but these errors were encountered: