Can't find first and/or second of pair error #45

Open
dheerajbobbili1988 opened this issue May 14, 2020 · 1 comment

dheerajbobbili1988 commented May 14, 2020

Hi,

I am trying to realign a BAM file to a new reference, and I would like to use samblaster in the process. When I run the command below, I get this error:

samblaster: Loaded 84 header sequence entries.
samblaster: Can't find first and/or second of pair in sam block of length 1 for id: SRR622461.3665340
samblaster:    At location: *:0
samblaster:    Are you sure the input is sorted by read ids?
samblaster: Exiting early, the following stats are for processing preceeding the error
samblaster: Marked           0 of        297 (0.000%) total read ids as duplicates using 1620k memory in 0.001S CPU seconds and 13M33S(813S) wall time.
samblaster: Premature exit (return code 1).
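
The "sam block of length 1" in the message reads as though samblaster met a read id with only one record where it expected a pair. A rough way to see how common that is in a read-id grouped SAM stream is sketched below. The function name `count_singleton_ids` is made up for illustration, and the sketch counts records per id rather than mates, so secondary or supplementary alignments will make a block look longer than it is.

```shell
# Count read ids that occur in a block of exactly one record.
# Input: read-id grouped SAM on stdin (header lines are skipped).
count_singleton_ids() {
    awk -F'\t' '
        /^@/ { next }                    # SAM header lines start with "@"
        $1 != prev {                     # first record of a new read-id block
            if (n == 1) singles++        # the block just finished had one record
            if (prev != "") blocks++
            prev = $1
            n = 0
        }
        { n++ }
        END {
            if (n == 1) singles++        # account for the final block
            if (prev != "") blocks++
            printf "%d of %d read ids have a single record\n", singles, blocks
        }
    '
}
```

It could be fed from any step that emits grouped SAM, for example `samtools view -h some_grouped.bam | count_singleton_ids`, where `some_grouped.bam` is a placeholder name.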

Here is my command:

samtools collate -uOn128 NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.collate |
  samtools fastq - |
  bwa mem -pt20 -R '@RG\tID:NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211\tLB:NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211\tSM:NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211\tPL:ILLUMINA' -M human_g1k_v37_Ensembl_MT_66.fasta - |
  samtools sort --threads=20 -m4G -n -O sam |
  samblaster -M |
  samtools view -Sb - > NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.marked.bam

Now, my question: in this scenario, can I safely add the "--ignoreUnmated" flag, or is samblaster not suited for this purpose? Please let me know.

@GregoryFaust
Owner

I think it is fine to try samblaster here with the --ignoreUnmated option. I see two issues:

  1. Since you appear to have pulled reads only from chrom20, the input is bound to contain reads whose mates were aligned to a different chromosome in the original reference. That is probably the source of your unmated reads. The first unmated read shown appears to be unaligned, which I guess is due to the change in reference. I also hope the samtools fastq command works on input with unmated pairs. The samblaster output stats should show the number of unmated pairs as a percentage of all pairs, so you can judge whether the chrom20 selection is indeed the issue. I suggest redirecting stderr in the samblaster command to capture these stats.

  2. You don't need to sort into name order before using samblaster to mark duplicates. The samtools collate command already makes the input "read-id grouped", which is all samblaster requires, and BWA does not change the order of the reads in its output.
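
Putting the two points together, the pipeline might look roughly like the sketch below: the name sort is dropped, --ignoreUnmated is added, and samblaster's stderr goes to a log file so the stats can be inspected afterwards. The tool options are the ones from the original command. The wrapper function `realign_and_mark`, its argument order, and the log and temporary-prefix names are assumptions made for illustration only.

```shell
# Hypothetical wrapper around the pipeline from the question.
# Arguments: input BAM, bwa-indexed reference FASTA, output BAM, read group string.
realign_and_mark() {
    in_bam=$1
    ref_fa=$2
    out_bam=$3
    read_group=$4

    # collate groups records by read id, so no "samtools sort -n" step is used.
    samtools collate -uOn128 "$in_bam" "${out_bam%.bam}.collate" |
        samtools fastq - |
        bwa mem -pt20 -R "$read_group" -M "$ref_fa" - |
        # Keep samblaster's stats (written to stderr) in a log file.
        samblaster -M --ignoreUnmated 2> "${out_bam%.bam}.samblaster.log" |
        samtools view -b -o "$out_bam" -
}
```

A call would pass the four values explicitly, for example `realign_and_mark in.bam ref.fasta out.marked.bam '@RG\tID:x\tSM:x'` (placeholder names). The unmated-pair percentage can then be read from the `.samblaster.log` file next to the output.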
