Multicore Bismark corrupts result files if one of child process fails #494
Hi Roman, thanks for reporting this unwanted behaviour in such meticulous detail. It is true that the parallel processing does little (if anything) to ensure that the output hasn't been corrupted, e.g. as a result of insufficient disk resources etc. In that sense, I think it is a good idea to simply kill the job to alert the user that something has gone wrong and needs to be looked at. I shall accept your PR now, and close this issue.
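For readers who want to see the general idea of "fail the whole job if any child fails", here is a minimal sketch in Python. It is not the code from the PR (Bismark itself is written in Perl), and the names `run_children` and `describe_failure` are made up for illustration. It assumes each worker is a separate process whose exit status the parent can wait on.

```python
import signal
import subprocess


def describe_failure(returncode):
    """Turn a non-zero Popen return code into a readable message."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as a negative number.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown"
        return f"terminated by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_children(commands):
    """Start all commands in parallel, wait for each, and report failures.

    Returns a list of (child_index, reason) tuples; empty means all succeeded.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    failures = []
    for index, proc in enumerate(procs):
        returncode = proc.wait()
        if returncode != 0:
            failures.append((index, describe_failure(returncode)))
    return failures
```

A parent script would then refuse to merge partial outputs and exit non-zero whenever `run_children(...)` returns a non-empty list, so that a workflow manager such as Snakemake marks the job as failed instead of passing truncated files on.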
Fix for #494: Multicore Bismark corrupts result files if one of child process fails
Hi, I'm using Bismark (0.23.0) in a Snakemake pipeline to process many WGBS samples. I've noticed that some of my BAMs have fewer reads than expected, and the ambiguous/unaligned FASTQ files have an invalid format. I run Bismark in multicore mode. When I checked the logs, I noticed that 2 of the 4 cores failed with `Child process terminated with exit signal: '13'`, but the main Bismark process continued working and returned exit code 0. As a result I got an incorrect result that is hard to notice in batch processing mode. In this case Bismark should fail with an error. The child processes likely failed due to cluster issues (e.g. lack of HDD space); I successfully re-aligned one of the problematic samples and got correct results and files.

Some details:
bismark -X 600 --gzip --multicore 4 --ambiguous --unmapped --bowtie2 $(dirname resources/indexes/hg38/Bisulfite_Genome) -1 data/reads/wgbs_y23_pooled/Clean/FE04/FE04_1.fq.gz -2 data/reads/wgbs_y23_pooled/Clean/FE04/FE04_2.fq.gz
As a result:
FE04_hg38_PE_report.txt
The number of lines in the *.fq.gz files is not divisible by 4:
Here is the place where format is corrupted:
![F64471F3-7CA7-4EFC-9321-3D36C2B094D6](https://user-images.githubusercontent.com/72933/173064121-44baeca3-40bb-4c9a-995c-c7c676097ee5.png)
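The line-count symptom above can be checked automatically, for example as an extra pipeline step after alignment. The following is only a rough sketch under the assumption that every FASTQ record occupies exactly four lines; the helper names `count_lines` and `fastq_line_count_ok` are hypothetical and not part of Bismark or FastQC.

```python
import gzip


def count_lines(path):
    """Count lines in a plain or gzip-compressed text file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        return sum(1 for _ in handle)


def fastq_line_count_ok(path):
    """Return True if the file holds a whole number of 4-line records.

    A gzip stream that ends prematurely raises EOFError while reading;
    that is treated as a failed check as well.
    """
    try:
        return count_lines(path) % 4 == 0
    except EOFError:
        return False
```

This only catches truncation that breaks the four-line structure. A file cut off exactly on a record boundary would still pass, so comparing read counts between input and output remains worthwhile.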
I noticed this file format issue only because FastQC reported an error on the ambiguous reads. Otherwise the truncated BAM file could have gone on to downstream analysis: