Dev #473

Closed
wants to merge 19 commits into from
1 change: 1 addition & 0 deletions .gitignore
@@ -7,3 +7,4 @@ results/
testing/
testing*
*.pyc
+*.swp
11 changes: 5 additions & 6 deletions bin/concatenateVCFs.sh
@@ -1,6 +1,5 @@
#!/usr/bin/env bash
set -euo pipefail

# This script concatenates all VCF files that are in the local directory,
# that were created from different intervals to make a single final VCF

@@ -49,8 +48,8 @@ if [ -z ${noInt+x} ]
then
# First make a header from one of the VCF
# Remove interval information from the GATK command-line, but leave the rest
-FIRSTVCF=$(set +o pipefail; ls *.vcf | head -n 1)
-sed -n '/^[^#]/q;p' $FIRSTVCF | \
+FIRSTVCF=$(set +o pipefail; ls *.vcf.gz | head -n 1)
+sed -n '/^[^#]/q;p' <(zcat $FIRSTVCF) | \
awk '!/GATKCommandLine/{print}/GATKCommandLine/{for(i=1;i<=NF;i++){if($i!~/intervals=/ && $i !~ /out=/){printf("%s ",$i)}}printf("\n")}' \
> header
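
The changed lines in this hunk read the header from a gzipped VCF instead of a plain one. A minimal sketch of that step in isolation, assuming only `gzip` and `sed` are available. `vcf_header` is a hypothetical helper name, not part of the script, and `gzip -dc` is used here as a stand-in for `zcat`:

```shell
# Hypothetical helper (not in the PR): print only the leading '#' lines
# of a gzipped VCF.
vcf_header() {
    local vcf="$1"
    # sed quits at the first non-header line, so the decompressor may be
    # cut off early. pipefail is relaxed inside the subshell so that an
    # early exit of the reader is not reported as a failure.
    ( set +o pipefail; gzip -dc "$vcf" | sed -n '/^[^#]/q;p' )
}
```

Called as `vcf_header sample.vcf.gz > header`, this would produce the same kind of header file as the lines above, before the `awk` filter on `GATKCommandLine` is applied.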

@@ -65,7 +64,7 @@ then

for chr in "${CONTIGS[@]}"; do
# Skip if globbing would not match any file to avoid errors such as
-# "ls: cannot access chr3_*.vcf: No such file or directory" when chr3
+# "ls: cannot access chr3_*.vcf.gz: No such file or directory" when chr3
# was not processed.
pattern="*_${chr}_*.vcf"
if ! compgen -G "${pattern}" > /dev/null ; then continue; fi
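
The guard above relies on bash's `compgen -G`, which succeeds only when a glob matches at least one existing path. A small sketch of that check on its own, assuming bash. `has_match` is a hypothetical name introduced for illustration:

```shell
# Hypothetical helper (not in the PR): succeed only if the glob pattern
# matches at least one existing file. compgen is a bash builtin.
has_match() {
    compgen -G "$1" > /dev/null
}

# Usage sketch: skip a contig that produced no files.
#   if ! has_match "*_${chr}_*.vcf.gz"; then continue; fi
```

A glob must match the whole file name, so a pattern ending in `.vcf` does not match a file named `..._chr1_....vcf.gz`. Whether the unchanged `pattern=` line in this hunk is affected depends on the file names the pipeline produces, which this diff does not show.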
@@ -76,12 +75,12 @@ then
# Determine length of header.
# The 'q' command makes sed exit when it sees the first non-header
# line, which avoids reading in the entire file.
-L=$(sed -n '/^[^#]/q;p' ${vcf} | wc -l)
+L=$(sed -n '/^[^#]/q;p' <(zcat ${vcf}) | wc -l)

# Then print all non-header lines. Since tail is very fast (nearly as
# fast as cat), this is way more efficient than using a single sed,
# awk or grep command.
-tail -n +$((L+1)) ${vcf}
+tail -n +$((L+1)) <(zcat ${vcf})
done
done
) | bgzip -@${cpus} > rawcalls.unsorted.vcf.gz
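
The header-count-then-`tail` step in this hunk can be sketched as a standalone function. This is an illustration under assumptions, not the script's code: `vcf_body` is a hypothetical name and `gzip -dc` stands in for `zcat`. With gzipped input each file is decompressed twice, once to count header lines and once to print the body.

```shell
# Hypothetical helper (not in the PR): print the non-header lines of a
# gzipped VCF using the same count-then-tail approach as the script.
vcf_body() {
    local vcf="$1" n
    # First pass: count the leading '#' lines. sed exits at the first
    # record, so pipefail is relaxed in the subshell in case the
    # decompressor is cut off early.
    n=$( set +o pipefail; gzip -dc "$vcf" | sed -n '/^[^#]/q;p' | wc -l )
    # Second pass: print everything after the header.
    gzip -dc "$vcf" | tail -n +"$((n + 1))"
}
```

A loop such as `for vcf in *.vcf.gz; do vcf_body "$vcf"; done` would then emit the records of every file with no headers in between.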
2 changes: 1 addition & 1 deletion conf/base.config
@@ -14,7 +14,7 @@ process {
time = { check_max( 4.h * task.attempt, 'time' ) }
shell = ['/bin/bash', '-euo', 'pipefail']

-errorStrategy = { task.exitStatus in [143,137,104,134,139, 247] ? 'retry' : 'finish' }
+errorStrategy = { task.exitStatus in [143,137,104,134,139,140,247] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
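
Several of the retried statuses follow the shell convention that a process killed by a signal exits with 128 plus the signal number: 143 is SIGTERM (15), 137 is SIGKILL (9), 134 is SIGABRT (6) and 139 is SIGSEGV (11). On common Linux platforms the newly added 140 would correspond to signal 12, SIGUSR2. The diff does not state why it was added. A small sketch for decoding such statuses, with `describe_status` as a hypothetical helper name:

```shell
# Hypothetical helper (not in the PR): describe an exit status using the
# "128 + signal number" convention. Signal numbering is platform
# dependent, so names other than the classic ones may differ by system.
describe_status() {
    local status="$1" name
    if [ "$status" -gt 128 ] && name=$(kill -l "$((status - 128))" 2>/dev/null); then
        echo "signal $name"
    else
        # Not a valid "128 + N" value on this platform: plain exit code.
        echo "exit code $status"
    fi
}
```

For example, `describe_status 143` reports `signal TERM`, while a value such as 104 falls through to the plain exit-code branch.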
