fix coverageCalculations issue + check for bedfile #192

Merged: 7 commits merged on Mar 26, 2018
1 change: 1 addition & 0 deletions README.md
@@ -16,6 +16,7 @@ The bwa-mem command from Burrows-Wheeler Aligner (BWA) [[2]](#r2) is used to alig
The GATK [[4]](#r4) HaplotypeCaller estimates the most likely genotypes and allele frequencies in an alignment using a Bayesian likelihood model for every position of the genome, regardless of whether a variant was detected at that site. This information can later be used in the project-based genotyping step.
A joint analysis is then performed on all samples in the project, yielding a posterior probability of a variant allele at each site. SNPs and small indels are written to a VCF file, along with information such as genotype quality, allele frequency, strand bias and read depth for each SNP/indel. Based on quality thresholds from the GATK "best practices" [[5]](#r5), the SNPs and indels are filtered and marked as LowQual or PASS, resulting in a final VCF file.
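As a rough illustration of what the final filtered VCF contains, the sketch below tallies records per FILTER value (for example PASS and LowQual). It assumes a plain-text, uncompressed VCF in which FILTER is the 7th tab-separated column, as in the VCF specification; the helper name `count_filters` and the tiny example file are hypothetical, not part of this pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count variant records per FILTER value in an
# uncompressed VCF. Header lines (starting with '#') are skipped.
set -euo pipefail

count_filters() {
    local vcf="$1"
    awk -F'\t' '!/^#/ { n[$7]++ } END { for (f in n) print f "\t" n[f] }' "$vcf" | sort
}

# Usage example on a small, made-up VCF.
tmp=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmp"
printf '1\t100\t.\tA\tG\t50\tPASS\t.\n' >> "$tmp"
printf '1\t200\t.\tC\tT\t5\tLowQual\t.\n' >> "$tmp"
printf '2\t300\t.\tG\tA\t60\tPASS\t.\n' >> "$tmp"
count_filters "$tmp"
rm -f "$tmp"
```

Because FILTER values are tallied as-is, any additional filter names present in a real VCF would show up as their own rows.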


### References
<a name="r1"> 1. Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc </a>

2 changes: 1 addition & 1 deletion protocols/CoverageCalculations.sh
@@ -78,7 +78,7 @@ then

awk -v OFS='\t' '{print $1,$3}' "${sampleNameID}.${perTarget}.coveragePerTarget.sample_interval_summary" | sed '1d' > "${sampleNameID}.${perTarget}.coveragePerTarget.coveragePerTarget.txt.tmp.tmp"
sort -V "${sampleNameID}.${perTarget}.coveragePerTarget.coveragePerTarget.txt.tmp.tmp" > "${sampleNameID}.${perTarget}.coveragePerTarget.coveragePerTarget.txt.tmp"
perl -pi -e 's|-|\^|' "${perTargetDir}/${perTarget}.genesOnly" > "${sampleNameID}.${perTarget}.coveragePerTarget.genesOnly.tmp"
perl -p -e 's|-|\^|' "${perTargetDir}/${perTarget}.genesOnly" > "${sampleNameID}.${perTarget}.coveragePerTarget.genesOnly.tmp"
paste "${sampleNameID}.${perTarget}.coveragePerTarget.coveragePerTarget.txt.tmp" "${sampleNameID}.${perTarget}.coveragePerTarget.genesOnly.tmp" > "${sampleNameID}.${perTarget}.coveragePerTarget_inclGenes.txt"
##Paste command produces ^M character

27 changes: 25 additions & 2 deletions protocols/CreateExternSamplesProjects.sh
@@ -33,6 +33,10 @@
#list lane
#string ngsUtilsVersion

#string dataDir
#string coveragePerBaseDir
#string coveragePerTargetDir

set -e
set -u

@@ -113,10 +117,29 @@ extract_samples_from_GAF_list.pl --i "${worksheet}" --o "${projectJobsDir}/${pro

batching="_small"

capturingKitProject=$(python ${EBROOTNGS_DNA}/scripts/getCapturingKit.py "${projectJobsDir}/${project}.csv")
capturingKitProject=$(python ${EBROOTNGS_DNA}/scripts/getCapturingKit.py "${projectJobsDir}/${project}.csv" | sed 's|\\||' )
captKit=$(echo "capturingKitProject" | awk 'BEGIN {FS="/"}{print $2}')

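The added `sed 's|\\||'` removes the first backslash on each line of the `getCapturingKit.py` output, and `captKit` is meant to be the field after the first `/`. As written in both scripts, though, `echo "capturingKitProject"` echoes the literal word rather than the variable, and since that word contains no `/`, `captKit` comes out empty. The sketch below contrasts the two forms; the raw kit string is a hypothetical example, not real `getCapturingKit.py` output.

```shell
#!/usr/bin/env bash
# Sketch of the capturing-kit name handling; the input string is hypothetical.
set -euo pipefail

# Simulated script output containing a stray backslash.
raw='Agilent\/ExampleKit_v1'

# Remove the first backslash on the line, as the added sed call does.
capturingKitProject=$(printf '%s\n' "$raw" | sed 's|\\||')

# Second '/'-separated field, with the variable actually expanded.
captKit=$(echo "${capturingKitProject}" | awk 'BEGIN {FS="/"}{print $2}')

# The literal word has no '/', so this yields an empty field.
literal=$(echo "capturingKitProject" | awk 'BEGIN {FS="/"}{print $2}')

echo "capturingKitProject=${capturingKitProject}"  # -> capturingKitProject=Agilent/ExampleKit_v1
echo "captKit=${captKit}"                           # -> captKit=ExampleKit_v1
echo "literal=[${literal}]"                         # -> literal=[]
```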
if [ ! -d "${dataDir}/${capturingKitProject}" ]
then
echo "Bedfile does not exist! Exiting"
exit 1
fi

if [[ "${capturingKitProject}" == *"Exoom"* || "${capturingKitProject}" == *"All_Exon_v1"* || "${capturingKitProject}" == *"wgs"* || "${capturingKitProject}" == *"WGS"* ]]
then
batching="_chr"
batching="_chr"
if [ ! -e "${coveragePerTargetDir}/${captKit}/${captKit}" ]
then
echo "Bedfile in ${coveragePerTargetDir} does not exist! Exiting"
exit 1
fi
else
if [ ! -e "${coveragePerBaseDir}/${captKit}/${captKit}" ]
then
echo "Bedfile in ${coveragePerBaseDir} does not exist! Exiting"
exit 1
fi
fi
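The new existence checks follow one pattern: kits whose name contains Exoom, All_Exon_v1, wgs or WGS are looked up under `coveragePerTargetDir`, all others under `coveragePerBaseDir`, and in both cases `<dir>/<captKit>/<captKit>` must exist. A minimal sketch that factors this into a single function is shown below; `check_bedfile` and the temporary directory layout are hypothetical, while the kit-name patterns and messages are copied from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the bedfile checks added in this PR.
set -euo pipefail

# Prints the checked path on success; prints an error and returns 1 if missing.
check_bedfile() {
    local kitProject="$1" kit="$2" perBaseDir="$3" perTargetDir="$4"
    local dir
    # Same substring matches as the [[ ... == *"Exoom"* ... ]] test in the diff.
    if [[ "${kitProject}" == *"Exoom"* || "${kitProject}" == *"All_Exon_v1"* \
       || "${kitProject}" == *"wgs"* || "${kitProject}" == *"WGS"* ]]; then
        dir="${perTargetDir}"
    else
        dir="${perBaseDir}"
    fi
    if [ ! -e "${dir}/${kit}/${kit}" ]; then
        echo "Bedfile in ${dir} does not exist! Exiting" >&2
        return 1
    fi
    echo "${dir}/${kit}/${kit}"
}

# Usage example with a throwaway directory layout.
root=$(mktemp -d)
mkdir -p "$root/perBase/Kit_v1" "$root/perTarget/Exoom_v1"
touch "$root/perBase/Kit_v1/Kit_v1" "$root/perTarget/Exoom_v1/Exoom_v1"

check_bedfile "Vendor/Kit_v1" "Kit_v1" "$root/perBase" "$root/perTarget"
check_bedfile "Vendor/Exoom_v1" "Exoom_v1" "$root/perBase" "$root/perTarget"
check_bedfile "Vendor/Missing" "Missing" "$root/perBase" "$root/perTarget" || echo "missing kit rejected"
```

Returning a status instead of calling `exit` lets the caller decide whether to abort, which keeps the function reusable in both the in-house and external project scripts.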

if [ -f .compute.properties ];
27 changes: 26 additions & 1 deletion protocols/CreateInhouseProjects.sh
@@ -31,6 +31,11 @@
#string ngsversion
#string ngsUtilsVersion

#string dataDir

#string coveragePerBaseDir
#string coveragePerTargetDir

#string project
#string logsDir

@@ -118,12 +123,32 @@ fi

batching="_small"

capturingKitProject=$(python ${EBROOTNGS_DNA}/scripts/getCapturingKit.py "${projectJobsDir}/${project}.csv")
capturingKitProject=$(python ${EBROOTNGS_DNA}/scripts/getCapturingKit.py "${projectJobsDir}/${project}.csv" | sed 's|\\||')
captKit=$(echo "capturingKitProject" | awk 'BEGIN {FS="/"}{print $2}')

if [ ! -d "${dataDir}/${capturingKitProject}" ]
then
echo "Bedfile does not exist! Exiting"
exit 1
fi

if [[ "${capturingKitProject}" == *"Exoom"* || "${capturingKitProject}" == *"All_Exon_v1"* || "${capturingKitProject}" == *"wgs"* || "${capturingKitProject}" == *"WGS"* ]]
then
batching="_chr"
if [ ! -e "${coveragePerTargetDir}/${captKit}/${captKit}" ]
then
echo "Bedfile in ${coveragePerTargetDir} does not exist! Exiting"
exit 1
fi
else
if [ ! -e "${coveragePerBaseDir}/${captKit}/${captKit}" ]
then
echo "Bedfile in ${coveragePerBaseDir} does not exist! Exiting"
exit 1
fi
fi
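The `batching` suffix selects which batch list is passed on (`${EBROOTNGS_DNA}/batchIDList${batching}.csv`): `_chr` for kit names containing Exoom, All_Exon_v1, wgs or WGS, and `_small` otherwise. The same decision can be expressed with a `case` statement, sketched below; `select_batching` is a hypothetical name, the patterns are the ones from the diff, and the fallback path used when `EBROOTNGS_DNA` is unset is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the batching suffix from the capturing-kit name.
set -euo pipefail

select_batching() {
    case "$1" in
        # Same substring patterns as the [[ ... == *"Exoom"* ... ]] test.
        *Exoom*|*All_Exon_v1*|*wgs*|*WGS*) echo "_chr" ;;
        *) echo "_small" ;;
    esac
}

batching=$(select_batching "Vendor/Exoom_v1")
# Placeholder default so the sketch runs under `set -u` without the module loaded.
echo "BATCHIDLIST=${EBROOTNGS_DNA:-/hypothetical/ngs_dna}/batchIDList${batching}.csv"
```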


echo "BATCHIDLIST=${EBROOTNGS_DNA}/batchIDList${batching}.csv"

sh "${EBROOTMOLGENISMINCOMPUTE}/molgenis_compute.sh" \