This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

GATK4 first round without MuTect1 and indel realignment #607

Merged 41 commits on Aug 14, 2018

Conversation

szilvajuhos
Collaborator

Also have a look at the new container structure. I am trying to accommodate the nf-core guidelines. alleleCount and ASCAT need new bioconda recipes, but most of the other tools are in a collated (relatively big) container including GATK4, igvtools, etc.

annotate.nf Outdated
@@ -73,23 +73,23 @@ vcfToAnnotate = Channel.create()
vcfNotToAnnotate = Channel.create()

if (annotateVCF == []) {
// by default we annotate both germline and somatic results that we can find in the VariantCalling directory
Member

In fact, we annotate all available VCFs by default, so it's not really a question of germline/somatic, but more a question of which tools were run.


LABEL \
author="Maxime Garcia" \
authors="Maxime.Gracia@scilifelab.se, Szilveszter.Juhos@scilifelab.se" \
Member

typo in my name.

description="Image with tools used in Sarek" \
maintainer="maxime.garcia@scilifelab.se"
maintainers="Maxime.Gracia@scilifelab.se, Szilveszter.Juhos@scilifelab.se"
Member

If I remember correctly, we can use whichever labels we want, but maintainer is meant to stay as it is, because it is the replacement for the deprecated MAINTAINER instruction.
I do think we can use:

maintainer="Maxime Garcia <maxime.garcia@scilifelab.se>, Szilveszter Juhos <Szilveszter.Juhos@scilifelab.se>"

docker build -t szilvajuhos/sarek-vcfanno:latest .
docker images
docker push szilvajuhos/sarek-vcfanno:latest
singularity pull docker://szilvajuhos/sarek-vcfanno:latest
Member

Do we really need this script in this repo?

Collaborator Author

No, of course not

germlineVC.nf Outdated
-L ${intervalBed} \
--dbsnp ${dbsnp} \
-O ${intervalBed.baseName}_${idSample}.g.vcf \
--emit-ref-confidence GVCF
Member

strange indentation

lib/QC.groovy Outdated
@@ -59,14 +59,14 @@ class QC {
// Get GATK version
static def getVersionGATK() {
"""
echo "GATK version"\$(java -jar \$GATK_HOME/GenomeAnalysisTK.jar --version 2>&1) > v_gatk.txt
gatk-launch ApplyBQSR --help 2>&1| awk -F/ '/java/{for(i=1;i<=NF;i++){if(\$i~/gatk4/){sub("gatk4-","",\$i);print \$i>"v_gatk.txt"}}}'
Member

I think we can work out the regex in the Python script instead of doing it here; it'll make more sense.

Collaborator Author

gatk-launch is provided by GATK, and I do not want to fiddle with it. OTOH it would be nice if they had a --version option :/

Member

I'll look into whether there's something similar in the new GATK.

Collaborator Author

My feeling is that it is still the easiest way to have the version :/

Member

I was thinking more of removing the awk part and doing the regex in the Python script.

Collaborator Author

I see, sure, we can refactor it for the rest of the software later also.
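
A minimal sketch of what moving the regex from awk into Python could look like. It assumes, as the awk one-liner implies, that the launcher's help output contains a java command line whose path includes a gatk4-<version> directory. The sample text, the path layout and the name parse_gatk_version are hypothetical and not taken from the repository.

```python
import re

# Hypothetical sample of what the launcher echoes before the help text;
# the exact path layout is an assumption made for illustration only.
SAMPLE = (
    "Running:\n"
    "    java -jar /opt/conda/share/gatk4-4.0.3.0-0/"
    "gatk-package-4.0.3.0-local.jar ApplyBQSR --help\n"
)

# Captures whatever follows "gatk4-" up to the next slash or whitespace,
# mirroring the awk logic of splitting on "/" and stripping the "gatk4-" prefix.
GATK4_DIR = re.compile(r"/gatk4-([^/\s]+)")


def parse_gatk_version(text):
    """Return the suffix after 'gatk4-' from the first java line, or None."""
    for line in text.splitlines():
        if "java" not in line:
            continue
        match = GATK4_DIR.search(line)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    print(parse_gatk_version(SAMPLE))
```

With this approach the process would only need to redirect the raw help output to a file, and the version extraction could be adjusted in one place if the path layout changes.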

@maxulysse
Member

Quite an impressive work.
Well done @szilvajuhos \o/

- conda-forge::openjdk=8.0.144 # Needed for FastQC docker - see bioconda/bioconda-recipes#5026
- fastqc=0.11.7
- freebayes=1.2.0
- gatk4=4.0.3.0
Member

You can use 4.0.4.0; the executable is back to being gatk and not gatk-launch anymore.

Collaborator Author

Fine, will change the name in processes as well. In fact we have 4.0.6.0 also

Member

Even better ;-)

- fastqc=0.11.7
- freebayes=1.2.0
- gatk4=4.0.3.0
- htslib=1.7
Member

You should use 1.8 here.
If I remember correctly, htslib, bcftools and samtools can all have the same version.

Collaborator Author

OK, done but will check since I got a feeling that 1.8 has compatibility issues

Member

OK, strange, but good to know if you can confirm that

Collaborator

There is even 1.9 out already! You could already skip 1.8 ...
https://github.com/samtools/samtools/releases/
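
As a rough illustration of keeping the htslib, bcftools and samtools pins aligned, the sketch below scans a conda dependency list and reports the pins when they disagree. The excerpt is hypothetical (the bcftools and samtools versions are invented for the example), and the helper names are assumptions, not part of the pipeline.

```python
import re

# Hypothetical excerpt of an environment.yml dependency list; the bcftools
# and samtools pins are invented for illustration.
ENV_LINES = """\
dependencies:
  - bcftools=1.8
  - htslib=1.7
  - samtools=1.8
"""

# Matches "- name=version", with an optional "channel::" prefix.
PIN = re.compile(r"^\s*-\s*(?:[\w-]+::)?([\w.-]+)=([\w.]+)")


def pinned_versions(text, packages):
    """Map each requested package to its pinned version, if present."""
    found = {}
    for line in text.splitlines():
        match = PIN.match(line)
        if match and match.group(1) in packages:
            found[match.group(1)] = match.group(2)
    return found


def mismatched(text, packages=("htslib", "bcftools", "samtools")):
    """Return the pins when they disagree, or an empty dict when they agree."""
    pins = pinned_versions(text, packages)
    return pins if len(set(pins.values())) > 1 else {}


if __name__ == "__main__":
    print(mismatched(ENV_LINES))
```

A check like this only compares the pinned strings; it says nothing about whether a given release actually has compatibility issues.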

- gatk4=4.0.3.0
- htslib=1.7
- igvtools=2.3.93
- manta=1.3.0
Member

Since we're updating, we can try the 1.4.0

Collaborator Author

OK, done

@@ -14,7 +14,7 @@ env {
params {
genome_base = params.genome == 'GRCh37' ? '/sw/data/uppnex/ToolBox/ReferenceAssemblies/hg38make/bundle/2.8/b37' : params.genome == 'GRCh38' ? '/sw/data/uppnex/ToolBox/hg38bundle' : 'References/smallGRCh37'
singleCPUMem = 8.GB
totalMemory = 104.GB // change to 240 on irma
totalMemory = 92.GB // change to 240 on irma
Member

Why did you change to 92?

Collaborator Author

by mistake

build.sh
COPY environment.yml /
RUN conda env update -n root -f /environment.yml && conda clean -a
ENV PATH /opt/conda/bin:$PATH
Member

Collaborator Author

tried, and was not working as expected, so I prefer to leave it as it is now, and improve when needed

Collaborator Author

We will keep the ENV for now.

@@ -0,0 +1,24 @@
# You can use this file to create a conda environment for this pipeline:
# conda env create -f environment.yml
name: sarek-core
Member

We should specify a version here
so I would go for sarek-core-dev or sarek-core-2.1

Collaborator Author

Can we just leave it as sarek?

@maxulysse maxulysse merged commit 924b90a into SciLifeLab:master Aug 14, 2018
jherrero referenced this pull request in UCL-BLIC/Sarek_v2.2.1 Apr 11, 2019
GATK4 first round without MuTect1 and indel realignment
3 participants