GATK4 HaplotypeCaller step in gVCF mode: the first step towards whole-cohort joint genotyping, as described in the GATK Best Practices (step "Call Variants Per-Sample").
Small pipeline that calls variants on recalibrated BAM files, one sample at a time, and stores the resulting gVCF files. It uses a scatter-gather strategy. A subsequent pipeline performs the full-cohort calling from all the gVCF files.
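To illustrate the scatter-gather idea, here is a minimal Python sketch. It is not the pipeline's actual implementation, and the source does not describe how the pipeline shards its work. The function names `scatter` and `gather` and the balancing rule (contiguous shards of roughly equal base count) are assumptions made for illustration only.

```python
from typing import List, Tuple

# (contig, start, end), 1-based inclusive coordinates (assumed convention)
Interval = Tuple[str, int, int]


def scatter(intervals: List[Interval], n_shards: int) -> List[List[Interval]]:
    """Split intervals into at most n_shards contiguous groups of similar size.

    Hypothetical helper: keeps the input order so results can be gathered
    back simply by concatenating the shards.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be >= 1")
    total = sum(end - start + 1 for _, start, end in intervals)
    target = total / n_shards  # ideal number of bases per shard
    shards: List[List[Interval]] = [[]]
    covered = 0  # bases assigned so far
    for interval in intervals:
        # Open a new shard once the shards so far hold their share of bases.
        if shards[-1] and covered >= target * len(shards) and len(shards) < n_shards:
            shards.append([])
        shards[-1].append(interval)
        covered += interval[2] - interval[1] + 1
    return shards


def gather(per_shard_results: List[List[str]]) -> List[str]:
    """Concatenate per-shard outputs in shard order (hypothetical helper)."""
    return [record for shard in per_shard_results for record in shard]
```

In a real run, each shard would be passed to a separate HaplotypeCaller job and the per-shard gVCF files merged afterwards; the sketch only shows the splitting and ordering logic.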
- This pipeline is based on nextflow. As we have several nextflow pipelines, we have centralized the common information in the IARC-nf repository. Please read it carefully as it contains essential information for the installation, basic usage and configuration of nextflow and our pipelines.
- GATK4 executables
- Picard Tools
- --input: your input BAM file(s). Do not forget the quotes when matching multiple BAM files, e.g. --input "test_*.bam".
- --output_dir: the folder that will contain your test_123.gVCF file, or your test_001.gVCF, test_002.gVCF, ... files.
- --ref_fasta: your reference in FASTA format. Make sure it is the same as (or compatible with) the one used to align your BAM file(s).
- --gatk_exec: the full path to your GATK4 binary file.
- --picard_dir: the directory that contains picard.jar.
- --interval_list: a file listing the intervals to call on. More information on interval_list format.
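Before launching a run, it can help to sanity-check the intervals file. The Python sketch below is only an illustration and is not part of the pipeline. It assumes the file uses one of two common layouts: Picard-style lines (optional header lines starting with `@`, then tab-separated contig, start, end, strand, name) or GATK-style `contig:start-end` lines. The function name `parse_intervals` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# (contig, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]

# GATK-style "contig:start-end"; contig names containing ':' are not handled here.
_GATK_STYLE = re.compile(r"^(?P<contig>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_intervals(lines: Iterable[str]) -> List[Interval]:
    """Parse interval lines, skipping blank lines and '@' header lines."""
    intervals: List[Interval] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("@"):
            continue  # blank line or SAM-style header
        match = _GATK_STYLE.match(line)
        if match:
            contig = match["contig"]
            start, end = int(match["start"]), int(match["end"])
        else:
            # Assume Picard-style tab-separated columns.
            fields = line.split("\t")
            if len(fields) < 3:
                raise ValueError(f"line {lineno}: unrecognised interval {line!r}")
            contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 1 or end < start:
            raise ValueError(f"line {lineno}: invalid coordinates {start}-{end}")
        intervals.append((contig, start, end))
    return intervals
```

For example, `parse_intervals(open("target.list"))` would return the list of intervals, or raise `ValueError` on the first malformed line.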
A nextflow.config file is also included; modify it as needed when running outside our pre-configured clusters (see Nextflow configuration).
nextflow run iarcbioinfo/gatk4-HaplotypeCaller.nf -profile cobalt --input "/data/test_*.bam" --output_dir myGVCFs --ref_fasta /ref/Homo_sapiens_assembly38.fasta --gatk_exec /bin/gatk-4.0.4.0/gatk --interval_list target.list