From da5db45475b9b396176831c708b4ec125fd434f0 Mon Sep 17 00:00:00 2001 From: asp8200 Date: Thu, 15 Jun 2023 11:02:12 +0000 Subject: [PATCH 01/21] Updating docs/usage.md with Sentieon-related info --- docs/usage.md | 34 ++++++++++++++++++++++++++++++---- 1 file changed, 30 insertions(+), 4 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 541219ec4..c4dddb657 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -305,6 +305,36 @@ test,sample4_vs_sample3,manta,sample4_vs_sample3.diploid_sv.vcf.gz test,sample4_vs_sample3,manta,sample4_vs_sample3.somatic_sv.vcf.gz ``` +## Sentieon +[Sentieon](https://www.sentieon.com/) is a commercial solution to process genomics data with high computing efficiency, fast turnaround time, exceptionally high accuracy, and 100% consistency. + +In particular, Sentieon provides what may be viewed as sped-up versions of some standard tools, such as `bwa mem` and GATK's HaplotypeCaller. Sarek now supports some of Sentieon's modules. In order to use the Sentieon modules of Sarek, the user will need to supply the Sarek pipeline with a license for Sentieon. + +### Setup of Sentieon license for Sarek + +Sentieon supplies the license either as a string value (a URL) or as a file. It should be base64-encoded and stored in a nextflow secret named `SENTIEON_LICENSE_BASE64`.
If a license string (url) is supplied, then the nextflow secret should be set like this: + +```bash +nextflow secrets set SENTIEON_LICENSE_BASE64 $(echo -n | base64 -w 0) +``` +If a license file is supplied, then the nextflow secret should be set like this: + +```bash +nextflow secrets set SENTIEON_LICENSE_BASE64 $(cat | base64 -w 0) +``` +### Available Sentieon functions +Sarek contains the following Sentieon functions [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax), [LocusCollector](https://support.sentieon.com/manual/usages/general/#locuscollector-algorithm) + [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm), [Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm), [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) and [VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm), so the basic processing of alignment of fastq-files to vcf-files can be done using speedup Sentieon functions. + +### Basic usage of Sentieon functions in Sarek + +To use Sentieon's aligner `bwa mem`, set the aligner option `sentieon-bwamem`. (This can, for example, be done by adding `--aligner sentieon-bwamem` to the nextflow run command.) + +To use Sentieon's function `Dedup`, specify `sentieon_dedup` as one of the tools. (This can, for example, be done by adding `--tools sentieon_dedup` to the nextflow run command.) + +To use Sentieon's function `Haplotyper`, specify `sentieon_haplotyper` as one of the tools. (This can, for example, be done by adding `--tools sentieon_haplotyper` to the nextflow run command.) (To skip the GATK-based variant filter, add `--skip_tools haplotyper_filter` to the nextflow run command.) Sarek also provides the option `sentieon_haplotyper_emit_mode`, which can be used to set the [emit-mode](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) of Sentieon's haplotyper.
Sentieon's haplotyper can output both a VCF file and a gVCF file in the same run; this is achieved by setting `sentieon_haplotyper_emit_mode` to `,gvcf`, where `` is `variant`, `confident` or `all`. + +To use Sentieon's function `GVCFtyper` along with Sentieon's version of VQSR (`VarCal` and `ApplyVarCal`) for joint-germline genotyping, specify `sentieon_haplotyper` as one of the tools, set the option `sentieon_haplotyper_emit_mode` to `gvcf`, and add the option `joint_germline`. (This can, for example, be done by adding `--tools sentieon_haplotyper --joint_germline --sentieon_haplotyper_emit_mode gvcf` to the nextflow run command.) + ## Updating the pipeline When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: @@ -1006,7 +1036,3 @@ ERRORS: Some errors were detected Error type Number of errors ERROR_CHROMOSOME_NOT_FOUND 17522411 ``` - -## How to set up sarek to use sentieon - -Sarek is currently not supporting sentieon. It is planned for the upcoming release 3.3. In the meantime, please revert to the last release 2.7.2.
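Below is a minimal sketch of the license-encoding step documented in this patch. It assumes GNU coreutils `base64`, because the `-w 0` option is GNU-specific. `LICENSE_VALUE` is a hypothetical placeholder, not a real license server. The script only checks that the encoded value round-trips and stays on a single line; the actual `nextflow secrets set` call is left as a comment.

```bash
#!/usr/bin/env bash
# Sketch: base64-encode a Sentieon license value and check the round trip.
# LICENSE_VALUE is a hypothetical placeholder, not a real license server.
set -euo pipefail

LICENSE_VALUE="license.example.org:8990"

# echo -n keeps a trailing newline out of the encoded value;
# -w 0 (GNU coreutils) disables line wrapping, so the result is one line.
ENCODED=$(echo -n "$LICENSE_VALUE" | base64 -w 0)

# Decode again to confirm nothing was lost or added.
DECODED=$(printf '%s' "$ENCODED" | base64 -d)

if [ "$DECODED" = "$LICENSE_VALUE" ]; then
  echo "round trip OK"
else
  echo "round trip FAILED" >&2
  exit 1
fi

# With Nextflow installed, the encoded value would then be stored with:
#   nextflow secrets set SENTIEON_LICENSE_BASE64 "$ENCODED"
```

Both options matter. Without `-n`, a trailing newline would be encoded into the secret. Without `-w 0`, the encoding could wrap onto several lines, and an unquoted `$(...)` would then split it into several arguments.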
From f60b3822ad130ffa81571acdc486f7f15f67b768 Mon Sep 17 00:00:00 2001 From: asp8200 Date: Fri, 16 Jun 2023 09:47:13 +0000 Subject: [PATCH 02/21] WIP: Adding info about Sentieon in docs/output.md --- docs/output.md | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/docs/output.md b/docs/output.md index 1a033e9f2..36beda4ae 100644 --- a/docs/output.md +++ b/docs/output.md @@ -18,8 +18,10 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [BWA](#bwa) - [BWA-mem2](#bwa-mem2) - [DragMap](#dragmap) + - [Sentieon bwa mem](#sentieon-bwa-mem) - [Duplicate Marking](#mark-duplicates) - [GATK MarkDuplicates (Spark)](#gatk-markduplicates-spark) + - [Sentieon LocusCollector and Dedup](#sentieon-locuscollector-and-dedup) - [Base Quality Score Recalibration](#base-quality-score-recalibration) - [GATK BaseRecalibrator (Spark)](#gatk-baserecalibrator-spark) - [GATK ApplyBQSR (Spark)](#gatk-applybqsr-spark) @@ -29,6 +31,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [DeepVariant](#deepvariant) - [FreeBayes](#freebayes) - [GATK HaplotypeCaller](#gatk-haplotypecaller) + - [Sentieon Haplotyper](#sentieon-haplotyper) - [GATK Mutect2](#gatk-mutect2) - [samtools mpileup](#samtools-mpileup) - [Strelka2](#strelka2) @@ -150,30 +153,35 @@ These files are intermediate and by default not placed in the output-folder kept [BWA](https://github.com/lh3/bwa) is a software package for mapping low-divergent sequences against a large reference genome. The aligned reads are then coordinate-sorted (or name-sorted if [`GATK MarkDuplicatesSpark`](https://gatk.broadinstitute.org/hc/en-us/articles/5358833264411-MarkDuplicatesSpark) is used for duplicate marking) with [samtools](https://www.htslib.org/doc/samtools.html). -These files are intermediate and by default not placed in the output-folder kept in the final files delivered to users.
Set `--save_mapped` to enable publishing in CRAM format, furthermore add the flag `save_output_as_bam` for publishing in BAM format. - #### BWA-mem2 [BWA-mem2](https://github.com/bwa-mem2/bwa-mem2) is a software package for mapping low-divergent sequences against a large reference genome.The aligned reads are then coordinate-sorted (or name-sorted if [`GATK MarkDuplicatesSpark`](https://gatk.broadinstitute.org/hc/en-us/articles/5358833264411-MarkDuplicatesSpark) is used for duplicate marking) with [samtools](https://www.htslib.org/doc/samtools.html). -These files are intermediate and by default not placed in the output-folder kept in the final files delivered to users. Set `--save_mapped` to enable publishing, furthermore add the flag `save_output_as_bam` for publishing in BAM format. - #### DragMap [DragMap](https://github.com/Illumina/dragmap) is an open-source software implementation of the DRAGEN mapper, which the Illumina team created so that we would have an open-source way to produce the same results as their proprietary DRAGEN hardware. The aligned reads are then coordinate-sorted (or name-sorted if [`GATK MarkDuplicatesSpark`](https://gatk.broadinstitute.org/hc/en-us/articles/5358833264411-MarkDuplicatesSpark) is used for duplicate marking) with [samtools](https://www.htslib.org/doc/samtools.html). These files are intermediate and by default not placed in the output-folder kept in the final files delivered to users. Set `--save_mapped` to enable publishing, furthermore add the flag `save_output_as_bam` for publishing in BAM format. +#### Sentieon BWA mem + +Sentieon's [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax) is subroutine for mapping low-divergent sequences against a large reference genome. It is part of the proprietary software package [Sentieon](https://www.sentieon.com/). + +The aligned reads are then coordinate-sorted with `sentieon util sort`. +
Output files for all mappers and samples +The alignment files (BAM or CRAM) produced by the chosen aligner are not published by default, that is, they are not placed in the output folder (`outdir`). Setting `--save_mapped` publishes them in CRAM format; additionally setting `--save_output_as_bam` publishes them in BAM format instead. + + **Output directory: `{outdir}/preprocessing/mapped//`** -- if `--save_mapped`: `.cram` and `.cram.crai` +- if `--save_mapped`: `.sorted.cram` and `.sorted.cram.crai` - CRAM file and index -- if `--save_mapped --save_output_as_bam`: `.bam` and `.bam.bai` +- if `--save_mapped --save_output_as_bam`: `.sorted.bam` and `.sorted.bam.bai` - BAM file and index
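The publishing rules documented in this patch can be summarised as a small shell sketch. `mapped_outputs` and its `yes`/`no` arguments are hypothetical and not part of Sarek. For each combination of `--save_mapped` and `--save_output_as_bam`, the function prints only the file suffixes listed in the patch; the sample-name prefix is omitted.

```bash
#!/usr/bin/env bash
# Sketch: which mapped-alignment suffixes are published for each flag combination.
# mapped_outputs is a hypothetical helper; arguments are "yes" or "no".
set -euo pipefail

mapped_outputs() {
  local save_mapped=$1   # "yes" if --save_mapped is set
  local save_as_bam=$2   # "yes" if --save_output_as_bam is also set

  # Without --save_mapped the alignments stay intermediate: nothing is published.
  if [ "$save_mapped" != "yes" ]; then
    return 0
  fi

  if [ "$save_as_bam" = "yes" ]; then
    echo ".sorted.bam .sorted.bam.bai"
  else
    echo ".sorted.cram .sorted.cram.crai"
  fi
}

mapped_outputs yes no    # prints: .sorted.cram .sorted.cram.crai
mapped_outputs yes yes   # prints: .sorted.bam .sorted.bam.bai
mapped_outputs no yes    # prints nothing
```

In this sketch, `--save_output_as_bam` on its own publishes no mapped files, because the documented rule is that `--save_mapped` is what enables publishing.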
From bd0d9bfc92729af728bef2e3714e66b528ffc427 Mon Sep 17 00:00:00 2001 From: asp8200 Date: Fri, 16 Jun 2023 09:56:19 +0000 Subject: [PATCH 03/21] Removing legacy module-imports from BAM_SENTIEON_DEDUP --- subworkflows/local/bam_sentieon_dedup/main.nf | 2 -- 1 file changed, 2 deletions(-) diff --git a/subworkflows/local/bam_sentieon_dedup/main.nf b/subworkflows/local/bam_sentieon_dedup/main.nf index 72d7f0408..42c584fdb 100644 --- a/subworkflows/local/bam_sentieon_dedup/main.nf +++ b/subworkflows/local/bam_sentieon_dedup/main.nf @@ -5,10 +5,8 @@ // A when clause condition is defined in the conf/modules.config to determine if the module should be run include { CRAM_QC_MOSDEPTH_SAMTOOLS } from '../cram_qc_mosdepth_samtools/main' -include { GATK4_MARKDUPLICATES } from '../../../modules/nf-core/gatk4/markduplicates/main' include { SENTIEON_DEDUP } from '../../../modules/nf-core/sentieon/dedup/main' include { SAMTOOLS_INDEX as INDEX_INPUT } from '../../../modules/nf-core/samtools/index/main' -include { SAMTOOLS_INDEX as INDEX_MARKDUPLICATES } from '../../../modules/nf-core/samtools/index/main' workflow BAM_SENTIEON_DEDUP { take: From 5adbf9880e44eeb4a5b990847b046f60f3d39f60 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 14:29:56 +0200 Subject: [PATCH 04/21] Dedup and QualCal --- docs/output.md | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 36beda4ae..17b8871ba 100644 --- a/docs/output.md +++ b/docs/output.md @@ -165,7 +165,7 @@ These files are intermediate and by default not placed in the output-folder kept #### Sentieon BWA mem -Sentieon's [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax) is subroutine for mapping low-divergent sequences against a large reference genome. It is part of the proprietary software package [Sentieon](https://www.sentieon.com/). 
+Sentieon's [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax) is a subroutine for mapping low-divergent sequences against a large reference genome. It is part of the proprietary software package [Sentieon](https://www.sentieon.com/). The aligned reads are then coordinate-sorted with `sentieon util sort`. @@ -211,6 +211,26 @@ The resulting CRAM files are delivered to the users. +### Sentieon's LocusCollector and Dedup + +The subroutines LocusCollector and Dedup are part of Sentieon DNAseq packages with speedup versions of the standard GATK tools, and together those two subroutines correspond to GATK's MarkDuplicates. + +The subroutine [LocusCollector](https://support.sentieon.com/manual/usages/general/#driver-algorithm-syntax) collects read information that will be used for removing or marking of duplicate reads; its output is the score file indicating which reads are likely duplicates. + +The subroutine [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm) marks or removes duplicate reads based no the score file supplied by LocusCollector, and produces a BAM or CRAM file. + +
+Output files for all samples + +**Output directory: `{outdir}/preprocessing/sentieon_dedup//`** + +- `.dedup.cram` and `.dedup.cram.crai` + - CRAM file and index +- if `--save_output_as_bam`: + - `.md.bam` and `.dedup.bam.bai` + +
+ ### Base Quality Score Recalibration During Base Quality Score Recalibration, systematic errors in the base quality scores are corrected by applying machine learning to detect and correct for them. This is important for evaluating the correct call of a variant during the variant discovery process. However, this is not needed for all combinations of tools in Sarek. Notably, this should be turned off when having UMI tagged reads or using DragMap (see [here](https://gatk.broadinstitute.org/hc/en-us/articles/4407897446939--How-to-Run-germline-single-sample-short-variant-discovery-in-DRAGEN-mode)) as mapper. @@ -276,6 +296,9 @@ See the [`--input`](usage.md#--input) section in the usage documentation for fur - CSV containing an entry for each sample with the columns `patient,sample,vcf` +#### Sentieon QualCal (BQSR) +Currently, Sentieon's version of BQSR, QualCal, is not available in Sarek. Recent Illumina sequencers tend to provide well-calibrated BQs, so BQSR may not provide much benefit. By default Sarek runs GATK's BQSR; that can be skipped by adding the option `--skip_tools baserecalibrator`. + ## Variant Calling The results regarding variant calling are collected in `{outdir}/variantcalling/`. From c2b6b6f4537a09ff41539eeabcd9004a3d7edc74 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 16:29:29 +0200 Subject: [PATCH 05/21] Sentieon Haplotyper Sentieon joint_germline etc --- docs/output.md | 69 ++++++++++++++++++++++++++++++++++++++++++++++---- docs/usage.md | 4 +-- 2 files changed, 66 insertions(+), 7 deletions(-) diff --git a/docs/output.md b/docs/output.md index 17b8871ba..55d4cbfd2 100644 --- a/docs/output.md +++ b/docs/output.md @@ -165,9 +165,9 @@ These files are intermediate and by default not placed in the output-folder kept #### Sentieon BWA mem -Sentieon's [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax) is a subroutine for mapping low-divergent sequences against a large reference genome. 
It is part of the proprietary software package [Sentieon](https://www.sentieon.com/). +Sentieon [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax) is a subroutine for mapping low-divergent sequences against a large reference genome. It is part of the proprietary software package [DNAseq](https://www.sentieon.com/detailed-description-of-pipelines/#dnaseq) from [Sentieon](https://www.sentieon.com/). -The aligned reads are then coordinate-sorted with `sentieon util sort`. +The aligned reads are coordinate-sorted with Sentieon.
Output files for all mappers and samples @@ -211,7 +211,7 @@ The resulting CRAM files are delivered to the users.
-### Sentieon's LocusCollector and Dedup +### Sentieon LocusCollector and Dedup The subroutines LocusCollector and Dedup are part of Sentieon DNAseq packages with speedup versions of the standard GATK tools, and together those two subroutines correspond to GATK's MarkDuplicates. @@ -276,7 +276,7 @@ The resulting recalibrated CRAM files are delivered to the user. Recalibrated CR The CSV files are auto-generated and can be used by Sarek for further processing and/or variant calling. -See the [`--input`](usage.md#--input) section in the usage documentation for further reading and documentation on how to make the most of them. +See the [`input`](usage#input-sample-sheet-configurations) section in the usage documentation for further reading and documentation on how to make the most of them.
Output files: @@ -374,12 +374,20 @@ If the haplotype-called VCF files are not filtered, then Sarek should be run wit [GATK Joint germline Variant Calling](https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-) uses Haplotypecaller per sample in `gvcf` mode. Next, the gVCFs are consolidated from multiple samples into a [GenomicsDB](https://gatk.broadinstitute.org/hc/en-us/articles/5358869876891-GenomicsDBImport) datastore. After joint [genotyping](https://gatk.broadinstitute.org/hc/en-us/articles/5358906861083-GenotypeGVCFs), [VQSR](https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity. +
Output files from joint germline variant calling **Output directory: `{outdir}/variantcalling/haplotypecaller//`** +- `.haplotypecaller.g.vcf.gz` and `.haplotypecaller.g.vcf.gz.tbi` + - VCF with tabix index + +**Output directory: `{outdir}/variantcalling/haplotypecaller/joint_variant_calling/`** + - `joint_germline.vcf.gz` and `joint_germline.vcf.gz.tbi` - VCF with tabix index - `joint_germline_recalibrated.vcf.gz` and `joint_germline_recalibrated.vcf.gz.tbi` - - variant recalibrated VCF with tabix index + - variant recalibrated VCF with tabix index (if VQSR is applied)
@@ -430,6 +438,57 @@ For further reading and documentation see the [samtools manual](https://www.htsl
+#### Sentieon Haplotyper + +[Sentieon Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) is Sentieon's sped-up version of GATK's HaplotypeCaller (see above). +
+Unfiltered VCF-files for normal samples + +**Output directory: `{outdir}/variantcalling/sentieon_haplotyper//`** + +- `.haplotyper.unfiltered.vcf.gz` and `.haplotyper.unfiltered.vcf.gz.tbi` + - VCF with tabix index + +
+ +The output from Sentieon's Haplotyper can be controlled through the option `--sentieon_haplotyper_emit_mode` for Sarek, see [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek). + +Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) will applied to the unfiltered VCF-files obtained filtered vcf-files. + +
+Filtered VCF-files for normal samples + +**Output directory: `{outdir}/variantcalling/sentieon_haplotyper//`** + +- `.haplotyper.filtered.vcf.gz` and `.haplotyper.filtered.vcf.gz.tbi` + - VCF with tabix index + +
+##### Sentieon Joint Germline Variant Calling + +In Sentieon's package DNAseq, joint germline variant calling is done by first running Sentieon's Haplotyper in emit-mode `gvcf` for each sample, and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek) for information on how joint germline variant calling can be done in Sarek using Sentieon's DNAseq. + +Sarek's implementation of joint germline variant calling using DNAseq does not use a [GenomicsDB](https://gatk.broadinstitute.org/hc/en-us/articles/5358869876891-GenomicsDBImport) datastore. After joint genotyping, Sentieon's version of VQSR ([VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) and [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm)) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity. +
Output files from joint germline variant calling **Output directory: `{outdir}/variantcalling/sentieon_haplotyper//`** +- `.haplotypecaller.g.vcf.gz` and `.haplotypecaller.g.vcf.gz.tbi` + - VCF with tabix index + +**Output directory: `{outdir}/variantcalling/sentieon_haplotyper/joint_variant_calling/`** + +- `joint_germline.vcf.gz` and `joint_germline.vcf.gz.tbi` + - VCF with tabix index +- `joint_germline_recalibrated.vcf.gz` and `joint_germline_recalibrated.vcf.gz.tbi` + - variant recalibrated VCF with tabix index (if VarCal is applied)
+ #### Strelka2 [Strelka2](https://github.com/Illumina/strelka) is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. For further reading and documentation see the [Strelka2 user guide](https://github.com/Illumina/strelka/blob/master/docs/userGuide/README.md). If [Strelka2](https://github.com/Illumina/strelka) is used for somatic variant calling and [Manta](https://github.com/Illumina/manta) is also specified in tools, the output candidate indels from [Manta](https://github.com/Illumina/manta) are used according to [Strelka Best Practices](https://github.com/Illumina/strelka/blob/master/docs/userGuide/README.md#somatic-configuration-example). diff --git a/docs/usage.md b/docs/usage.md index c4dddb657..0e4eb81ee 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -323,7 +323,7 @@ If a license file is supplied, then the nextflow secret should be set like this: nextflow secrets set SENTIEON_LICENSE_BASE64 \$(cat | base64 -w 0) ``` ### Available Sentieon functions -Sarek contains the following Sentieon functions [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax), [LocusCollector](https://support.sentieon.com/manual/usages/general/#locuscollector-algorithm) + [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm), [Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm), [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) and [VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm), so the basic processing of alignment of fastq-files to vcf-files can be done using speedup Sentieon functions. 
+Sarek contains the following Sentieon functions: [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax), [LocusCollector](https://support.sentieon.com/manual/usages/general/#locuscollector-algorithm) + [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm), [Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm), [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) and [VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) + [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm), so the basic processing from FASTQ files to VCF files can be done using sped-up Sentieon functions. ### Basic usage of Sentieon functions in Sarek @@ -337,7 +337,7 @@ To use Sentieon's function `GVCFtyper` along with Sention's version of VQSR (`Va ## Updating the pipeline -When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: +When you run a Nextflow command such as `nextflow run nf-core/sarek -profile docker -params-file params.yaml`, which specifies the repository `nf-core/sarek`, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available, even if the pipeline has been updated since.
To make sure that you're running the latest version of the pipeline, regularly update the cached version of the pipeline: ```bash nextflow pull nf-core/sarek From 501a4a556aec4a660bc61b196f0ee1d4f7775cf5 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 16:50:00 +0200 Subject: [PATCH 06/21] Adding info on Sentieon Dedup reports --- docs/output.md | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 8dc770a80..c710bb25c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -454,7 +454,7 @@ Files created: The output from Sentieon's Haplotyper can be controlled through the option `--sentieon_haplotyper_emit_mode` for Sarek, see [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek). -Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) will applied to the unfiltered VCF-files obtained filtered vcf-files. +Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) are applied to the unfiltered VCF-files in order to obtain filtered VCF-files.
Filtered VCF-files for normal samples @@ -921,6 +921,20 @@ The plot will show: - file used by [MultiQC](https://multiqc.info/)
+ +#### Sentieon Dedup reports + +Sentieon's DNAseq-subroutine Dedup produces a metrics report much like the one produce by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table. + +
+Output files for all samples + +**Output directory: `{outdir}/reports/sentieon_dedup/`** + +- `.dedup.cram.metrics` + - file used by [MultiQC](https://multiqc.info/). +
+ #### samtools stats [samtools stats](https://www.htslib.org/doc/samtools.html) collects statistics from CRAM files and outputs in a text format. From 0e668953f161460d53d5b083cce0b19588df35b3 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 16:52:24 +0200 Subject: [PATCH 07/21] Fixed typo --- docs/output.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/output.md b/docs/output.md index c710bb25c..ef043b610 100644 --- a/docs/output.md +++ b/docs/output.md @@ -227,7 +227,7 @@ The subroutine [Dedup](https://support.sentieon.com/manual/usages/general/#dedup - `.dedup.cram` and `.dedup.cram.crai` - CRAM file and index - if `--save_output_as_bam`: - - `.md.bam` and `.dedup.bam.bai` + - `.dedup.bam` and `.dedup.bam.bai` @@ -924,7 +924,7 @@ The plot will show: #### Sentieon Dedup reports -Sentieon's DNAseq-subroutine Dedup produces a metrics report much like the one produce by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table. +Sentieon's DNAseq subroutine Dedup produces a metrics report much like the one produced by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table.
Output files for all samples From df48f6adbe96ab19c01df5e7a850db92b5db7b52 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 17:05:59 +0200 Subject: [PATCH 08/21] prettier --- docs/output.md | 3 +-- docs/usage.md | 4 ++++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/output.md b/docs/output.md index ef043b610..0b8dcc1a8 100644 --- a/docs/output.md +++ b/docs/output.md @@ -174,7 +174,6 @@ The aligned reads are coordinate-sorted with Sentieon. The alignment files (BAM or CRAM) produced by the chosen aligner are not published by default, that is, they are not placed in the output folder (`outdir`). Setting `--save_mapped` publishes them in CRAM format; additionally setting `--save_output_as_bam` publishes them in BAM format instead. - **Output directory: `{outdir}/preprocessing/mapped//`** - if `--save_mapped`: `.sorted.cram` and `.sorted.cram.crai` @@ -297,6 +296,7 @@ See the [`input`](usage#input-sample-sheet-configurations) section in the usage
#### Sentieon QualCal (BQSR) + Currently, Sentieon's version of BQSR, QualCal, is not available in Sarek. Recent Illumina sequencers tend to provide well-calibrated BQs, so BQSR may not provide much benefit. By default Sarek runs GATK's BQSR; that can be skipped by adding the option `--skip_tools baserecalibrator`. ## Variant Calling @@ -921,7 +921,6 @@ The plot will show: - file used by [MultiQC](https://multiqc.info/) - #### Sentieon Dedup reports Sentieon's DNAseq subroutine Dedup produces a metrics report much like the one produced by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table. diff --git a/docs/usage.md b/docs/usage.md index 0e4eb81ee..edd0c413f 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -306,6 +306,7 @@ test,sample4_vs_sample3,manta,sample4_vs_sample3.somatic_sv.vcf.gz ``` ## Sentieon + [Sentieon](https://www.sentieon.com/) is a commercial solution to process genomics data with high computing efficiency, fast turnaround time, exceptionally high accuracy, and 100% consistency. In particular, Sentieon provides what may be viewed as sped-up versions of some standard tools, such as `bwa mem` and GATK's HaplotypeCaller. Sarek now supports some of Sentieon's modules. In order to use the Sentieon modules of Sarek, the user will need to supply the Sarek pipeline with a license for Sentieon. @@ -317,12 +318,15 @@ Sentieon supply license in the form of a string-value (a url) or a file.
It shou ```bash nextflow secrets set SENTIEON_LICENSE_BASE64 $(echo -n | base64 -w 0) ``` + If a license file is supplied, then the nextflow secret should be set like this: ```bash nextflow secrets set SENTIEON_LICENSE_BASE64 $(cat | base64 -w 0) ``` + ### Available Sentieon functions + Sarek contains the following Sentieon functions: [bwa mem](https://support.sentieon.com/manual/usages/general/#bwa-mem-syntax), [LocusCollector](https://support.sentieon.com/manual/usages/general/#locuscollector-algorithm) + [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm), [Haplotyper](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm), [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) and [VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) + [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm), so the basic processing from FASTQ files to VCF files can be done using sped-up Sentieon functions.
### Basic usage of Sentieon functions in Sarek From 8616f179d7f23313327bce590734cfaa94a72e3f Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen Date: Fri, 23 Jun 2023 17:23:53 +0200 Subject: [PATCH 09/21] Adding sentieon-dedup-reports to content table --- docs/output.md | 1 + 1 file changed, 1 insertion(+) diff --git a/docs/output.md b/docs/output.md index 0b8dcc1a8..ac9848f19 100644 --- a/docs/output.md +++ b/docs/output.md @@ -52,6 +52,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d - [FastQC](#fastqc) - [FastP](#fastp) - [GATK MarkDuplicates reports](#gatk-markduplicates-reports) + - [Sentieon Dedup reports](#sentieon-dedup-reports) - [mosdepth](#mosdepth) - [samtools stats](#samtools-stats) - [bcftools stats](#bcftools-stats) From c45f86e7cc7f3736d3125326b5ebe996327edd9c Mon Sep 17 00:00:00 2001 From: asp8200 Date: Mon, 26 Jun 2023 20:56:29 +0000 Subject: [PATCH 10/21] Fixing closing tag for detail --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index ac9848f19..9d20d4832 100644 --- a/docs/output.md +++ b/docs/output.md @@ -437,7 +437,7 @@ Files created: - `{sample,tumorsample_vs_normalsample}.mutect2.filtered.vcf.gz.filteringStats.tsv` - a stats file generated during the filtering of Mutect2 called variants - #### Sentieon Haplotyper From e4a6a47b7254c8fac5915dd9e686453204356167 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 13:48:54 +0200 Subject: [PATCH 11/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 9d20d4832..cbcd0cc2d 100644 --- a/docs/output.md +++ b/docs/output.md @@ -217,7 +217,7 @@ The subroutines LocusCollector and Dedup are part of Sentieon DNAseq packages wi The subroutine 
[LocusCollector](https://support.sentieon.com/manual/usages/general/#driver-algorithm-syntax) collects read information that will be used for removing or marking of duplicate reads; its output is the score file indicating which reads are likely duplicates. -The subroutine [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm) marks or removes duplicate reads based no the score file supplied by LocusCollector, and produces a BAM or CRAM file. +The subroutine [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm) marks or removes duplicate reads based on the score file supplied by LocusCollector, and produces a BAM or CRAM file.
Output files for all samples From da09a3b8815c2c6d945dd94062bc387e7ac6c1a0 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:34:06 +0200 Subject: [PATCH 12/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index cbcd0cc2d..88767d97c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -215,7 +215,7 @@ The resulting CRAM files are delivered to the users. The subroutines LocusCollector and Dedup are part of Sentieon DNAseq packages with speedup versions of the standard GATK tools, and together those two subroutines correspond to GATK's MarkDuplicates. -The subroutine [LocusCollector](https://support.sentieon.com/manual/usages/general/#driver-algorithm-syntax) collects read information that will be used for removing or marking of duplicate reads; its output is the score file indicating which reads are likely duplicates. +The subroutine [LocusCollector](https://support.sentieon.com/manual/usages/general/#driver-algorithm-syntax) collects read information that will be used for removing or tagging duplicate reads; its output is the score file indicating which reads are likely duplicates. The subroutine [Dedup](https://support.sentieon.com/manual/usages/general/#dedup-algorithm) marks or removes duplicate reads based on the score file supplied by LocusCollector, and produces a BAM or CRAM file. 
From c2873ee13c585abb820313dc00d9f3df5005db29 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:34:49 +0200 Subject: [PATCH 13/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 88767d97c..6f9d63c60 100644 --- a/docs/output.md +++ b/docs/output.md @@ -391,7 +391,7 @@ If the haplotype-called VCF files are not filtered, then Sarek should be run wit [GATK Joint germline Variant Calling](https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels-) uses Haplotypecaller per sample in `gvcf` mode. Next, the gVCFs are consolidated from multiple samples into a [GenomicsDB](https://gatk.broadinstitute.org/hc/en-us/articles/5358869876891-GenomicsDBImport) datastore. After joint [genotyping](https://gatk.broadinstitute.org/hc/en-us/articles/5358906861083-GenotypeGVCFs), [VQSR](https://gatk.broadinstitute.org/hc/en-us/articles/5358906115227-VariantRecalibrator) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity.
-Output files from joint germline variant callling +Output files from joint germline variant calling **Output directory: `{outdir}/variantcalling/haplotypecaller//`** From 8379a4c20f6e8df11698c4dfb6b64d2cd74e6ad3 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:40:39 +0200 Subject: [PATCH 14/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 6f9d63c60..0fa65613c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -455,7 +455,7 @@ Files created: The output from Sentieon's Haplotyper can be controlled through the option `--sentieon_haplotyper_emit_mode` for Sarek, see [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek). -Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) is applied to the unfiltered VCF-files in order to obtained filtered vcf-files. +Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow command, GATK's CNNScoreVariants and FilterVariantTranches (see above) are applied to the unfiltered VCF-files in order to obtain filtered VCF-files.
Filtered VCF-files for normal samples From 80301074b1e846060168c1689c52aa41a36d530b Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:40:58 +0200 Subject: [PATCH 15/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 0fa65613c..6985bd7f4 100644 --- a/docs/output.md +++ b/docs/output.md @@ -469,7 +469,7 @@ Unless `haplotyper_filter` is listed under `--skip_tools` in the nextflow comman ##### Sentieon Joint Germline Variant Calling -In Sentieon's package DNAseq, joint germline variant calling is done by first running Sentieon's Haplotyper in emit-mode `gvcf` for each sample, and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek) for information on how joint germline variant callling can be done in Sarek using Sentieon's DNAseq. +In Sentieon's package DNAseq, joint germline variant calling is done by first running Sentieon's Haplotyper in emit-mode `gvcf` for each sample and then running Sentieon's [GVCFtyper](https://support.sentieon.com/manual/usages/general/#gvcftyper-algorithm) on the set of gVCF-files. See [Basic usage of Sentieon functions in Sarek](https://github.com/nf-core/sarek/blob/sentieon_docs/docs/usage.md#basic-usage-of-sentieon-functions-in-sarek) for information on how joint germline variant calling can be done in Sarek using Sentieon's DNAseq. Sarek's implementation of joint germline variant calling using DNAseq does not include the usage of [GenomicsDB](https://gatk.broadinstitute.org/hc/en-us/articles/5358869876891-GenomicsDBImport) datastore. 
After joint genotyping, Sentieon's version of VQSR ([VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) and [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm)) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity. From 199453209733f6de5b45c4f40c7bcc8d2cc05d36 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:41:13 +0200 Subject: [PATCH 16/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 6985bd7f4..cb0ff4d6c 100644 --- a/docs/output.md +++ b/docs/output.md @@ -474,7 +474,7 @@ In Sentieon's package DNAseq, joint germline variant calling is done by first ru Sarek's implementation of joint germline variant calling using DNAseq does not include the usage of [GenomicsDB](https://gatk.broadinstitute.org/hc/en-us/articles/5358869876891-GenomicsDBImport) datastore. After joint genotyping, Sentieon's version of VQSR ([VarCal](https://support.sentieon.com/manual/usages/general/#varcal-algorithm) and [ApplyVarCal](https://support.sentieon.com/manual/usages/general/#applyvarcal-algorithm)) is applied for filtering to produce the final multisample callset with the desired balance of precision and sensitivity.
-Output files from joint germline variant callling +Output files from joint germline variant calling **Output directory: `{outdir}/variantcalling/sentieon_haplotyper//`** From 42bd4115b7142b7a910119ff8dec08f9265b55a3 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:43:23 +0200 Subject: [PATCH 17/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index cb0ff4d6c..6b59e738b 100644 --- a/docs/output.md +++ b/docs/output.md @@ -924,7 +924,7 @@ The plot will show: #### Sentieon Dedup reports -Sentieon's DNAseq subroutine Dedup produces a metrics report much like the one produce by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table. +Sentieon's DNAseq subroutine Dedup produces a metrics report much like the one produced by GATK's MarkDuplicates. The Dedup metrics are imported into MultiQC as custom content and displayed in a table.
Output files for all samples From 30347d945540b11c54ddc45625789e45188c37a1 Mon Sep 17 00:00:00 2001 From: asp8200 Date: Tue, 27 Jun 2023 12:47:11 +0000 Subject: [PATCH 18/21] Remove some brackets --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index edd0c413f..d1f435e21 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -335,7 +335,7 @@ To use Sentieon's aligner `bwa mem`, set the aligner option `sentieon-bwamem`. ( To use Sentieon's function `Dedup`, specify `sentieon_dedup` as one of the tools. (This can, for example, be done by adding `--tools sentieon_dedup` to the nextflow run command.) -To use Sentieon's function `Haplotyper`, specify `sentieon_haplotyper` as one of the tools. (This can, for example, be done by adding `--tools sentieon_haplotyper` to the nextflow run command.) (In order to skip the GATK-based variant-filer one may add `--skip_tools haplotyper_filter` to the nextflow run command.) Sarek also provides the option `sentieon_haplotyper_emit_mode` which can be used to set the [emit-mode](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) of Sentieon's haplotyper. Sentieon's haplotyper can output both a vcf-file and a gvcf-file in the same run; this is achieved by setting `sentieon_haplotyper_emit_mode` to `,gvcf`, where `` is `variant`, `confident` or `all`. +To use Sentieon's function `Haplotyper`, specify `sentieon_haplotyper` as one of the tools. This can, for example, be done by adding `--tools sentieon_haplotyper` to the nextflow run command. In order to skip the GATK-based variant-filter, one may add `--skip_tools haplotyper_filter` to the nextflow run command. Sarek also provides the option `sentieon_haplotyper_emit_mode` which can be used to set the [emit-mode](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) of Sentieon's haplotyper. 
Sentieon's haplotyper can output both a vcf-file and a gvcf-file in the same run; this is achieved by setting `sentieon_haplotyper_emit_mode` to `,gvcf`, where `` is `variant`, `confident` or `all`. To use Sentieon's function `GVCFtyper` along with Sention's version of VQSR (`VarCal` and `ApplyVarCal`) for joint-germline genotyping, specify `sentieon_haplotyper` as one of the tools, set the option `sentieon_haplotyper_emit_mode` to `gvcf`, and add the option `joint_germline`. This can, for example, be done by adding `--tools sentieon_haplotyper --joint_germline --sentieon_haplotyper_emit_mode gvcf` to the nextflow run command.) From 001020ef881337e0c3ac24e651f3d0b1f95681ae Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:49:18 +0200 Subject: [PATCH 19/21] Update docs/usage.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index d1f435e21..db44b8c70 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -341,7 +341,7 @@ To use Sentieon's function `GVCFtyper` along with Sention's version of VQSR (`Va ## Updating the pipeline -When you run a nextflow command like, say, `nextflow run nf-core/sarek -profile docker -params-file params.yaml` which specifies the repository `nf-core/sarek`, then Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. 
To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: +When you launch a pipeline from the command line with `nextflow run nf-core/sarek -profile docker -params-file params.yaml`, Nextflow will automatically pull the pipeline code from GitHub and store it as a cached version. When running the pipeline after this, it will always use the cached version if available, even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, regularly update the cached version of the pipeline: ```bash nextflow pull nf-core/sarek From 6b21e607b43631869a93e70ff48f66c2728a614c Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:50:17 +0200 Subject: [PATCH 20/21] Update docs/usage.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index db44b8c70..ef5dc2f7c 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -337,7 +337,7 @@ To use Sentieon's function `Dedup`, specify `sentieon_dedup` as one of the tools To use Sentieon's function `Haplotyper`, specify `sentieon_haplotyper` as one of the tools. This can, for example, be done by adding `--tools sentieon_haplotyper` to the nextflow run command. In order to skip the GATK-based variant-filter, one may add `--skip_tools haplotyper_filter` to the nextflow run command. Sarek also provides the option `sentieon_haplotyper_emit_mode` which can be used to set the [emit-mode](https://support.sentieon.com/manual/usages/general/#haplotyper-algorithm) of Sentieon's haplotyper. Sentieon's haplotyper can output both a vcf-file and a gvcf-file in the same run; this is achieved by setting `sentieon_haplotyper_emit_mode` to `,gvcf`, where `` is `variant`, `confident` or `all`.
-To use Sentieon's function `GVCFtyper` along with Sention's version of VQSR (`VarCal` and `ApplyVarCal`) for joint-germline genotyping, specify `sentieon_haplotyper` as one of the tools, set the option `sentieon_haplotyper_emit_mode` to `gvcf`, and add the option `joint_germline`. This can, for example, be done by adding `--tools sentieon_haplotyper --joint_germline --sentieon_haplotyper_emit_mode gvcf` to the nextflow run command.) +To use Sentieon's function `GVCFtyper` along with Sentieon's version of VQSR (`VarCal` and `ApplyVarCal`) for joint germline genotyping, specify `sentieon_haplotyper` as one of the tools, set the option `sentieon_haplotyper_emit_mode` to `gvcf`, and add the option `joint_germline`. This can, for example, be done by adding `--tools sentieon_haplotyper --joint_germline --sentieon_haplotyper_emit_mode gvcf` to the nextflow run command. ## Updating the pipeline From 4ad67c0b2dbc5eda65f3268e8f995c813eef1372 Mon Sep 17 00:00:00 2001 From: Anders Sune Pedersen <37172585+asp8200@users.noreply.github.com> Date: Tue, 27 Jun 2023 14:51:13 +0200 Subject: [PATCH 21/21] Update docs/output.md Co-authored-by: SusiJo <43847534+SusiJo@users.noreply.github.com> --- docs/output.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 6b59e738b..199f75eab 100644 --- a/docs/output.md +++ b/docs/output.md @@ -173,7 +173,7 @@ The aligned reads are coordinate-sorted with Sentieon.
Output files for all mappers and samples -The alignment files (BAM or CRAM) produced by the chosen aligner are, by default, not published, that is, they are not placed in the output-folder (`outdir`), but by setting `--save_mapped` the alignment files are published in CRAM format or, by additional setting `--save_output_as_bam`, in BAM format. +The alignment files (BAM or CRAM) produced by the chosen aligner are not published by default. They are saved as CRAM files in the output folder (`outdir`) only if the flag `--save_mapped` is used; to save them in BAM format instead, additionally set the flag `--save_output_as_bam`. **Output directory: `{outdir}/preprocessing/mapped//`**