From a3c473c7176e9eb710925c075687c3446e034840 Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 15:44:19 +0200 Subject: [PATCH 1/8] update username [skip actions] --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 2a64119503..6b4c74ddf4 100644 --- a/README.md +++ b/README.md @@ -97,7 +97,7 @@ We thank the following people for their extensive assistance in the development - [Anders Sune Pedersen](https://github.com/asp8200) - [Chela James](https://github.com/chelauk) - [David Mas-Ponte](https://github.com/davidmasp) -- [Francesco L](https://github.com/nibscles) +- [Francesco Lescai](https://github.com/lescai) - [Gisela Gabernet](https://github.com/ggabernet) - [Harshil Patel](https://github.com/drpatelh) - [James A. Fellows Yates](https://github.com/jfy133) From b734c5b0936d4573225b6e0089fa2688c8aa9b95 Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 16:07:09 +0200 Subject: [PATCH 2/8] remove addressed todo strings --- subworkflows/nf-core/variantcalling/deepvariant/main.nf | 1 - subworkflows/nf-core/variantcalling/manta/germline/main.nf | 1 - subworkflows/nf-core/variantcalling/manta/tumoronly/main.nf | 1 - 3 files changed, 3 deletions(-) diff --git a/subworkflows/nf-core/variantcalling/deepvariant/main.nf b/subworkflows/nf-core/variantcalling/deepvariant/main.nf index 112ba56fdd..4062c4aed6 100644 --- a/subworkflows/nf-core/variantcalling/deepvariant/main.nf +++ b/subworkflows/nf-core/variantcalling/deepvariant/main.nf @@ -4,7 +4,6 @@ include { DEEPVARIANT } from '../../../../modules/ include { TABIX_TABIX as TABIX_VC_DEEPVARIANT_GVCF } from '../../../../modules/nf-core/modules/tabix/tabix/main' include { TABIX_TABIX as TABIX_VC_DEEPVARIANT_VCF } from '../../../../modules/nf-core/modules/tabix/tabix/main' -//TODO: benchmark if it is better to provide multiple bed files & run on multiple machines + mergeing afterwards || one containing all intervals and run on one larger machine // 
Deepvariant: https://github.com/google/deepvariant/issues/510 workflow RUN_DEEPVARIANT { take: diff --git a/subworkflows/nf-core/variantcalling/manta/germline/main.nf b/subworkflows/nf-core/variantcalling/manta/germline/main.nf index 7276e88ed2..ed583b2434 100644 --- a/subworkflows/nf-core/variantcalling/manta/germline/main.nf +++ b/subworkflows/nf-core/variantcalling/manta/germline/main.nf @@ -3,7 +3,6 @@ include { GATK4_MERGEVCFS as MERGE_MANTA_SMALL_INDELS } from '../../../../../mod include { GATK4_MERGEVCFS as MERGE_MANTA_SV } from '../../../../../modules/nf-core/modules/gatk4/mergevcfs/main' include { MANTA_GERMLINE } from '../../../../../modules/nf-core/modules/manta/germline/main' -// TODO: Research if splitting by intervals is ok, we pretend for now it is fine. // Seems to be the consensus on upstream modules implementation too workflow RUN_MANTA_GERMLINE { take: diff --git a/subworkflows/nf-core/variantcalling/manta/tumoronly/main.nf b/subworkflows/nf-core/variantcalling/manta/tumoronly/main.nf index c966a8c498..78564ca2fd 100644 --- a/subworkflows/nf-core/variantcalling/manta/tumoronly/main.nf +++ b/subworkflows/nf-core/variantcalling/manta/tumoronly/main.nf @@ -3,7 +3,6 @@ include { GATK4_MERGEVCFS as MERGE_MANTA_SV } from '../../../../. include { GATK4_MERGEVCFS as MERGE_MANTA_TUMOR } from '../../../../../modules/nf-core/modules/gatk4/mergevcfs/main' include { MANTA_TUMORONLY } from '../../../../../modules/nf-core/modules/manta/tumoronly/main' -// TODO: Research if splitting by intervals is ok, we pretend for now it is fine. 
// Seems to be the consensus on upstream modules implementation too workflow RUN_MANTA_TUMORONLY { take: From caeb83a748de0e403dbdd6a04da5bd185cd07cc4 Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 16:12:46 +0200 Subject: [PATCH 3/8] add gavin to contirbutors list --- README.md | 1 + 1 file changed, 1 insertion(+) diff --git a/README.md b/README.md index 6b4c74ddf4..31c014040a 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,7 @@ We thank the following people for their extensive assistance in the development - [Chela James](https://github.com/chelauk) - [David Mas-Ponte](https://github.com/davidmasp) - [Francesco Lescai](https://github.com/lescai) +- [Gavin Mackenzie](https://github.com/GCJMackenzie) - [Gisela Gabernet](https://github.com/ggabernet) - [Harshil Patel](https://github.com/drpatelh) - [James A. Fellows Yates](https://github.com/jfy133) From 30b43148bb4289fb6d40bc85e2e3bc83bd31aa7d Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 16:27:12 +0200 Subject: [PATCH 4/8] add joint germline docs --- docs/output.md | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/docs/output.md b/docs/output.md index 0300dbfd77..7ac02e676b 100644 --- a/docs/output.md +++ b/docs/output.md @@ -323,7 +323,12 @@ If the haplotype-called VCF files are not filtered, then Sarek should be run wit **Output directory: `{outdir}/variantcalling/haplotypecaller//`** -_TODO_ +- `joint_germline.vcf.gz` and `joint_germline.vcf.gz.tbi` + - VCF with tabix index +- `joint_germline_recalibrated.vcf.gz` and `joint_germline_recalibrated.vcf.gz.tbi` + - variant recalibrated VCF with tabix index + + #### GATK Mutect2 From c30f8fbdb0f847584c5658f7e0e7ac77a23c2024 Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 16:47:35 +0200 Subject: [PATCH 5/8] where do the ref genomes come from --- docs/usage.md | 55 ++++++++++++++++++++++----------------------------- 1 file changed, 24 insertions(+), 31 deletions(-) diff --git a/docs/usage.md b/docs/usage.md 
index 6fe7ef39d5..ba160db9e5 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -712,37 +712,30 @@ nextflow run nf-core/sarek --known_indels false --genome GRCh38.GATK ### Where do the used reference genomes originate from -_under construction - help needed_ - -GATK.GRCh38: - -| File | Tools | Origin | Docs | -| :-------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------- | -| ascat_alleles | ASCAT | https://www.dropbox.com/s/uouszfktzgoqfy7/G1000_alleles_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | -| ascat_loci | ASCAT | https://www.dropbox.com/s/80cq0qgao8l1inj/G1000_loci_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | -| ascat_loci_gc | ASCAT | https://www.dropbox.com/s/80cq0qgao8l1inj/G1000_loci_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | -| ascat_loci_rt | ASCAT | https://www.dropbox.com/s/xlp99uneqh6nh6p/RT_G1000_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | -| bwa | bwa-mem | bwa index -p bwa/${fasta.baseName} $fasta | | -| bwamem2 | bwa-mem2 | bwa-mem2 index -p bwamem2/${fasta} $fasta | | -| dragmap | DragMap | dragen-os --build-hash-table true --ht-reference $fasta --output-directory dragmap | | -| dbsnp | Baserecalibrator, ControlFREEC, GenotypeGVCF, 
HaplotypeCaller | possibly from an old ftp server dbsnp_146.hg38.vcf.gz | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | -| dbsnp_tbi | Baserecalibrator, ControlFREEC, GenotypeGVCF, HaplotypeCaller | | | -| dict | Baserecalibrator(Spark), CNNScoreVariant, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, MarkDulpicates(Spark), MergeVCFs, Mutect2, Variantrecalibrator | https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.dict | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | -| fasta | ApplyBQSR(Spark), ApplyVQSR, ASCAT, Baserecalibrator(Spark), BWA, BWAMem2, CNNScoreVariant, CNVKit, ControlFREEC, DragMap, DEEPVariant, EnsemblVEP, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, FreeBayes, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, interval building, Manta, MarkDuplicates(Spark),MergeVCFs,MSISensorPro, Mutect2, Samtools, snpEff, Strelka, Tiddit, Variantrecalibrator | https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | -| fasta_fai | ApplyBQSR(Spark), ApplyVQSR, ASCAT, Baserecalibrator(Spark), BWA, BWAMem2, CNNScoreVariant, CNVKit, ControlFREEC, DragMap, DEEPVariant, EnsemblVEP, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, FreeBayes, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, interval building, Manta, MarkDuplicates(Spark),MergeVCFs,MSISensorPro, Mutect2, Samtools, snpEff, Strelka, Tiddit, Variantrecalibrator | https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/Homo_sapiens_assembly38.fasta.fai | 
https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | -| germline_resource | GetPileupsummaries,Mutect2 | ? gnomAD.r2.1.1.GRCh38.PASS.AC.AF.only.vcf.gz" | | -| germline_resource_tbi | GetPileupsummaries,Mutect2 | ? gnomAD.r2.1.1.GRCh38.PASS.AC.AF.only.vcf.gz.tbi" | | -| intervals | ApplyBQSR(Spark), ASCAT, Baserecalibraotr(Spark), BCFTools, CNNScoreVariants, ControlFREEC, Deepvariant, FilterVariantTranches, FreeBayes, GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, Strelka, mpileup, MSISensorPro, Mutect2, VCFTools | https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/wgs_calling_regions.hg38.interval_list | | -| known_indels | BaseRecalibrator(Spark), FilterVariantTranches | https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz,beta/Homo_sapiens_assembly38.known_indels}.vcf. | | -| known_indels_tbi | BaseRecalibrator(Spark), FilterVariantTranches | https://storage.googleapis.com/genomics-public-data/resources/broad/hg38/v0/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz.tbi,beta/Homo_sapiens_assembly38.known_indels}.vcf.gz.tbi" | | -| mappability | ControlFREEC | http://xfer.curie.fr/get/vyIi4w8EONl/out100m2_hg38.zip | http://boevalab.inf.ethz.ch/FREEC/tutorial.html | -| pon | Mutect2 | https://console.cloud.google.com/storage/browser/_details/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- | -| pon_tbi | Mutect2 | https://console.cloud.google.com/storage/browser/_details/gatk-best-practices/somatic-hg38/1000g_pon.hg38.vcf.gz.tbi | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- | -| snpeff_db | | 'GRCh38.99' | | -| snpeff_genome | | 'GRCh38' | | -| vep_cache_version | | 105 | | -| vep_genome | | 'GRCh38' | | -| chr_dir | | 
"${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/Chromosomes" | | +For GATK.GRCh38 the links for each reference file and the corresponding processes that use them is listed below. For GATK.GRCh37 the files originate from the same sources: + +| File | Tools | Origin | Docs | +| :-------------------- | :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------- | +| ascat_alleles | ASCAT | https://www.dropbox.com/s/uouszfktzgoqfy7/G1000_alleles_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | +| ascat_loci | ASCAT | https://www.dropbox.com/s/80cq0qgao8l1inj/G1000_loci_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | +| ascat_loci_gc | ASCAT | https://www.dropbox.com/s/80cq0qgao8l1inj/G1000_loci_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | +| ascat_loci_rt | ASCAT | https://www.dropbox.com/s/xlp99uneqh6nh6p/RT_G1000_hg38.zip | https://github.com/VanLoo-lab/ascat/tree/master/ReferenceFiles/WGS | +| bwa | bwa-mem | bwa index -p bwa/${fasta.baseName} $fasta | | +| bwamem2 | bwa-mem2 | bwa-mem2 index -p bwamem2/${fasta} $fasta | | +| dragmap | DragMap | dragen-os --build-hash-table true --ht-reference $fasta --output-directory dragmap | | +| dbsnp | Baserecalibrator, ControlFREEC, GenotypeGVCF, HaplotypeCaller | 
[GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | +| dbsnp_tbi | Baserecalibrator, ControlFREEC, GenotypeGVCF, HaplotypeCaller | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| dict | Baserecalibrator(Spark), CNNScoreVariant, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, MarkDulpicates(Spark), MergeVCFs, Mutect2, Variantrecalibrator | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | +| fasta | ApplyBQSR(Spark), ApplyVQSR, ASCAT, Baserecalibrator(Spark), BWA, BWAMem2, CNNScoreVariant, CNVKit, ControlFREEC, DragMap, DEEPVariant, EnsemblVEP, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, FreeBayes, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, interval building, Manta, MarkDuplicates(Spark),MergeVCFs,MSISensorPro, Mutect2, Samtools, snpEff, Strelka, Tiddit, Variantrecalibrator | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | +| fasta_fai | ApplyBQSR(Spark), ApplyVQSR, ASCAT, Baserecalibrator(Spark), BWA, BWAMem2, CNNScoreVariant, CNVKit, ControlFREEC, DragMap, DEEPVariant, EnsemblVEP, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, FreeBayes, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, interval building, Manta, MarkDuplicates(Spark),MergeVCFs,MSISensorPro, Mutect2, Samtools, snpEff, Strelka, Tiddit, Variantrecalibrator | 
[GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | +| germline_resource | GetPileupsummaries,Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| germline_resource_tbi | GetPileupsummaries,Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| intervals | ApplyBQSR(Spark), ASCAT, Baserecalibraotr(Spark), BCFTools, CNNScoreVariants, ControlFREEC, Deepvariant, FilterVariantTranches, FreeBayes, GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, Strelka, mpileup, MSISensorPro, Mutect2, VCFTools | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| known_indels | BaseRecalibrator(Spark), FilterVariantTranches | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| known_indels_tbi | BaseRecalibrator(Spark), FilterVariantTranches | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| mappability | ControlFREEC | http://xfer.curie.fr/get/vyIi4w8EONl/out100m2_hg38.zip | http://boevalab.inf.ethz.ch/FREEC/tutorial.html | +| pon | Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- | +| pon_tbi | Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- | ## How to run sarek when no(t all) reference files are in igenomes From 
702f3f5056e4704ddd9c67007638931fae1a5556 Mon Sep 17 00:00:00 2001
From: Rike
Date: Wed, 20 Jul 2022 16:54:02 +0200
Subject: [PATCH 6/8] remove duplicated sections

---
 docs/usage.md | 32 +-------------------------------
 1 file changed, 1 insertion(+), 31 deletions(-)

diff --git a/docs/usage.md b/docs/usage.md
index ba160db9e5..65a9699019 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -737,36 +737,6 @@ For GATK.GRCh38 the links for each reference file and the corresponding processe
 | pon | Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- |
 | pon_tbi | Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890631-Panel-of-Normals-PON- |
 
-## How to run sarek when no(t all) reference files are in igenomes
-
-For common genomes, such as GRCh38 and GRCh37, the pipeline is shipped with (almost) all necessary reference files. However, sometimes it is necessary to use custom references for some or all files:
-
-### No igenomes reference files are used
-
-If none of your required genome files are in igenomes, `--igenomes_ignore` must be set to ignore any igenomes input and `--genome null`. The `fasta` file is the only required input file and must be provided to run the pipeline. All other possible reference file can be provided in addition. For details, see the paramter documentation.
-
-Minimal example for custom genomes:
-
-```
-nextflow run nf-core/sarek --genome null --igenomes_ignore --fasta
-```
-
-### Overwrite specific reference files
-
-If you don't want to use some of the provided reference genomes, they can be overwritten by either providing a new file or setting the respective file parameter to `false`, if it should be ignored:
-
-Example for using a custom known indels file:
-
-```
-nextflow run nf-core/sarek --known_indels --genome GRCh38.GATK
-```
-
-Example for not using known indels, but all other provided reference files:
-
-```
-nextflow run nf-core/sarek --known_indels false --genome GRCh38.GATK
-```
-
 ## How to customise SnpEff and VEP annotation
 
 Sarek uses nf-core provided containers for both snpEff and VEP for several reference genomes ('CanFam3', 'GRCh37', 'GRCh38', 'GRCm38' and 'WBcel235').
@@ -905,6 +875,6 @@ Error type Number of errors
 ERROR_CHROMOSOME_NOT_FOUND 17522411
 ```
 
-## How to set sarek up to use sentieon
+## How to set up sarek to use sentieon
 
 Sarek 3.0 is currently not supporting sentieon. It is planned for the upcoming release 3.1. In the meantime, please revert to the last release 2.7.2.
From c82a97133dabcefbfd9f064fc4db0ee73f3459fe Mon Sep 17 00:00:00 2001
From: FriederikeHanssen
Date: Wed, 20 Jul 2022 16:56:45 +0200
Subject: [PATCH 7/8] Update docs/usage.md

Co-authored-by: Maxime U.
Garcia --- docs/usage.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/usage.md b/docs/usage.md index 65a9699019..13f53521df 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -730,7 +730,7 @@ For GATK.GRCh38 the links for each reference file and the corresponding processe | fasta_fai | ApplyBQSR(Spark), ApplyVQSR, ASCAT, Baserecalibrator(Spark), BWA, BWAMem2, CNNScoreVariant, CNVKit, ControlFREEC, DragMap, DEEPVariant, EnsemblVEP, EstimateLibraryComplexity, FilterMutectCalls, FilterVariantTranches, FreeBayes, GatherPileupSummaries,GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, interval building, Manta, MarkDuplicates(Spark),MergeVCFs,MSISensorPro, Mutect2, Samtools, snpEff, Strelka, Tiddit, Variantrecalibrator | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | https://gatk.broadinstitute.org/hc/en-us/articles/360035890811-Resource-bundle | | germline_resource | GetPileupsummaries,Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | | germline_resource_tbi | GetPileupsummaries,Mutect2 | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | -| intervals | ApplyBQSR(Spark), ASCAT, Baserecalibraotr(Spark), BCFTools, CNNScoreVariants, ControlFREEC, Deepvariant, FilterVariantTranches, FreeBayes, GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, Strelka, mpileup, MSISensorPro, Mutect2, VCFTools | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | +| intervals | ApplyBQSR(Spark), ASCAT, Baserecalibrator(Spark), BCFTools, CNNScoreVariants, ControlFREEC, Deepvariant, FilterVariantTranches, FreeBayes, GenotypeGVCF, GetPileupSummaries, HaplotypeCaller, Strelka, mpileup, MSISensorPro, Mutect2, VCFTools | 
[GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | | known_indels | BaseRecalibrator(Spark), FilterVariantTranches | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | | known_indels_tbi | BaseRecalibrator(Spark), FilterVariantTranches | [GATKBundle](https://console.cloud.google.com/storage/browser/_details/genomics-public-data/resources/broad/hg38/v0/) | | | mappability | ControlFREEC | http://xfer.curie.fr/get/vyIi4w8EONl/out100m2_hg38.zip | http://boevalab.inf.ethz.ch/FREEC/tutorial.html | From 0bc9403a67163bba09da269a016d942fd89f0bf7 Mon Sep 17 00:00:00 2001 From: Rike Date: Wed, 20 Jul 2022 18:59:30 +0200 Subject: [PATCH 8/8] remove duplicated sections --- docs/usage.md | 75 ++------------------------------------------------- 1 file changed, 2 insertions(+), 73 deletions(-) diff --git a/docs/usage.md b/docs/usage.md index 87a04aafa6..cbe11c4dca 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -581,7 +581,7 @@ This list is by no means exhaustive and it will depend on the specific analysis | [Control-FREEC](https://github.com/BoevaLab/FREEC) | x | x | x | - | x | x | | [MSIsensorPro](https://github.com/xjtu-omics/msisensor-pro) | x | x | x | - | - | x | -## How to run ASCAT with WES +## How to run ASCAT with whole-exome sequencing data? While the ASCAT implementation in sarek is capable of running with whole-exome sequencing data, the needed references are currently not provided with the igenomes.config. According to the [developers](https://github.com/VanLoo-lab/ascat/issues/97) of ASCAT, loci and allele files (one file per chromosome) can be downloaded directly from the [Battenberg repository](https://ora.ox.ac.uk/objects/uuid:08e24957-7e76-438a-bd38-66c48008cf52). 
@@ -604,19 +604,6 @@ For mapping, sarek follows the parameter suggestions provided in this [paper](ht
 
 In addition, currently the mismatch penalty for reads with tumor status in the sample sheet are mapped with a mismatch penalty of `-B 3`.
 
-## MultiQC related issues
-
-### Plots for SnpEff are missing
-
-When plots are missing, it is possible that the fasta and the custom SnpEff database are not matching https://pcingola.github.io/SnpEff/se_faq/#error_chromosome_not_found-details.
-The SnpEff completes without throwing an error causing nextflow to complete successfully. An indication for the error are these lines in the `.command` files:
-
-```text
-ERRORS: Some errors were detected
-Error type Number of errors
-ERROR_CHROMOSOME_NOT_FOUND 17522411
-```
-
 ## How to create a panel-of-normals for Mutect2
 
 For a detailed tutorial on how to create a panel-of-normals, see [here](https://gatk.broadinstitute.org/hc/en-us/articles/360035531132).
@@ -805,64 +792,6 @@ For more details, see [here](https://www.ensembl.org/info/docs/tools/vep/script/
 
 Resource requests are difficult to generalize and are often dependent on input data size. Currently, the number of cpus and memory requested by default were adapted from tests on 5 ICGC paired whole-genome sequencing samples with approximately 40X and 80X depth. For targeted data analysis, this is overshooting by a lot. In this case resources for each process can be limited by either setting `--max_memory` and `-max_cpus` or tailoring the request by process name as described [here](#resource-requests). If you are using sarek for a certain data type regulary, and would like to make these requests available to others on your system, an institution-specific, pipeline-specific config file can be added [here](https://github.com/nf-core/configs/tree/master/conf/pipeline/sarek).
 
-## Spark related issues
-
-If you have problems running processes that make use of Spark such, for instance, as `MarkDuplicates`, then that might be due to a limit on the number of simultaneously open files on your system.
-You can check your current limit by typing the following:
-
-```bash
-ulimit -n
-```
-
-The default limit size is usually 1024 which is quite low to run Spark jobs.
-In order to increase the size limit permanently you can:
-
-Edit the file `/etc/security/limits.conf` and add the lines:
-
-```bash
-* soft nofile 65535
-* hard nofile 65535
-```
-
-Edit the file `/etc/sysctl.conf` and add the line:
-
-```bash
-fs.file-max = 65535
-```
-
-Edit the file `/etc/sysconfig/docker` and add the new limits to OPTIONS like this:
-
-```bash
-OPTIONS=”—default-ulimit nofile=65535:65535"
-```
-
-Re-start your session.
-
-Note that the way to increase the open file limit in your system may be slightly different or require additional steps.
-
-### Cannot delete work folder when using docker + Spark
-
-Currently, when running spark-based tools in combination with docker, it is required to set `docker.userEmulation = false`. This can unfortunately cause permission issues when `work/` is being written with root permissions. In case this happens, you might need to configure docker to run without `userEmulation` (see [here](https://github.com/Midnighter/nf-core-adr/blob/main/docs/adr/0008-refrain-from-using-docker-useremulation-in-nextflow.md)).
-
-## How to handle UMIs
-
-Sarek can process UMI-reads, using [fgbio](http://fulcrumgenomics.github.io/fgbio/tools/latest/) tools.
-
-In order to use reads containing UMI tags as your initial input, you need to include `--umi_read_structure ` in your parameters.
-
-This will enable pre-processing of the reads and UMI consensus reads calling, which will then be used to continue the workflow from the mapping steps. For post-UMI processing depending on the experimental setup, duplicate marking and base quality recalibration can be skipped with `--skip_tools`.
-
-### UMI Read Structure
-
-This parameter is a string, which follows a [convention](https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures) to describe the structure of the umi.
-If your reads contain a UMI only on one end, the string should only represent one structure (i.e. "2M11S+T"); should your reads contain a UMI on both ends, the string will contain two structures separated by a blank space (i.e. "2M11S+T 2M11S+T").
-
-### Limitations and future updates
-
-Recent updates to Samtools have been introduced, which can speed-up performance of fgbio tools used in this workflow.
-The current workflow does not handle duplex UMIs (i.e. where opposite strands of a duplex molecule have been tagged with a different UMI), and best practices have been proposed to process this type of data.
-Both changes will be implemented in a future release.
-
 ## MultiQC related issues
 
 ### Plots for SnpEff are missing
@@ -870,7 +799,7 @@ Both changes will be implemented in a future release.
 When plots are missing, it is possible that the fasta and the custom SnpEff database are not matching https://pcingola.github.io/SnpEff/se_faq/#error_chromosome_not_found-details.
 The SnpEff completes without throwing an error causing nextflow to complete successfully. An indication for the error are these lines in the `.command` files:
 
-```
+```text
 ERRORS: Some errors were detected
 Error type Number of errors
 ERROR_CHROMOSOME_NOT_FOUND 17522411