Refactor #1151

Merged
merged 22 commits on Jul 12, 2023
22 commits
7ba9311
indent things, clean up, add comments, consolidate
FriederikeHanssen Jul 10, 2023
fe1c506
refactor post vc steps to prepare for varlociraptor
FriederikeHanssen Jul 10, 2023
d8e6fe7
fix import paths
FriederikeHanssen Jul 10, 2023
e80d157
get rid of WARN: There's no process matching config selector
FriederikeHanssen Jul 10, 2023
be9786f
add post vc back in
FriederikeHanssen Jul 10, 2023
ce4a924
test md and sentieon separately
FriederikeHanssen Jul 11, 2023
000de54
fix input channel cardinality for fgbio
FriederikeHanssen Jul 11, 2023
483fd18
fix naming
FriederikeHanssen Jul 11, 2023
3b53b05
fix more bad spacing, typos
FriederikeHanssen Jul 11, 2023
7efdf76
update sw map with concatenate, move type description
FriederikeHanssen Jul 11, 2023
764e09d
simplify test config
FriederikeHanssen Jul 11, 2023
7eba720
simplify test config
FriederikeHanssen Jul 11, 2023
c36a726
update PR ID
FriederikeHanssen Jul 11, 2023
81dde8c
remove gvcf vcfs from csv for restart
FriederikeHanssen Jul 11, 2023
fc203ac
simplify everything
FriederikeHanssen Jul 11, 2023
2430c9a
add spaces for prettyness
FriederikeHanssen Jul 11, 2023
f5e3aac
fix variable name
FriederikeHanssen Jul 11, 2023
10249ca
remove empty channel
FriederikeHanssen Jul 11, 2023
6584e32
spread out gatk md and sentieon dedup more
FriederikeHanssen Jul 11, 2023
46ff810
update md5sums since csv don't contain gvcf files
FriederikeHanssen Jul 11, 2023
0d8999b
add selector to get controlfreec tests to pass
FriederikeHanssen Jul 11, 2023
bf43387
need to keep wild card declaration to ensure precendence
FriederikeHanssen Jul 12, 2023
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changed

- [#1151](https://github.com/nf-core/sarek/pull/1151) - Refactor codebase

### Fixed

- [#1143](https://github.com/nf-core/sarek/pull/1143) - `snpeff_db` is now a string
2 changes: 1 addition & 1 deletion README.md
@@ -34,7 +34,7 @@ It's listed on [Elixir - Tools and Data Services Registry](https://bio.tools/nf-
Depending on the options and samples provided, the pipeline can currently perform the following:

- Form consensus reads from UMI sequences (`fgbio`)
- Sequencing quality control and trimming (`FastQC`, `fastp`)
- Sequencing quality control and trimming (enabled by `--trim_fastq`) (`FastQC`, `fastp`)
- Map Reads to Reference (`BWA-mem`, `BWA-mem2`, `dragmap` or `Sentieon BWA-mem`)
- Process BAM file (`GATK MarkDuplicates`, `GATK BaseRecalibrator` and `GATK ApplyBQSR` or `Sentieon LocusCollector` and `Sentieon Dedup`)
- Summarise alignment statistics (`samtools stats`, `mosdepth`)
37 changes: 7 additions & 30 deletions conf/modules/aligner.config
@@ -33,13 +33,7 @@ process {
ext.when = { params.aligner == "sentieon-bwamem" }
}



withName: "(BWAMEM.*_MEM|DRAGMAP_ALIGN)" {
// Markduplicates Spark NEEDS name-sorted reads or runtime goes through the roof
// However if it's skipped, reads need to be coordinate-sorted
// Only name sort if Spark for Markduplicates + duplicate marking is not skipped
ext.args2 = { params.use_gatk_spark && params.use_gatk_spark.contains('markduplicates') && (!params.skip_tools || (params.skip_tools && !params.skip_tools.split(',').contains('markduplicates'))) ? '-n' : '' }
withName: "(BWAMEM.*_MEM|DRAGMAP_ALIGN|SENTIEON_BWAMEM)" {
ext.prefix = { params.split_fastq > 1 ? "${meta.id}".concat('.').concat(reads.get(0).name.tokenize('.')[0]) : "${meta.id}.sorted" }
publishDir = [
mode: params.publish_dir_mode,
@@ -61,29 +55,12 @@ process {
]
}


withName: "SENTIEON_BWAMEM" {
// Markduplicates Spark NEEDS name-sorted reads or runtime goes through the roof.
// However, currently SENTIEON_BWAMEM only supports coordinate sorting the reads.
ext.prefix = { params.split_fastq > 1 ? "${meta.id}".concat('.').concat(reads.get(0).name.tokenize('.')[0]) : "${meta.id}.sorted" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/preprocessing/" },
pattern: "*bam",
// Only save if save_output_as_bam AND
// (save_mapped OR no_markduplicates OR sentieon_dedup) AND
// only a single BAM file per sample
saveAs: {
if (params.save_output_as_bam &&
(
params.save_mapped ||
(params.skip_tools && params.skip_tools.split(',').contains('markduplicates')) &&
!(params.tools && params.tools.split(',').contains('sentieon_dedup'))
) && (meta.size * meta.num_lanes == 1)
) { "mapped/${meta.id}/${it}" }
else { null }
}
]
withName: "(BWAMEM.*_MEM|DRAGMAP_ALIGN)" {
// Markduplicates Spark NEEDS name-sorted reads or runtime goes through the roof
// However if it's skipped, reads need to be coordinate-sorted
// Only name sort if Spark for Markduplicates + duplicate marking is not skipped
// Currently SENTIEON_BWAMEM only supports coordinate sorting the reads.
ext.args2 = { params.use_gatk_spark && params.use_gatk_spark.contains('markduplicates') && (!params.skip_tools || (params.skip_tools && !params.skip_tools.split(',').contains('markduplicates'))) ? '-n' : '' }
}

withName: "BWAMEM.*_MEM|SENTIEON_BWAMEM" {
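The name-sort condition in `ext.args2` above is dense. A minimal plain-Groovy sketch of the same boolean logic follows; `wantsNameSort` and the maps passed to it are hypothetical stand-ins for illustration and are not part of the pipeline.

```groovy
// Hypothetical helper mirroring the ext.args2 condition in the diff.
// Returns '-n' (name sort) only when Spark MarkDuplicates is requested
// and duplicate marking is not skipped; otherwise an empty string.
def wantsNameSort = { Map params ->
    def sparkMd   = params.use_gatk_spark && params.use_gatk_spark.contains('markduplicates')
    def mdSkipped = params.skip_tools && params.skip_tools.split(',').contains('markduplicates')
    (sparkMd && !mdSkipped) ? '-n' : ''
}

// Spark MarkDuplicates requested, nothing skipped: name sort
assert wantsNameSort([use_gatk_spark: 'markduplicates', skip_tools: null]) == '-n'
// Spark requested but MarkDuplicates skipped: coordinate sort
assert wantsNameSort([use_gatk_spark: 'markduplicates', skip_tools: 'markduplicates']) == ''
// Spark not requested: coordinate sort
assert wantsNameSort([use_gatk_spark: null, skip_tools: null]) == ''
```

Splitting the condition into two named booleans is only a readability choice in this sketch; the diff keeps it as a single expression.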
10 changes: 10 additions & 0 deletions conf/modules/markduplicates.config
@@ -33,6 +33,16 @@ process {
]
}

withName: 'NFCORE_SAREK:SAREK:(BAM_MARKDUPLICATES|BAM_MARKDUPLICATES_SPARK):CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS' {
ext.when = { !(params.skip_tools && params.skip_tools.split(',').contains('samtools')) }
ext.prefix = { "${meta.id}.md.cram" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/samtools/${meta.id}" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'BAM_TO_CRAM_MAPPING' {
// Run only when mapping should be saved as CRAM or when no MD is done
ext.when = (params.save_mapped && !params.save_output_as_bam) ||
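The `saveAs` closure in the `publishDir` block moved into this file filters out `versions.yml`. In Nextflow, a `saveAs` closure that returns `null` means the file is not published. A small sketch of the closure on its own, evaluated in plain Groovy; the second file name is made up for illustration.

```groovy
// Same closure body as in the publishDir blocks of the diff:
// null suppresses publishing, any other value is the published file name.
def saveAs = { filename -> filename.equals('versions.yml') ? null : filename }

assert saveAs('versions.yml') == null
assert saveAs('sample.md.cram.stats') == 'sample.md.cram.stats'   // hypothetical file name
```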
20 changes: 0 additions & 20 deletions conf/modules/modules.config
@@ -40,26 +40,6 @@ process {
]
}

withName: 'NFCORE_SAREK:SAREK:(BAM_MARKDUPLICATES|BAM_MARKDUPLICATES_SPARK):CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS' {
ext.when = { !(params.skip_tools && params.skip_tools.split(',').contains('samtools')) }
ext.prefix = { "${meta.id}.md.cram" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/samtools/${meta.id}" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'NFCORE_SAREK:SAREK:BAM_SENTIEON_DEDUP:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS' {
ext.when = { !(params.skip_tools && params.skip_tools.split(',').contains('samtools')) }
ext.prefix = { "${meta.id}.dedup.cram" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/samtools/${meta.id}" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'NFCORE_SAREK:SAREK:CRAM_QC_NO_MD:SAMTOOLS_STATS' {
ext.when = { !(params.skip_tools && params.skip_tools.split(',').contains('samtools')) }
ext.prefix = { "${meta.id}.sorted.cram" }
1 change: 1 addition & 0 deletions conf/modules/mutect2.config
@@ -15,6 +15,7 @@

process {
if (params.tools && params.tools.split(',').contains('mutect2')) {

withName: 'GATK4_MUTECT2' {
ext.prefix = { meta.num_intervals <= 1 ? "${meta.id}.mutect2" : "${meta.id}.mutect2.${intervals.simpleName}" }
ext.when = { params.tools && params.tools.split(',').contains('mutect2') }
5 changes: 5 additions & 0 deletions conf/modules/post_variant_calling.config
@@ -15,7 +15,9 @@
// Like, for instance, concatenating the unannotated, germline vcf-files

process {

withName: 'GERMLINE_VCFS_CONCAT'{
ext.when = params.concatenate_vcfs
publishDir = [
//specify to avoid publishing, overwritten otherwise
enabled: false
@@ -24,6 +26,7 @@ process {

withName: 'GERMLINE_VCFS_CONCAT_SORT'{
ext.prefix = { "${meta.id}.germline" }
ext.when = params.concatenate_vcfs
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/concat/${meta.id}/" }
@@ -32,10 +35,12 @@ process {

withName: 'TABIX_EXT_VCF' {
ext.prefix = { "${input.baseName}" }
ext.when = params.concatenate_vcfs
}

withName: 'TABIX_GERMLINE_VCFS_CONCAT_SORT'{
ext.prefix = { "${meta.id}.germline" }
ext.when = params.concatenate_vcfs
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/concat/${meta.id}/" }
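The `ext.when` lines added in this file gate the concatenation processes on `params.concatenate_vcfs`. A hedged sketch of the pattern follows, assuming the module's `when:` block evaluates `task.ext.when` as nf-core modules conventionally do. `EXAMPLE_CONCAT` is a hypothetical process name, and the closure form shown is the variant used by other `ext.when` settings in these configs, whereas the diff assigns the parameter directly.

```groovy
// Illustrative config fragment, not part of this PR
params.concatenate_vcfs = false

process {
    withName: 'EXAMPLE_CONCAT' {   // hypothetical process name
        // Closure form: evaluated when the task is set up
        ext.when = { params.concatenate_vcfs }
    }
}
```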
1 change: 1 addition & 0 deletions conf/modules/prepare_cache.config
@@ -14,6 +14,7 @@
// PREPARE_CACHE

process {

// SNPEFF
withName: 'SNPEFF_DOWNLOAD' {
ext.when = { params.tools && (params.tools.split(',').contains('snpeff') || params.tools.split(',').contains('merge')) }
12 changes: 12 additions & 0 deletions conf/modules/sentieon_dedup.config
@@ -34,4 +34,16 @@ process {
]
}

if (params.tools && params.tools.contains('sentieon_dedup')) {
withName: 'NFCORE_SAREK:SAREK:BAM_SENTIEON_DEDUP:CRAM_QC_MOSDEPTH_SAMTOOLS:SAMTOOLS_STATS' {
ext.when = { !(params.skip_tools && params.skip_tools.split(',').contains('samtools')) }
ext.prefix = { "${meta.id}.dedup.cram" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/reports/samtools/${meta.id}" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}
}

}
18 changes: 10 additions & 8 deletions conf/modules/sentieon_haplotyper.config
@@ -45,14 +45,16 @@ process {
]
}

withName: '.*BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES' {
ext.prefix = {"${meta.id}.haplotyper"}
ext.args = { "--info-key CNN_1D" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/${meta.id}/"},
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
if (params.tools && params.tools.contains('sentieon_haplotyper')) {

@asp8200 (Contributor) commented on Jul 12, 2023:

Is the if-clause added to avoid the standard WARN when BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES doesn't run?

Contributor Author replied:

yes

withName: '.*BAM_VARIANT_CALLING_SENTIEON_HAPLOTYPER:VCF_VARIANT_FILTERING_GATK:FILTERVARIANTTRANCHES' {
ext.prefix = {"${meta.id}.haplotyper"}
ext.args = { "--info-key CNN_1D" }
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/${meta.id}/"},
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}
}

}
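
As confirmed in the review thread in this file's diff, the surrounding `if` exists to avoid the "There's no process matching config selector" warning when the process is not part of the run. A minimal sketch of the pattern; `example_tool` and `EXAMPLE_PROCESS` are placeholders, not real sarek names.

```groovy
// Illustrative config fragment with placeholder names
process {
    // Declare the selector only when the tool was requested, so Nextflow
    // does not warn about a selector that matches no process.
    if (params.tools && params.tools.split(',').contains('example_tool')) {
        withName: 'EXAMPLE_PROCESS' {
            ext.prefix = { "${meta.id}.example" }
        }
    }
}
```

The sketch splits `params.tools` on commas for an exact token match; the diff uses a plain `contains`, which is a substring test on the whole string.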
41 changes: 20 additions & 21 deletions conf/modules/sentieon_joint_germline.config
@@ -15,7 +15,7 @@

process {

withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:SENTIEON_GVCFTYPER' {
withName: 'SENTIEON_GVCFTYPER' {
ext.args = { "--allow-old-rms-mapping-quality-annotation-data" }
ext.prefix = { meta.intervals_name }
publishDir = [
@@ -24,32 +24,32 @@ process {
}

if (params.tools && params.tools.contains('sentieon_haplotyper') && params.joint_germline) {
withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON::BCFTOOLS_SORT' {
withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:BCFTOOLS_SORT' {
ext.prefix = { vcf.baseName - ".vcf" + ".sort" }
publishDir = [
enabled: false
]
}
}

withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:MERGE_GENOTYPEGVCFS' {
ext.prefix = "joint_germline"
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/joint_variant_calling/" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}
withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:MERGE_GENOTYPEGVCFS' {
ext.prefix = "joint_germline"
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/joint_variant_calling/" },
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}

withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:MERGE_VQSR' {
ext.prefix = "joint_germline_recalibrated"
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/joint_variant_calling/"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
withName: 'NFCORE_SAREK:SAREK:BAM_VARIANT_CALLING_GERMLINE_ALL:BAM_JOINT_CALLING_GERMLINE_SENTIEON:MERGE_VQSR' {
ext.prefix = "joint_germline_recalibrated"
publishDir = [
mode: params.publish_dir_mode,
path: { "${params.outdir}/variant_calling/sentieon_haplotyper/joint_variant_calling/"},
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
}
}

withName: 'SENTIEON_VARCAL_INDEL' {
@@ -78,5 +78,4 @@ process {
ext.args = '--sensitivity 99.9 --var_type SNP'
}


}
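
The `ext.prefix` closure for `BCFTOOLS_SORT` in this file relies on Groovy's string `-` operator, which removes the first occurrence of its right operand. A small sketch with a plain string standing in for `vcf.baseName`; the file name is made up, and the sketch assumes the base name still ends in `.vcf`.

```groovy
// Stand-in for vcf.baseName (hypothetical value)
def baseName = 'joint_germline.vcf'

// '-' drops the first ".vcf", then '+' appends ".sort"
def prefix = baseName - '.vcf' + '.sort'

assert prefix == 'joint_germline.sort'
```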
1 change: 1 addition & 0 deletions conf/modules/umi.config
@@ -62,6 +62,7 @@ process {
enabled: false
]
}

withName: 'GROUPREADSBYUMI' {
publishDir = [
[ path: { "${params.outdir}/reports/umi/" },
26 changes: 13 additions & 13 deletions conf/test.config
@@ -46,6 +46,7 @@ params {
}

process {

withName:'.*:FREEC_SOMATIC'{
ext.args = {
[
@@ -69,27 +70,26 @@ process {
}
}

if (params.tools && params.tools.split(',').contains('mutect2')) {
if (params.joint_mutect2) {
withName: 'MUTECT2_PAIRED' {
ext.args = { params.ignore_soft_clipped_bases ?
"--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" :
"--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" }
}
if (params.joint_mutect2) {
withName: 'MUTECT2_PAIRED' {
ext.args = { params.ignore_soft_clipped_bases ?
"--dont-use-soft-clipped-bases true --f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" :
"--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample ${meta.normal_id}" }
}
else {
withName: '.*MUTECT2_PAIRED'{
//sample name from when the test data was generated
ext.args = { "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample normal " }
}
}
else {
withName: 'MUTECT2_PAIRED'{
//sample name from when the test data was generated
ext.args = { "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample normal " }
}
}

withName: '.*:FILTERVARIANTTRANCHES'{
withName: 'FILTERVARIANTTRANCHES'{
ext.args = { "--info-key CNN_1D --indel-tranche 0" }
}
}


// Enable container engines/virtualisation envs for CI testing
// only works when specified with the profile ENV
// otherwise tests can be done with the regular provided profiles
11 changes: 5 additions & 6 deletions conf/test/cache.config
@@ -63,6 +63,7 @@ process {
ext.sentieon_auth_data_base64 = secrets.SENTIEON_AUTH_DATA_BASE64
}

// This must contain .* in order to properly overwrite the standard config in test cases

Contributor commented:

I appreciate the comment 👍

withName:'.*:FREEC_SOMATIC'{
ext.args = {
[
@@ -86,14 +87,12 @@ process {
}
}

if (params.tools && params.tools.split(',').contains('mutect2')) {
withName: '.*MUTECT2_PAIRED'{
//sample name from when the test data was generated
ext.args = { "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample normal " }
}
withName: 'MUTECT2_PAIRED'{
//sample name from when the test data was generated
ext.args = { "--f1r2-tar-gz ${task.ext.prefix}.f1r2.tar.gz --normal-sample normal " }
}

withName: '.*:FILTERVARIANTTRANCHES'{
withName: 'FILTERVARIANTTRANCHES'{
ext.args = { "--info-key CNN_1D --indel-tranche 0" }
}
}
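
The comment about `.*` in this file concerns which selector wins when both the standard config and a test config target the same process. A sketch of the layout it describes, with hypothetical names throughout. The precedence behaviour is taken from the comment in the diff and the final commit message, and is not verified here.

```groovy
// Standard config (hypothetical excerpt)
process {
    withName: '.*:EXAMPLE_PROCESS' {
        ext.args = { '--default-args' }
    }
}

// Test config (hypothetical excerpt): the selector keeps the '.*' prefix
// so that, per the comment in the diff, it overrides the standard one.
process {
    withName: '.*:EXAMPLE_PROCESS' {
        ext.args = { '--test-args' }
    }
}
```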
Binary file modified docs/images/sarek_subway.png