Concatenating germline vcfs #792

asp8200 · 2022-10-11T14:58:13Z

Issue #738.

Adding option --concatenate_vcfs for concatenating the vcf-files from all the germline variant-callers, except cnvkit which doesn't produce a vcf-file.

I've set it up so that Sarek puts the concatenated .vcf.gz-file here:

results/variant_calling/concat/<patient>/<patient>.germline.vcf.gz

The output vcf-files are sorted and have index-files. Added an INFO-field named SOURCE in the output vcf-files giving the name of the (source) vcf-file from which the variant came. (The filename also indicates which variant-caller was used.)
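As an illustration of how the SOURCE field could be read downstream, here is a minimal Python sketch. It assumes standard tab-separated VCF data lines with INFO in the eighth column; the helper names and the example record are hypothetical and not part of this PR.

```python
from typing import Optional


def info_to_dict(info: str) -> dict:
    """Split a VCF INFO string such as 'DP=10;SOURCE=a.vcf.gz;DB' into a dict."""
    fields = {}
    if info == ".":  # '.' means the record carries no INFO fields
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True  # flags have no '=value' part
    return fields


def source_of(record_line: str) -> Optional[str]:
    """Return the SOURCE value of one VCF data line, or None if it is absent."""
    columns = record_line.rstrip("\n").split("\t")
    return info_to_dict(columns[7]).get("SOURCE")


# Hypothetical record; the file name in SOURCE is made up for illustration.
example = "chr1\t12345\t.\tA\tG\t50\tPASS\tDP=10;SOURCE=sample1.strelka.variants.vcf.gz"
print(source_of(example))
```

Because the source file name contains the caller name, the same value can be used to group records by variant-caller.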

I updated nextflow_schema.json - perhaps docs/usage.md and docs/output.md should also be updated?

PR checklist

This comment contains a description of changes (with reason).
If you've fixed a bug or added code that should be tested, add tests!
If you've added a new tool - have you followed the pipeline conventions in the contribution docs
If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
Make sure your code lints (nf-core lint).
Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
Usage Documentation in docs/usage.md is updated.
Output Documentation in docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).

github-actions · 2022-10-11T15:12:51Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit b32b4cf

+| ✅ 151 tests passed       |+
#| ❔   8 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline
schema_description - No description provided in schema for parameter: cnvkit_reference

❔ Tests ignored:

files_exist - File is ignored: conf/modules.config
files_exist - File is ignored: conf/test.config
files_exist - File is ignored: conf/test_full.config
files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_light.png
files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo_dark.png
files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
template_strings - template_strings

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-sarek_logo_light.png
files_exist - File found: docs/images/nf-core-sarek_logo_light.png
files_exist - File found: docs/images/nf-core-sarek_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: lib/nfcore_external_java_deps.jar
files_exist - File found: lib/NfcoreSchema.groovy
files_exist - File found: lib/NfcoreTemplate.groovy
files_exist - File found: lib/Utils.groovy
files_exist - File found: lib/WorkflowMain.groovy
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: .github/workflows/awstest.yml
files_exist - File found: .github/workflows/awsfulltest.yml
files_exist - File found: lib/WorkflowSarek.groovy
files_exist - File found: modules.json
files_exist - File found: pyproject.toml
files_exist - File not found check: Singularity
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: docs/images/nf-core-sarek_logo.png
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.show_hidden_params
nextflow_config - Config variable found: params.schema_ignore_params
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.version
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: '3.2dev'
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/email_template.html matches the template
files_unchanged - assets/email_template.txt matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - docs/README.md matches the template
files_unchanged - lib/nfcore_external_java_deps.jar matches the template
files_unchanged - lib/NfcoreSchema.groovy matches the template
files_unchanged - .gitignore matches the template
files_unchanged - .prettierignore matches the template
files_unchanged - pyproject.toml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
actions_awstest - '.github/workflows/awstest.yml' is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml is triggered correctly
actions_awsfulltest - .github/workflows/awsfulltest.yml does not use -profile test
readme - README Nextflow minimum version badge matched config. Badge: 21.10.3, Config: 21.10.3
readme - README Nextflow minimum version in Quick Start section matched config. README: 21.10.3, Config: 21.10.3
pipeline_name_conventions - Name adheres to nf-core convention
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
actions_schema_validation - Workflow validation passed: pytest-workflow.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: awsfulltest.yml
actions_schema_validation - Workflow validation passed: awstest.yml
actions_schema_validation - Workflow validation passed: awsfulltest_germline.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: linting.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - 'assets/multiqc_config.yml' follows the ordering scheme of the minimally required plugins.
multiqc_config - 'assets/multiqc_config.yml' contains a matching 'report_comment'.
multiqc_config - 'assets/multiqc_config.yml' contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'

Run details

nf-core/tools version 2.6
Run at 2022-12-06 19:06:23

maxulysse · 2022-11-11T10:25:22Z

> So far I am just concatenating the germline-vcfs from haplotypecaller and strelka, and placing the resulting vcf <patient>.germline.vcf.gz in the results-folder results/variant_calling/concat/<patient>.

I think it's best to start small, and just do germline snps/indels for now.

> @maxime doesn't want the concatenation to be optional.

I do think it's better to have that optional, people can have different usage downstream.

> I've set it up so that Sarek puts the concatenated .vcf.gz-file here:
> results/variant_calling/concat/<patient>/<patient>.germline.vcf.gz
> Should there also be a .tbi-file for the vcf-file?

Yes, in my opinion, as long as we produce a vcf.gz, we should have it tabix indexed.
Can we create a results/variant_calling/concat/<patient>/<patient>.germline.txt to list all vcf that were concatenated to produce this file, or do we have that info in the final vcf?

asp8200 · 2022-11-11T10:37:21Z

Thanks for the feedback, @maxulysse. Much appreciated. I'll make the concatenation optional somehow :-)

Concerning your idea about the text-file - the vcf-file produced by bcftools concat already contains information about which vcf-files were concatenated:

##bcftools_concatCommand=concat --output test1.germline.vcf.gz --threads 1 test1.strelka.variants.vcf.gz test1.manta.diploid_sv.vcf.gz test1.haplotypecaller.filtered.vcf.gz; Date=Thu Nov 10 21:40:33 2022

I'd say that makes the text-file redundant, right?
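To show how that header line can stand in for a separate text-file, the following Python sketch pulls the input file names out of a `##bcftools_concatCommand=` line. It is a heuristic under stated assumptions: the function name is hypothetical, inputs are recognised only by their file extension, and the value following `--output`/`-o` is skipped.

```python
import shlex

VCF_SUFFIXES = (".vcf.gz", ".vcf", ".bcf")


def concatenated_inputs(header_line: str) -> list:
    """List the input files recorded in a '##bcftools_concatCommand=' header line."""
    prefix = "##bcftools_concatCommand="
    if not header_line.startswith(prefix):
        raise ValueError("not a bcftools_concatCommand header line")
    # Drop the trailing '; Date=...' part, then tokenise like a shell would.
    command = header_line[len(prefix):].split(";", 1)[0]
    tokens = shlex.split(command)
    inputs = []
    skip_next = False
    for token in tokens:
        if skip_next:  # this token is the output file name, not an input
            skip_next = False
            continue
        if token in ("--output", "-o"):
            skip_next = True
            continue
        if not token.startswith("-") and token.endswith(VCF_SUFFIXES):
            inputs.append(token)
    return inputs


line = (
    "##bcftools_concatCommand=concat --output test1.germline.vcf.gz --threads 1 "
    "test1.strelka.variants.vcf.gz test1.manta.diploid_sv.vcf.gz "
    "test1.haplotypecaller.filtered.vcf.gz; Date=Thu Nov 10 21:40:33 2022"
)
print(concatenated_inputs(line))
```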

maxulysse · 2022-11-11T10:46:08Z

> I'd say that makes the text-file redundant, right?

Yes, that's enough for me indeed.

asp8200 · 2022-11-29T10:10:00Z

@FriederikeHanssen @maxulysse : Can I get you guys to do a preliminary review of this PR?

If this PR looks okay, then I'll update the corresponding modules in github.com/nf-core/modules.

I've tested this PR with the following cmd:

nextflow run main.nf -profile test,singularity --input mapped_joint_bam.fixed.csv -dump-channels -ansi-log false --step variant_calling --concatenate_vcfs --tools cnvkit,deepvariant,freebayes,haplotypecaller,manta,mpileup,strelka,tiddit

and it gives me a concatenated germline vcf-file which was made by this bcftools concat command:

##bcftools_concatCommand=concat --output testN.vcf.gz --threads 1 testN.bcftools.vcf.gz testN.tiddit.vcf.gz testN.deepvariant.vcf.gz testN.freebayes.vcf.gz testN.manta.diploid_sv.vcf.gz testN.strelka.variants.vcf.gz testN.haplotypecaller.filtered.vcf.gz; Date=Tue Nov 29 10:47:08 2022

(N.B. cnvkit doesn't produce a vcf-file, so there are no variants from cnvkit in the concatenated vcf-file.)

In fact, two concatenated vcf-files were produced, since the input-samplesheet contains two bam-files:

results/variant_calling/concat/testN/testN.germline.vcf.gz
results/variant_calling/concat/testT/testT.germline.vcf.gz

testN.germline.vcf.gz

The vcf-files are sorted and have corresponding tbi-files.
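A small Python sketch of why the sort step matters: records from several callers placed back to back are each sorted within their own block, but the combined list is generally not in coordinate order, which a tabix index requires. The records, contig list and function names below are hypothetical.

```python
def make_key(contig_order):
    """Build a sort key that orders records by contig rank, then by position."""
    rank = {name: index for index, name in enumerate(contig_order)}

    def key(record):
        chrom, pos = record
        return (rank[chrom], pos)

    return key


def is_coordinate_sorted(records, contig_order):
    """True if (chrom, pos) records follow contig order and ascending position."""
    key = make_key(contig_order)
    keys = [key(record) for record in records]
    return all(a <= b for a, b in zip(keys, keys[1:]))


contigs = ["chr1", "chr2"]                  # order as declared in the VCF header
caller_a = [("chr1", 100), ("chr2", 50)]    # made-up records from one caller
caller_b = [("chr1", 70), ("chr2", 80)]     # made-up records from another caller

concatenated = caller_a + caller_b          # back-to-back blocks
resorted = sorted(concatenated, key=make_key(contigs))

print(is_coordinate_sorted(concatenated, contigs))
print(is_coordinate_sorted(resorted, contigs))
```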

Warning: This PR contains some really clumsy code:
https://github.com/asp8200/sarek/blob/f8edc0034b9f01e3644ae75d7eaf57449581659c/workflows/sarek.nf#L1048-L1060

lassefolkersen

I think it looks good overall. I had one comment on sorting --- is that something that has been discussed? I mean post-concatenate sorting?

Also -- I think interpretation wise it does make sense to talk about 'an intersection' of results with long variants, cnvs, e.g. as when including Manta etc. Few long variant callers will agree on exactly where the break ends of a variant are, so it's not really going to be an intersection, more like a union with slightly different border repeats of the same CNV.
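The point about break ends can be made concrete with a toy Python example. Two hypothetical callers report what is plausibly the same deletion with slightly different coordinates: an exact intersection is empty, while a union (or concatenation) keeps both records. The coordinates and the `reciprocal_overlap` helper are illustrative assumptions, not something this PR implements.

```python
def reciprocal_overlap(a, b):
    """Smallest fraction of either interval that is covered by their overlap."""
    if a[0] != b[0]:  # different contigs never overlap
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


# (contig, start, end, type); made-up calls from two structural-variant callers
call_a = ("chr1", 10000, 15000, "DEL")
call_b = ("chr1", 10012, 14990, "DEL")

exact_intersection = {call_a} & {call_b}
union = {call_a} | {call_b}

print(len(exact_intersection))  # no record matches exactly
print(len(union))               # both border variants are kept
print(round(reciprocal_overlap(call_a, call_b), 4))
```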

FriederikeHanssen · 2022-12-01T21:10:40Z

> Damn! All the hard work I did with getting the variant-callers to return index-files all the way back to sarek.nf seems to be redundant, as I'll have to compute new index files after adding the INFO-field ~~SET~~ SOURCE to the vcf-files.

🙈 oh no

asp8200 · 2022-12-01T22:39:01Z

Ok, so I introduced a local module for adding the INFO-field SOURCE=<name-of-input-vcf-file>. Here is the concatenated vcf-file:
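Conceptually, tagging a vcf-file with its own name amounts to declaring the field in the header and appending it to every record's INFO column. The Python sketch below shows that idea on in-memory lines; it is not the local module's implementation, and the function name and the header description text are assumptions.

```python
# Assumed header declaration; the Description text is illustrative only.
SOURCE_HEADER = (
    '##INFO=<ID=SOURCE,Number=1,Type=String,'
    'Description="Name of the source VCF file">'
)


def add_source(lines, source_name):
    """Return VCF lines with SOURCE=<source_name> added to every record."""
    tagged = []
    for line in lines:
        if line.startswith("#CHROM"):
            tagged.append(SOURCE_HEADER)  # declare the field before the column header
            tagged.append(line)
        elif line.startswith("#"):
            tagged.append(line)           # other header lines pass through unchanged
        else:
            columns = line.split("\t")
            tag = f"SOURCE={source_name}"
            # '.' means "no INFO"; otherwise append to the existing fields
            columns[7] = tag if columns[7] == "." else f"{columns[7]};{tag}"
            tagged.append("\t".join(columns))
    return tagged


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10",
    "chr1\t200\t.\tC\tT\t40\tPASS\t.",
]
for line in add_source(vcf_lines, "testN.strelka.variants.vcf.gz"):
    print(line)
```

Since every record is rewritten, any index computed for the untagged file no longer matches, which is why new index files are needed after this step.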

testN.germline.vcf.gz

With the CLI-option --concatenate_vcfs, germline vcf-files from the following variant-callers will be concatenated:

deepvariant
freebayes
haplotypecaller
manta
mpileup
strelka
tiddit

In the attached concatenated vcf-files, there are no variants from manta or tiddit.
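One way to check which callers actually contributed records is to tally the SOURCE values. The Python sketch below assumes source file names of the form `<sample>.<caller>.[...].vcf.gz`, as in the file names quoted in this thread (note that the mpileup output is named `testN.bcftools.vcf.gz` there, so it would be counted under `bcftools`). The SOURCE values in the example are made up.

```python
from collections import Counter


def caller_from_source(source: str) -> str:
    """Take the second dot-separated token of the file name as the caller name."""
    return source.split(".")[1]


def records_per_caller(sources):
    """Count records per caller, given the SOURCE value of each record."""
    return Counter(caller_from_source(source) for source in sources)


# Hypothetical SOURCE values, one per record of a concatenated vcf-file.
sources = [
    "testN.strelka.variants.vcf.gz",
    "testN.strelka.variants.vcf.gz",
    "testN.haplotypecaller.filtered.vcf.gz",
    "testN.deepvariant.vcf.gz",
]
expected = ["deepvariant", "freebayes", "haplotypecaller", "manta", "strelka", "tiddit"]

counts = records_per_caller(sources)
missing = [caller for caller in expected if counts[caller] == 0]
print(dict(counts))
print(missing)
```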

What do you guys think about this solution? I'm still passing the index-files from the variant-caller-modules all the way back to sarek.nf; that is actually not necessary with the usage of the local module. Should I get rid of the code passing the index-files from the variant-caller-modules back to sarek.nf? 🤔

asp8200 · 2022-12-06T09:41:34Z

I'm still passing the index-files from the variant-caller-modules all the way back to sarek.nf; that is actually not necessary with the usage of the local module. Should I get rid of the code passing the index-files from the variant-caller-modules back to sarek.nf?

The fastest and easiest solution would be just to get rid of the (new) code which is passing the index-files back to sarek.nf, since then I don't have to update anything in nf-core/modules 😁

asp8200 · 2022-12-06T20:40:15Z

@maxulysse asked me to get rid of the redundant code, and so I did.

I now - finally - have all CI-tests passing. Let's merge this thing!

FriederikeHanssen · 2022-12-07T12:14:12Z

uh nice 🥳 🚀

asp8200 added 2 commits October 11, 2022 16:39

WIP. Just concatenating germline-vcfs from strelka and hyplotypecaller

1fd779d

Merge branch 'dev' into concatenating_vcfs

22fe29e

asp8200 requested a review from FriederikeHanssen October 11, 2022 14:58

asp8200 added 2 commits October 12, 2022 13:02

Adding the germline vcf-file from manta to the list of germline vcf-files being concatenated

5ee59a8

Making sure the channel manta_vcf_tbi is defined even if manta isnt run

624b6eb

FriederikeHanssen mentioned this pull request Nov 7, 2022

Intersection of VCF-files #809

Open

merge from dev

5a5fb17

maxulysse reviewed Nov 11, 2022

conf/modules/modules.config (outdated)

maxulysse reviewed Nov 11, 2022

subworkflows/local/bam_variant_calling_germline_all/main.nf (outdated)

asp8200 added 13 commits November 13, 2022 12:30

Adding support for concatenation of germline vcf-files. Now also for vcf-files from deepvariant.

b89f088

Adding CLI-open concatenate_vcf to the schema-json

a936722

WIP: Adding support for concatenation of germline vcf-files. Now also for vcf-files from freebayes.

f91d40b

WIP: Adding support for concatenation of germline vcf-files. Now also for vcf-files from tiddit.

d859e04

Merge branch 'dev' into concatenating_vcfs

54d6c43

Merge branch 'dev' into concatenating_vcfs

ad2b5a7

Adding support for concatenation of vcf from mpileup

38ac53d

Changing CLI-option concatenate_vcf to concatenate_vcfs.

dba9993

Merge branch 'dev' into concatenating_vcfs

f476da7

Initializing CLI-option concatenate_vcfs to false.

34baf9a

Sorting concatenated germline-vcf-file and adding tbi.

d3a4578

Updating schema. Grouping the CLI-option concatenate_vcfs together with the other variant-calling-parameters.

9e21631

prettier

f8edc00

lassefolkersen approved these changes Nov 30, 2022

modules/nf-core/bcftools/concat/main.nf

FriederikeHanssen reviewed Nov 30, 2022

conf/modules/modules.config (outdated)

FriederikeHanssen reviewed Nov 30, 2022

workflows/sarek.nf (outdated)

asp8200 added 3 commits December 1, 2022 22:37

Adding INFO-field SOURCE=<input-vcf> to germline-vcf-files before concatenating them.

70b1027

cleaner

c999a8f

Fixed typo in INFO-field SOURCE in concatenated germline-vcf

24ad87a

asp8200 added 6 commits December 5, 2022 20:12

Temporary and fixed copy of mapped_joint_bam.csv in which sample-id and patient-id correspond to id in bam-files. See nf-core#872

04da3de

WIP: Adding test of the concatenation of germline-vcfs

8257243

Trying to add new tests

498db83

Trying to get new test running

27826a8

Avoiding publishing files from GERMLINE_VCFS_CONCAT

bd8f2be

Skip CI-test concatenate_vcfs in conda test-env

f910c82

asp8200 added 3 commits December 6, 2022 11:33

prettier

ea9d925

Adding synonym for module BCFTOOLS_CONCAT in order to disable publishing from that module

812f6d0

Updating changelog

c733593

maxulysse reviewed Dec 6, 2022

tests/csv/3.0/mapped_joint_bam.fixed.csv

asp8200 added 4 commits December 6, 2022 19:44

Moving config from modules.config to post_variant_calling.config

439246d

fixing comment

4d15e40

Remove code to pass back tbi-files to sarek.nf

07fb548

Comments added

b32b4cf

asp8200 marked this pull request as ready for review December 6, 2022 20:37

asp8200 changed the title ~~DRAFT: Concatenating vcfs~~ Concatenating germline vcfs Dec 6, 2022

asp8200 requested review from amasplund, FriederikeHanssen and maxulysse December 6, 2022 20:38

maxulysse approved these changes Dec 7, 2022

maxulysse merged commit 6da7069 into nf-core:dev Dec 7, 2022

asp8200 mentioned this pull request Dec 7, 2022

Combine all VCFs across different variants caller, such as mutect2, strelka2 #738

Closed
