fix: rename samplename in final vcf header (#1310)
This PR renames the sample columns in the final VCFs to match the sample-id change in scout_load.yaml, which will use the LIMS ID instead of TUMOR/NORMAL (Clinical-Genomics/cg#2650).

Added:
- New rule that creates a namemap for translating the sample type names in the VCF header to sample IDs

Changed:
- Renamed the sample columns in the final VCF (clinical.pass.vcf.gz) to the sample ID

Removed:
- The sed command in the CNVpytor rule, introduced in "fix: cnvpytor header float" (#1182). The underlying issue ("[ Failed Analysis ] CNVpytor float values not accepted by bcftools", #1152) has been fixed by updating CNVpytor, so the command is no longer necessary.
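
To illustrate what the renaming amounts to, here is a minimal Python sketch (not part of this commit). It assumes the namemap holds whitespace-separated `old_name new_name` pairs, one per line, which is one of the forms `bcftools reheader -s` accepts. The sample IDs and the helpers `parse_namemap` and `rename_samples` are hypothetical and only mimic the effect on the `#CHROM` header line.

```python
def parse_namemap(text: str) -> dict:
    """Parse 'old_name new_name' pairs, one pair per line."""
    mapping = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        old, new = line.split()
        mapping[old] = new
    return mapping


def rename_samples(chrom_line: str, mapping: dict) -> str:
    """Rename the sample columns (column 10 onward) of a VCF #CHROM line."""
    fields = chrom_line.rstrip("\n").split("\t")
    fixed, samples = fields[:9], fields[9:]
    # Samples missing from the mapping keep their original name.
    return "\t".join(fixed + [mapping.get(name, name) for name in samples])


# Hypothetical LIMS IDs, for illustration only.
namemap = "TUMOR\tACC0001A1\nNORMAL\tACC0001A2\n"
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tTUMOR\tNORMAL"

renamed = rename_samples(header, parse_namemap(namemap))
print(renamed.split("\t")[9:])  # ['ACC0001A1', 'ACC0001A2']
```

Only the sample column names change; the nine fixed VCF columns and the variant records are left untouched.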
mathiasbio authored Nov 4, 2023
1 parent b084920 commit 66b049e
Showing 12 changed files with 64 additions and 21 deletions.
4 changes: 4 additions & 0 deletions BALSAMIC/constants/cluster_analysis.json
@@ -284,6 +284,10 @@
"time": "00:15:00",
"n": 1
},
"create_final_vcf_namemap": {
"time": "00:15:00",
"n": 1
},
"svdb_merge_tumor_normal": {
"time": "01:00:00",
"n": 8
1 change: 1 addition & 0 deletions BALSAMIC/constants/rules.py
@@ -52,6 +52,7 @@
"snakemake_rules/annotation/germline_annotation.rule",
"snakemake_rules/annotation/varcaller_sv_filter.rule",
"snakemake_rules/annotation/vcf2cytosure_convert.rule",
"snakemake_rules/annotation/final_vcf_reheader.rule",
],
},
"single_targeted": {
18 changes: 18 additions & 0 deletions BALSAMIC/snakemake_rules/annotation/final_vcf_reheader.rule
@@ -0,0 +1,18 @@
# vim: syntax=python tabstop=4 expandtab
# coding: utf-8

rule create_final_vcf_namemap:
    input:
        multiqc_json = qc_dir + "multiqc_data/multiqc_data.json",
    output:
        namemap = vep_dir + "status_to_sample_id_namemap"
    params:
        status_to_sample_id = status_to_sample_id
    message:
        "Creating final vcf namemap."
    threads:
        get_threads(cluster_config, "create_final_vcf_namemap")
    shell:
        """
echo -e {params.status_to_sample_id} > {output.namemap};
        """
@@ -130,6 +130,7 @@ rm {output.vcf_pass_tnscope_umi}.temp2;
rule bcftools_filter_vardict_clinical_tumor_normal:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_vardict = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.filtered.pass.stats"
@@ -152,11 +153,11 @@ rule bcftools_filter_vardict_clinical_tumor_normal:
"adding FOUND_IN tags to the output VCF for {params.case_name} "
shell:
"""
bcftools view {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_snv_clinical} |\
bcftools filter --threads {threads} --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -f PASS -o {output.vcf_pass_vardict}.temp1 -O z;
bcftools view --threads {threads} -f PASS -O z -o {output.vcf_pass_vardict}.temp1;
python {params.edit_vcf_script} \
--input_vcf {output.vcf_pass_vardict}.temp1 \
@@ -178,6 +179,7 @@ rm {output.vcf_pass_vardict}.temp2;
rule bcftools_filter_TNscope_umi_clinical_tumor_normal:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_tnscope_umi = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.filtered.pass.stats"
@@ -200,11 +202,12 @@ rule bcftools_filter_TNscope_umi_clinical_tumor_normal:
"adding FOUND_IN tags to the output VCF file for {params.case_name} "
shell:
"""
bcftools view --threads {threads} -f PASS,triallelic_site {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_snv_clinical} |\
bcftools view --threads {threads} -f PASS,triallelic_site | \
bcftools filter --threads {threads} --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -o {output.vcf_pass_tnscope_umi}.temp1 -O z;
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -O z -o {output.vcf_pass_tnscope_umi}.temp1;
python {params.edit_vcf_script} \
--input_vcf {output.vcf_pass_tnscope_umi}.temp1 \
@@ -130,6 +130,7 @@ rm {output.vcf_pass_tnscope_umi}.temp2;
rule bcftools_filter_vardict_clinical_tumor_only:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_vardict = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.vardict.clinical.filtered.pass.stats"
@@ -152,12 +153,13 @@ rule bcftools_filter_vardict_clinical_tumor_only:
"adding FOUND_IN tags to the output VCF for {params.case_name}"
shell:
"""
bcftools view {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_snv_clinical} |\
bcftools filter --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -f PASS -o {output.vcf_pass_vardict}.temp1 -O z;
python {params.edit_vcf_script} \
--input_vcf {output.vcf_pass_vardict}.temp1 \
--output_vcf {output.vcf_pass_vardict}.temp2 \
@@ -178,6 +180,7 @@ rm {output.vcf_pass_vardict}.temp2;
rule bcftools_filter_TNscope_umi_clinical_tumor_only:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_tnscope_umi = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope_umi.clinical.filtered.pass.stats"
@@ -200,11 +203,12 @@ rule bcftools_filter_TNscope_umi_clinical_tumor_only:
"adding FOUND_IN tags to the output VCF for {params.case_name}"
shell:
"""
bcftools view --threads {threads} -f PASS,triallelic_site {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_snv_clinical} |\
bcftools view --threads {threads} -f PASS,triallelic_site | \
bcftools filter --threads {threads} --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -o {output.vcf_pass_tnscope_umi}.temp1 -O z;
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -O z -o {output.vcf_pass_tnscope_umi}.temp1;
python {params.edit_vcf_script} \
--input_vcf {output.vcf_pass_tnscope_umi}.temp1 \
8 changes: 5 additions & 3 deletions BALSAMIC/snakemake_rules/annotation/varcaller_sv_filter.rule
@@ -25,7 +25,7 @@ rule bcftools_filter_sv_research:
"""
bcftools view --threads {threads} -f .,PASS,MaxDepth {input.vcf_sv_research} |\
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -f .,PASS,MaxDepth -O z -o {output.vcf_pass_svdb};
bcftools view --threads {threads} -f .,PASS,MaxDepth -O z -o {output.vcf_pass_svdb};
tabix -p vcf -f {output.vcf_pass_svdb};
@@ -36,6 +36,7 @@ bcftools +counts {output.vcf_pass_svdb} > {output.bcftools_counts};
rule bcftools_filter_sv_clinical:
input:
vcf_sv_clinical = vep_dir + "SV.somatic.{case_name}.svdb.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_svdb = vep_dir + "SV.somatic.{case_name}.svdb.clinical.filtered.pass.vcf.gz",
bcftools_counts = vep_dir + "SV.somatic.{case_name}.svdb.clinical.filtered.pass.stats"
@@ -54,10 +55,11 @@ rule bcftools_filter_sv_clinical:
"Filtering merged clinical structural and copy number variants using bcftools for {params.case_name}"
shell:
"""
bcftools view --threads {threads} -f .,PASS,MaxDepth {input.vcf_sv_clinical} |\
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_sv_clinical} |\
bcftools view --threads {threads} -f .,PASS,MaxDepth |\
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -f .,PASS,MaxDepth -O z -o {output.vcf_pass_svdb};
bcftools view --threads {threads} -f .,PASS,MaxDepth -O z -o {output.vcf_pass_svdb};
tabix -p vcf -f {output.vcf_pass_svdb};
@@ -38,6 +38,7 @@ bcftools +counts {output.vcf_pass_tnscope} > {output.bcftools_counts_research};
rule bcftools_filter_tnscope_clinical_tumor_normal:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.vcf.gz",
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_tnscope = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.filtered.pass.stats"
@@ -50,18 +51,19 @@ rule bcftools_filter_tnscope_clinical_tumor_normal:
swegen_freq = [SENTIEON_CALLER.swegen_snv_freq.tag_value, SENTIEON_CALLER.swegen_snv_freq.filter_name],
loqusdb_clinical_freq = [SENTIEON_CALLER.loqusdb_clinical_snv_freq.tag_value, SENTIEON_CALLER.loqusdb_clinical_snv_freq.filter_name],
housekeeper_id = {"id": config["analysis"]["case_id"], "tags": "clinical"},
case_name = '{case_name}'
case_name = '{case_name}',
threads:
get_threads(cluster_config, 'bcftools_filter_tnscope_clinical_tumor_normal')
message:
"Filtering WGS tumor-normal tnscope annotated clinical variants using bcftools for {params.case_name}"
shell:
"""
bcftools view -f PASS,triallelic_site --threads {threads} {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} {input.vcf_snv_clinical} | \
bcftools view -f PASS,triallelic_site --threads {threads} | \
bcftools filter --threads {threads} --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -o {output.vcf_pass_tnscope} -O z;
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -O z -o {output.vcf_pass_tnscope};
tabix -p vcf -f {output.vcf_pass_tnscope};
@@ -40,7 +40,8 @@ bcftools +counts {output.vcf_pass_tnscope} > {output.bcftools_counts_research};
rule bcftools_filter_tnscope_clinical_tumor_only:
input:
vcf_snv_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.vcf.gz",
wgs_calling_file = config["reference"]["wgs_calling_regions"]
wgs_calling_file = config["reference"]["wgs_calling_regions"],
namemap = vep_dir + "status_to_sample_id_namemap"
output:
vcf_pass_tnscope = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.filtered.pass.vcf.gz",
bcftools_counts_clinical = vep_dir + "{var_type}.somatic.{case_name}.tnscope.clinical.filtered.pass.stats"
@@ -60,16 +61,19 @@ rule bcftools_filter_tnscope_clinical_tumor_only:
"Filtering WGS tumor-only tnscope annotated clinical variants using bcftools for {params.case_name}"
shell:
"""
grep -v '^@' {input.wgs_calling_file} > {input.wgs_calling_file}.bed
grep -v '^@' {input.wgs_calling_file} > {input.wgs_calling_file}.bed;
bcftools view -f PASS,triallelic_site --threads {threads} --regions-file {input.wgs_calling_file}.bed {input.vcf_snv_clinical} | \
bcftools view --regions-file {input.wgs_calling_file}.bed {input.vcf_snv_clinical} | \
bcftools reheader --threads {threads} -s {input.namemap} | \
bcftools view -f PASS,triallelic_site --threads {threads} | \
bcftools filter --threads {threads} --include 'INFO/GNOMADAF_popmax <= {params.pop_freq[0]} || INFO/GNOMADAF_popmax == \".\"' --soft-filter '{params.pop_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' |\
bcftools filter --threads {threads} --include 'INFO/SWEGENAF <= {params.swegen_freq[0]} || INFO/SWEGENAF == \".\"' --soft-filter '{params.swegen_freq[1]}' --mode '+' | \
bcftools filter --threads {threads} --include 'INFO/Frq <= {params.loqusdb_clinical_freq[0]} || INFO/Frq == \".\"' --soft-filter '{params.loqusdb_clinical_freq[1]}' --mode '+' | \
bcftools view --threads {threads} -i 'FILTER == "PASS" || FILTER == "triallelic_site"' -o {output.vcf_pass_tnscope} -O z;
tabix -p vcf -f {output.vcf_pass_tnscope};
bcftools +counts {output.vcf_pass_tnscope} > {output.bcftools_counts_clinical};
"""


@@ -243,8 +243,6 @@ cp {params.tmpdir}/{params.tumor}.global.0000.png {output.scatter_cnvpytor};
cp {params.tmpdir}/{params.tumor}.circular.0001.png {output.circular_cnvpytor};
sed 's/FORMAT=<ID=CN,Number=1,Type=Integer/FORMAT=<ID=CN,Number=1,Type=Float/g' -i {params.tmpdir}/{params.tumor}.vcf
bgzip -c -l 9 {params.tmpdir}/{params.tumor}.vcf > {output.cnv_cnvpytor};
echo -e \"{params.tumor}\\tTUMOR\" > {output.namemap};
1 change: 0 additions & 1 deletion BALSAMIC/workflows/QC.smk
@@ -57,7 +57,6 @@ vcf_dir: str = Path(result_dir, "vcf").as_posix() + "/"
qc_dir: str = Path(result_dir, "qc").as_posix() + "/"
delivery_dir: str = Path(result_dir, "delivery").as_posix() + "/"


# Run information
singularity_image: str = config_model.singularity['image']
sample_names: List[str] = config_model.get_all_sample_names()
6 changes: 6 additions & 0 deletions BALSAMIC/workflows/balsamic.smk
@@ -105,6 +105,12 @@ sequencing_type = config_model.analysis.sequencing_type
if config_model.analysis.analysis_type == "paired":
    normal_sample: str = config_model.get_sample_name_by_type(SampleType.NORMAL)

# Sample status to sampleID namemap
if config_model.analysis.analysis_type == "paired":
    status_to_sample_id = "TUMOR" + "\\\\t" + tumor_sample + "\\\\n" + "NORMAL" + "\\\\t" + normal_sample
else:
    status_to_sample_id = "TUMOR" + "\\\\t" + tumor_sample
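
The doubled backslashes in the namemap string are easier to follow with a small walk-through. The Python sketch below is an approximation rather than what the pipeline executes: it uses `shlex.split` to stand in for the shell's handling of unquoted backslashes and two `str.replace` calls to stand in for `echo -e`. The sample IDs are hypothetical.

```python
import shlex

# Hypothetical sample IDs, for illustration only.
tumor_sample, normal_sample = "ACC0001A1", "ACC0001A2"

# Same expression as in the workflow: the Python literal "\\\\t" is three
# characters long (backslash, backslash, t).
status_to_sample_id = (
    "TUMOR" + "\\\\t" + tumor_sample + "\\\\n" + "NORMAL" + "\\\\t" + normal_sample
)

# The value is substituted into the shell command text as-is.
command = "echo -e {} > namemap".format(status_to_sample_id)

# Layer 1 (approximated with shlex): the shell turns each unquoted
# backslash pair into a single backslash.
argv = shlex.split(command)
echo_argument = argv[2]

# Layer 2 (approximated with str.replace): echo -e expands the remaining
# \t and \n escapes into a real tab and a real newline.
content = echo_argument.replace("\\t", "\t").replace("\\n", "\n")

print(repr(echo_argument))  # 'TUMOR\\tACC0001A1\\nNORMAL\\tACC0001A2'
print(repr(content))        # 'TUMOR\tACC0001A1\nNORMAL\tACC0001A2'
```

With only a single backslash in the Python string, the shell would consume it before `echo -e` could see an escape sequence, which is presumably why the value is built with two.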


# vcfanno annotations
research_annotations.append( {
2 changes: 2 additions & 0 deletions CHANGELOG.rst
@@ -64,6 +64,7 @@ Changed:
* Updated snakemake version to 7.32.4 https://github.com/Clinical-Genomics/BALSAMIC/pull/1308
* Migrate analysis models to pydantic v2 https://github.com/Clinical-Genomics/BALSAMIC/pull/1306
* Split analysis model into config and params models https://github.com/Clinical-Genomics/BALSAMIC/pull/1306
* Renamed name in sample column of final clinical vcfs https://github.com/Clinical-Genomics/BALSAMIC/pull/1310

Fixed:
^^^^^^
@@ -87,6 +88,7 @@ Removed:
* Plugin CLI https://github.com/Clinical-Genomics/BALSAMIC/pull/1245
* Realignment step for TGA workflow https://github.com/Clinical-Genomics/BALSAMIC/pull/1272
* Archived/outdated workflows and scripts https://github.com/Clinical-Genomics/BALSAMIC/pull/1296
* Sed command to convert CNVpytor integer to float, deprecated by updated CNVpytor version https://github.com/Clinical-Genomics/BALSAMIC/pull/1310

[12.0.2]
--------