Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Update the list of files to be stored and delivered #915

Merged
merged 34 commits into from
May 2, 2022

Conversation

ivadym
Copy link
Contributor

@ivadym ivadym commented Apr 19, 2022

This PR:

Updates the list of files that are generated and stored by BALSAMIC.

  • Removed from .hk file:

    • fastp rule:
      • _1.fp.fastq.gz
      • _2.fp.fastq.gz
      • _fastp.json
      • _fastp.html
    • bcftools_filter_tnscope_tumor_only rule:
      • .tnscope_isec.all.filtered.vcf.gz
    • bcftools_filter_tnhaplotyper_tumor_normal and bcftools_filter_tnhaplotyper_tumor_only rules:
      • .tnhaplotyper.all.filtered.vcf.gz
      • .tnhaplotyper.all.filtered.pass.vcf.gz
  • Made temporal:

    • fastp rule:
      • _1.fp.fastq.gz
      • _2.fp.fastq.gz
    • fastp_umi rule:
      • _1.umi_optimized.fastq.gz
      • _2.umi_optimized.fastq.gz

Hermes PR: Clinical-Genomics/hermes#44

Review and tests:

  • Tests pass
  • Code review
  • New code is executed and covered by tests, and test approve

@ivadym ivadym changed the base branch from master to develop April 19, 2022 10:09
@ivadym ivadym changed the title Refactor: Update the list of files to be stored and delivered refactor: Update the list of files to be stored and delivered Apr 19, 2022
@codecov
Copy link

codecov bot commented Apr 19, 2022

Codecov Report

Merging #915 (62c590f) into develop (8142fac) will decrease coverage by 0.28%.
The diff coverage is n/a.

@@             Coverage Diff             @@
##           develop     #915      +/-   ##
===========================================
- Coverage    99.54%   99.25%   -0.29%     
===========================================
  Files           29       29              
  Lines         1749     1749              
===========================================
- Hits          1741     1736       -5     
- Misses           8       13       +5     
Flag Coverage Δ
unittests 99.25% <ø> (-0.29%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
BALSAMIC/constants/workflow_rules.py 100.00% <ø> (ø)
BALSAMIC/utils/rule.py 94.01% <0.00%> (-4.28%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8142fac...62c590f. Read the comment docs.

@ashwini06 ashwini06 mentioned this pull request Apr 21, 2022
5 tasks
@ivadym ivadym marked this pull request as ready for review April 21, 2022 11:02
Copy link
Contributor

@ashwini06 ashwini06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ivadym : Can you also please link the related Hermes PR to this PR.

BALSAMIC/constants/workflow_rules.py Outdated Show resolved Hide resolved
BALSAMIC/constants/workflow_rules.py Outdated Show resolved Hide resolved
@ivadym
Copy link
Contributor Author

ivadym commented Apr 21, 2022

Linking the Hermes PR: Clinical-Genomics/hermes#44

@ashwini06
Copy link
Contributor

@ivadym : Is it possible to fix the tests for few of these missing lines? https://app.codecov.io/gh/Clinical-Genomics/BALSAMIC/compare/915/changes

@ivadym
Copy link
Contributor Author

ivadym commented Apr 22, 2022

@ivadym : Is it possible to fix the tests for few of these missing lines? https://app.codecov.io/gh/Clinical-Genomics/BALSAMIC/compare/915/changes

I was trying to address it and I added an additional test which covers the lines which codecov was complaining about.

def test_deliver_tumor_only_panel_files(
invoke_cli,
environ,
tumor_only_config,
helpers,
sentieon_install_dir,
sentieon_license,
caplog,
):
# GIVEN a tumor-normal config file
helpers.read_config(tumor_only_config)
actual_delivery_report = Path(helpers.delivery_dir, helpers.case_id + ".hk")
# GIVEN a list of expected files to be included in the report
delivery_files = [
"sample_tumor_only.json",
"sample_tumor_only_report.html",
"sample_tumor_only_BALSAMIC_" + balsamic_version + "_graph.pdf",
"multiqc_data.json",
"sample_tumor_only_metrics_deliverables.yaml",
]
# GIVEN a compound tag that should be added to the report
compound_tags = [
["balsamic-config"],
["balsamic-report"],
["balsamic-dag"],
["multiqc-json", "json"],
["qc-metrics-yaml", "yaml"],
]
with mock.patch.dict(
environ,
{
"SENTIEON_LICENSE": sentieon_license,
"SENTIEON_INSTALL_DIR": sentieon_install_dir,
},
), caplog.at_level(logging.DEBUG):
# WHEN running analysis
result = invoke_cli(
[
"report",
"deliver",
"--sample-config",
tumor_only_config,
"--disable-variant-caller",
"cnvkit",
]
)
# THEN it should run without any errors
assert result.exit_code == 0
assert actual_delivery_report.is_file()
with open(actual_delivery_report, "r") as report:
report = json.load(report)["files"]
assert report
# THEN all the expected files and tags should be included in the report
for file in report:
assert os.path.basename(file["path"]) in delivery_files
assert file["tag"] in compound_tags or file["tag"][::-1] in compound_tags

To test it explicitly we would need to call get_rule_output and provide a snakemake.common.Rules objectas an input. Any ideas of how to create a mock Rules object?

@ivadym
Copy link
Contributor Author

ivadym commented Apr 28, 2022

Quality control results:

/qc/sample_tumor_normal_wgs_metrics_deliverables.yaml
/qc/multiqc_report.html
/qc/multiqc_data/multiqc_data.json

Analysis specific results:

  • TN-WGS
        /vep/SNV.germline.tumor.dnascope.vcf.gz
        /vep/SNV.germline.normal.dnascope.vcf.gz
        /vep/SV.germline.tumor.manta_germline.vcf.gz
        /vep/SV.germline.normal.manta_germline.vcf.gz
        /vcf/SNV.somatic.sample_tumor_normal_wgs.tnscope.vcf.gz
        /vcf/SNV.somatic.sample_tumor_normal_wgs.tnhaplotyper.vcf.gz
        /vcf/SV.somatic.sample_tumor_normal_wgs.svdb.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal_wgs.tnscope.all.filtered.pass.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal_wgs.tnhaplotyper.all.filtered.pass.vcf.gz
        /vep/SV.somatic.sample_tumor_normal_wgs.svdb.all.filtered.pass.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal_wgs.tnscope.balsamic_stat
        /vep/SNV.somatic.sample_tumor_normal_wgs.tnhaplotyper.balsamic_stat
        /vcf/CNV.somatic.sample_tumor_normal_wgs.ascat.output.pdf
        /vcf/CNV.somatic.sample_tumor_normal_wgs.ascat.copynumber.txt.gz
  • TN-TGA
        /vep/SNV.germline.tumor.haplotypecaller.vcf.gz
        /vep/SNV.germline.normal.haplotypecaller.vcf.gz
        /vep/SNV.germline.tumor.dnascope.vcf.gz
        /vep/SNV.germline.normal.dnascope.vcf.gz
        /vep/SV.germline.tumor.manta_germline.vcf.gz
        /vep/SV.germline.normal.manta_germline.vcf.gz
        /vcf/SNV.somatic.sample_tumor_normal.TNscope_umi.vcf.gz
        /vcf/SNV.somatic.sample_tumor_normal.vardict.vcf.gz
        /vcf/SNV.somatic.sample_tumor_normal.tnhaplotyper.vcf.gz
        /vcf/SV.somatic.sample_tumor_normal.svdb.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal.TNscope_umi.all.filtered.pass.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal.vardict.all.filtered.pass.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal.tnhaplotyper.all.filtered.pass.vcf.gz
        /vep/SV.somatic.sample_tumor_normal.svdb.all.filtered.pass.vcf.gz
        /vep/SNV.somatic.sample_tumor_normal.vardict.balsamic_stat
        /vep/SNV.somatic.sample_tumor_normal.tnhaplotyper.balsamic_stat
        /vep/SNV.somatic.sample_tumor_normal.TNscope_umi.balsamic_stat
        /cnv/tumor.merged.cns
        /cnv/tumor.merged-diagram.pdf
        /cnv/tumor.merged-scatter.pdf
        /cnv/sample_tumor_normal.gene_metrics
        /vcf/CNV.somatic.sample_tumor_normal.cnvkit.vcf2cytosure.cgh
        /vep/SNV.somatic.sample_tumor_normal.vardict.all.filtered.pass.ranked.vcf.gz
        /qc/umi_qc/concatenated_normal_XXXXXX_R.umi.mean_family_depth
        /qc/umi_qc/concatenated_tumor_XXXXXX_R.umi.mean_family_depth
        /qc/umi_qc/sample_tumor_normal.TNscope_umi.AFtable.txt

@ivadym
Copy link
Contributor Author

ivadym commented Apr 29, 2022

Files and tags are parsed and generated as expected:

  • WGS_TN

Screen Shot 2022-04-28 at 20 43 23

  • TGA_TN

Screen Shot 2022-04-28 at 21 17 05

Copy link
Contributor

@ashwini06 ashwini06 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good 🎇

@ashwini06
Copy link
Contributor

Files and tags are parsed and generated as expected:

  • WGS_TN
Screen Shot 2022-04-28 at 20 43 23
  • TGA_TN
Screen Shot 2022-04-28 at 21 17 05

@ivadym What does the warning message "Cannot deliver step (rule) ..." refers to?

@ivadym
Copy link
Contributor Author

ivadym commented Apr 29, 2022

@ivadym What does the warning message "Cannot deliver step (rule) ..." refers to?

It means that the rule it tries to deliver is not among the list of all the rules executed during the analysis.

If you notice we are going through all the rules that we are going to deliver. However, among all of them we can find some that are specific to the analysis type. Therefore, for TN_WGS, for example, you cannot find the rules associated with the UMIs.

for my_rule in set(rules_to_deliver):
try:
housekeeper_id = getattr(rules, my_rule).params.housekeeper_id
except (ValueError, AttributeError, RuleException, WorkflowError) as e:
LOG.warning("Cannot deliver step (rule) {}: {}".format(my_rule, e))
continue

Copy link
Collaborator

@khurrammaqbool khurrammaqbool left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 .
As we discussed, let's refactor to decouple delivery after this.

@sonarqubecloud
Copy link

sonarqubecloud bot commented May 2, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@ivadym ivadym merged commit 5120e17 into develop May 2, 2022
@ivadym ivadym deleted the update/delivered-files branch May 2, 2022 10:55
rannick added a commit that referenced this pull request Jun 7, 2022
* add option for analysis workflow to config case

* add analysis workflow to constants

* add analysis workflows to models

* add conditions to balsamic workflow

* modify multiqc condition for umi

* modify condition for umi files

* add default analysis workflow to qc_config

* modify test models

* add analysis workflow for pon to work fine

* change balsamic_umi to balsamic-umi

* black format pon config file

* update Changelog

* add balsamic-qc as analysis_workflow option, remove qc config

* add analysis workflow cli option to deliver

* formatting and adding analysis workflow as argument for get_snakefile

* adapt test configs

* adapt qc tests

* add analysis_workflow in deliver.py

* add analysis_workflow in status.py

* modify test get snakefile

* remove cli option analysis_type of deliver

* change run analysis to infer the snakefile workflow from the config file

* change the check for reference_genome as the error message was always triggered without consequences

* remove analysis-type in invoke.cli

* GENOME_VERSION needs to be set up for rules, so set to the different genome version instead of always hg19

* update changelog

* feat: add qc only workflow (#847)

* feat: add qc only workflow

* conftest qc config

* formatting

* fixing code smells and attemps to reduce duplication

* formatting

* remove qc_config container version

* formatting

* fix qc test

* fix qc test

* test fixing

* test fixing

* formatting

* draft pytest qc

* feat: add qc only workflow

* conftest qc config

* formatting

* fixing code smells and attemps to reduce duplication

* formatting

* remove qc_config container version

* formatting

* fix qc test

* fix qc test

* test fixing

* test fixing

* formatting

* draft pytest qc

* black linting + removing test_config_qc_graph_value_error

* add container version in qc

* add tests for QC graph generation and ValueError

* black

* upgrade black because of click update and remove unused config_dick in conftest

* update black in github action

* remove benchmark plot for qc

* address duplication

* changelog and conftest update

* Apply suggestions from code review

Co-authored-by: ashwini06 <ashwini06@users.noreply.github.com>

* balck upgrade in changelog

* remove variable germline_call_samples as only qc

* Update BALSAMIC/workflows/QC.smk

Remove chromlist

Co-authored-by: ashwini06 <ashwini06@users.noreply.github.com>

* Revert "changelog and conftest update"

This reverts commit fabddaf.

* remove umiworkflow and vcf from qc

* remove bedchrom from qc workflow

* update to CHANGELOG from black version

* add canfam3 to workflow qc test

* remove unused import

* remove wgs from qc workflow and modify command to qc_panel

* add qc_panel to click

* adapt conftest to qc_panel

* adapt  to qc_panel

* use qc_panel also in get_snakemake instead of only qc and models.py comments

* modidy test_utils to qc_panel

* qc metrics

* keep chr in refGene as ref fasta uses chr

* add more memory to picard for canfam3 because dogs have 38 chromosomes

* no analysis_specific_results in QC.sml

* stop removing chr with canfam3 references

Co-authored-by: ashwini06 <ashwini06@users.noreply.github.com>

* docs: add annotation resources (#916)

* add ascat to copynumber file

* update changelog

* add balsamic annotation resources to docs

* update changelog

* fix review suggestion

* modify description for CLNACC

* modify description for COSMIC_CNT

* modify description for COSMIC_CNT

* feat: update delly (#920)

* update delly to 0.9.1

* update changelog

* update changelog

* fix Dockerfile

* fix Dockerfile

* remove OMP

* add OMP

* remove OMP

* add OMP

* fix: Change of gnomad pop freq value for UMI workflow (#919)

* add new gnomad pop freq for umi workflow

* edit balsamic filters docs

* update changelog

* fix ident

* fix typo

* add pop_freq_umi to model attributes

* fix indentation

* add review suggestions

* add PR number to changelog

* feat: add Delly CNV for tumor only workflow (#923)

* update changelog

* add tumor only cnv analysis and fix messeges in rules

* fix messege text

* update BALSAMIC documentation

* feat: add delly CNV read-depth profile (#924)

* update changelog

* update changelog

* add copy-number profile to delly tumor only

* add copy-number profile to delly tumor only

* refactor: Remove gatk haplotypcaller  (#922)

* remove gatk haplotypecallers

* remove haplotypecaller and tnsnv from cluster json

* remove haplotypecaller from models

* remove haplotypecaller from workflow params

* remove haplotypcaller and tnsnv from analysis json

* remove unused callers from tests

* update changelog

* refactor: balsamic containers (#921)

* update align_qc base image

* update align_qc tool versions

* add tabix version

* remove csvkit from align_qc

* remove csvkit frm bioinfo_tool env

* update align _qc container tool versions in readthedocs

* add samtools versions to tests

* update changelog

* update base image in coverage_qc container

* update tool versions in cover_qc container

* update tool versions in bioinfo softwares docs

* update changelog

* update base image in container varcall_cnvkit

* update cnvkit version

* update purecn version and lock bcftools and tabix versions

* update docs and changelog

* update base image in varcall_py36 container

* update tools in varcall_py36

* update samtools version in docs

* update changelog

* update base image of annotate container

* update ensembl vep in annotate container

* update readthedocs for vep version

* update changelog

* fix typo in varcall_py27

* refactor: Update the list of files to be stored and delivered (#915)

* feat: bcftools counts QC validation (#925)

* Bump version: 8.2.10 → 9.0.0

* docs: balsamic fix styling (#926)

* fix balsamic docs styling

* fix identattion

* remove Admonitions

* fix typo

* fix: align qc container (#928)

* revert csvkit to align_qc container

* update changelog

* add PR no. to changelog

* Bump version: 9.0.0 → 9.0.1

* add option for analysis workflow to config case

* add analysis workflow to constants

* add analysis workflows to models

* add conditions to balsamic workflow

* modify multiqc condition for umi

* modify condition for umi files

* add default analysis workflow to qc_config

* modify test models

* add analysis workflow for pon to work fine

* docs: update balsamic method version in bumpversion (#930)

* edit version in balsamic methods

* update bumpeversion list for version changes

* change log  update

* change balsamic_umi to balsamic-umi

* black format pon config file

* Update CHANGELOG.rst (#931)

* update Changelog

* add balsamic-qc as analysis_workflow option, remove qc config

* add analysis workflow cli option to deliver

* formatting and adding analysis workflow as argument for get_snakefile

* adapt test configs

* adapt qc tests

* add analysis_workflow in deliver.py

* add analysis_workflow in status.py

* modify test get snakefile

* remove cli option analysis_type of deliver

* change run analysis to infer the snakefile workflow from the config file

* change the check for reference_genome as the error message was always triggered without consequences

* remove analysis-type in invoke.cli

* GENOME_VERSION needs to be set up for rules, so set to the different genome version instead of always hg19

* formatting

* formatting

* update CHANGELOG

* update CHANGELOG

* black formatting

* change GENOME_VERSION to config[reference][genome_version]

* add link to PR in CHANGELOG

* add quality metrics in QC workflow

* add error when canfam3 is used with BALSAMIC main workflow

* update input stats files for collect_custom_qc_metrics rule

* update changelog

Co-authored-by: ashwini06 <ashwini.jeggari@scilifelab.se>
Co-authored-by: ashwini06 <ashwini06@users.noreply.github.com>
Co-authored-by: Khurram Maqbool <khurram.maqbool@outlook.com>
Co-authored-by: ivadym <vadym.ivanchuk@scilifelab.se>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants