refactor: balsamic containers (#921)
* update align_qc base image

* update align_qc tool versions

* add tabix version

* remove csvkit from align_qc

* remove csvkit from bioinfo_tool env

* update align_qc container tool versions in readthedocs

* add samtools versions to tests

* update changelog

* update base image in coverage_qc container

* update tool versions in coverage_qc container

* update tool versions in bioinfo softwares docs

* update changelog

* update base image in container varcall_cnvkit

* update cnvkit version

* update purecn version and lock bcftools and tabix versions

* update docs and changelog

* update base image in varcall_py36 container

* update tools in varcall_py36

* update samtools version in docs

* update changelog

* update base image of annotate container

* update ensembl vep in annotate container

* update readthedocs for vep version

* update changelog

* fix typo in varcall_py27
ashwini06 authored and rannick committed May 23, 2022
1 parent 51be7a5 commit 523026d
Showing 18 changed files with 58 additions and 68 deletions.
1 change: 0 additions & 1 deletion BALSAMIC/config/balsamic_env.yaml
@@ -6,7 +6,6 @@ align_qc:
- picard
- multiqc
- fastp
- csvkit
annotate:
- ensembl-vep
- vcfanno
1 change: 0 additions & 1 deletion BALSAMIC/constants/common.py
@@ -63,7 +63,6 @@
"picard": "align_qc",
"multiqc": "align_qc",
"fastp": "align_qc",
"csvkit": "align_qc",
"ensembl-vep": "annotate",
"genmod": "annotate",
"vcfanno": "annotate",
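The mapping edited in this hunk pairs each tool with the container that ships it, which is why dropping csvkit touches both this file and balsamic_env.yaml. A minimal sketch of how the per-tool view relates to the per-container view, assuming a plain dict; the names `TOOL_TO_CONTAINER` and `group_by_container` are hypothetical, and only the entries visible in the hunk are reproduced:

```python
from collections import defaultdict

# Excerpt of the tool-to-container mapping as it reads after this commit
# (the csvkit entry is gone). The variable name is illustrative only.
TOOL_TO_CONTAINER = {
    "picard": "align_qc",
    "multiqc": "align_qc",
    "fastp": "align_qc",
    "ensembl-vep": "annotate",
    "genmod": "annotate",
    "vcfanno": "annotate",
}


def group_by_container(mapping):
    """Invert a tool -> container mapping into container -> [tools]."""
    grouped = defaultdict(list)
    for tool, container in mapping.items():
        grouped[container].append(tool)
    return dict(grouped)


containers = group_by_container(TOOL_TO_CONTAINER)
```

Grouping this way makes it easy to check that a tool removed from one file is not still listed in the other.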
4 changes: 2 additions & 2 deletions BALSAMIC/containers/align_qc/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3:4.9.2-alpine
FROM continuumio/miniconda3:4.10.3-alpine

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda3:4.10.3-alpine"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
15 changes: 6 additions & 9 deletions BALSAMIC/containers/align_qc/align_qc.yaml
@@ -3,14 +3,11 @@ channels:

dependencies:
- bioconda::bedtools=2.30.0
- bioconda::bwa=0.7.15
- bioconda::bwa=0.7.17
- bioconda::fastqc=0.11.9
- bioconda::samtools=1.12
- bioconda::samtools=1.15.1
- bioconda::tabix=0.2.6
- bioconda::picard=2.25.0
- bioconda::multiqc=1.11
- bioconda::fastp=0.20.1
- conda-forge::csvkit=1.0.4
- conda-forge::libiconv
- conda-forge::fontconfig
- conda-forge::r-base=4.1.1
- bioconda::picard=2.27.1
- bioconda::multiqc=1.12
- bioconda::fastp=0.23.2
- conda-forge::r-base=4.1.3
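Each dependency line above follows the `channel::name=version` shape. The sketch below shows one way such lines could be split into their parts, for example to compare the pins against the versions quoted in the docs. It is not BALSAMIC's own parser; `CondaSpec` and `parse_spec` are hypothetical names, and the regular expression only covers the forms that appear in this commit:

```python
import re
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    channel: Optional[str]
    name: str
    operator: Optional[str]
    version: Optional[str]


# Optional "channel::" prefix, package name, then an optional constraint.
# A trailing "=build" string (as in "vardict=2019.06.04=pl526_0") is
# accepted but not kept.
_SPEC_RE = re.compile(
    r"(?:(?P<channel>[^:]+)::)?"
    r"(?P<name>[A-Za-z0-9_.\-]+)"
    r"(?:(?P<operator>>=|<=|==|=|>|<)(?P<version>[^=]+)(?:=(?P<build>.+))?)?"
)


def parse_spec(spec: str) -> CondaSpec:
    """Split one dependency string into channel, name, operator, version."""
    match = _SPEC_RE.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised dependency spec: {spec!r}")
    return CondaSpec(
        match["channel"], match["name"], match["operator"], match["version"]
    )


# A few of the align_qc pins as they read after this commit.
specs = ["bioconda::bwa=0.7.17", "bioconda::samtools=1.15.1", "bioconda::fastp=0.23.2"]
pins = {parsed.name: parsed.version for parsed in map(parse_spec, specs)}
```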
4 changes: 2 additions & 2 deletions BALSAMIC/containers/annotate/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3:4.9.2-alpine
FROM continuumio/miniconda3:4.10.3-alpine

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda3:4.10.3-alpine"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
6 changes: 2 additions & 4 deletions BALSAMIC/containers/annotate/annotate.yaml
@@ -1,13 +1,11 @@
channels:
- anaconda
- defaults
- conda-forge

dependencies:
- anaconda::python=3.7
- bioconda::ensembl-vep=100.2
- bioconda::bcftools=1.10
- conda-forge::libopenblas=0.3.20
- bioconda::ensembl-vep=104.3
- bioconda::bcftools=1.10
- bioconda::vcfanno=0.3.3
- anaconda::gxx_linux-64=7.3.0
- anaconda::pip=20.2.4
4 changes: 2 additions & 2 deletions BALSAMIC/containers/coverage_qc/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3:4.9.2-alpine
FROM continuumio/miniconda3:4.10.3-alpine

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda3:4.10.3-alpine"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
4 changes: 2 additions & 2 deletions BALSAMIC/containers/coverage_qc/coverage_qc.yaml
@@ -3,5 +3,5 @@ channels:
- conda-forge

dependencies:
- bioconda::sambamba=0.6.6
- bioconda::mosdepth=0.2.9
- bioconda::sambamba=0.8.2
- bioconda::mosdepth=0.3.3
4 changes: 2 additions & 2 deletions BALSAMIC/containers/varcall_cnvkit/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3:4.9.2-alpine
FROM continuumio/miniconda3:4.10.3-alpine

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda3:4.10.3-alpine"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
2 changes: 1 addition & 1 deletion BALSAMIC/containers/varcall_cnvkit/varcall_cnvkit.sh
@@ -1,2 +1,2 @@
conda env update -n base --file ${1}.yaml --prune
pip install --no-cache-dir cnvkit==0.9.4 biopython==1.76
pip install --no-cache-dir cnvkit==0.9.9 biopython==1.79
6 changes: 3 additions & 3 deletions BALSAMIC/containers/varcall_cnvkit/varcall_cnvkit.yaml
@@ -12,6 +12,6 @@ dependencies:
- bioconda::bioconductor-genomicranges=1.46.0
- bioconda::bioconductor-dnacopy=1.68.0
- bioconda::bioconductor-variantannotation=1.40.0
- bioconda::bioconductor-purecn=2.0.1
- bioconda::bcftools>=1.13
- bioconda::tabix>=0.2.6
- bioconda::bioconductor-purecn=2.0.2
- bioconda::bcftools=1.13
- bioconda::tabix=0.2.6
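This hunk swaps the open-ended `>=` constraints on bcftools and tabix for `=` pins, so a rebuild of the container no longer picks up whatever newer release the channel offers. A small check along these lines could flag loose constraints before an image is built. This is a sketch, not part of the commit; `unpinned` is a hypothetical helper, and it treats any spec without a single `=` constraint as loose:

```python
def unpinned(specs):
    """Return the specs that are not pinned with a plain '=' constraint."""
    loose = []
    for spec in specs:
        package = spec.split("::", 1)[-1]  # drop the optional channel prefix
        has_range = ">" in package or "<" in package
        if has_range or "=" not in package:
            loose.append(spec)
    return loose


before = [
    "bioconda::bioconductor-purecn=2.0.1",
    "bioconda::bcftools>=1.13",
    "bioconda::tabix>=0.2.6",
]
after = [
    "bioconda::bioconductor-purecn=2.0.2",
    "bioconda::bcftools=1.13",
    "bioconda::tabix=0.2.6",
]

loose_before = unpinned(before)
loose_after = unpinned(after)
```

With the lists above, `loose_before` holds the two `>=` entries and `loose_after` is empty.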
2 changes: 1 addition & 1 deletion BALSAMIC/containers/varcall_py27/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda:4.7.12

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda:4.7.12"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
4 changes: 2 additions & 2 deletions BALSAMIC/containers/varcall_py36/Dockerfile
@@ -1,6 +1,6 @@
FROM continuumio/miniconda3:4.9.2-alpine
FROM continuumio/miniconda3:4.10.3-alpine

LABEL base_image="continuumio/miniconda3:4.9.2-alpine"
LABEL base.image="continuumio/miniconda3:4.10.3-alpine"
LABEL about.home="https://github.com/Clinical-Genomics/BALSAMIC"
LABEL about.documentation="https://balsamic.readthedocs.io/"
LABEL about.license="MIT License (MIT)"
6 changes: 3 additions & 3 deletions BALSAMIC/containers/varcall_py36/varcall_py36.yaml
@@ -7,8 +7,8 @@ dependencies:
- bioconda::tabix=0.2.6
- bioconda::samtools=1.11
- bioconda::gatk=3.8
- bioconda::vardict=2019.06.04=pl526_0
- bioconda::vardict-java=1.7
- bioconda::vardict=2019.06.04
- bioconda::vardict-java=1.8.3
- bioconda::svdb=2.6.0
- conda-forge::libiconv
- conda-forge::r-base=3.6.3
- conda-forge::r-base=4.1.1
3 changes: 3 additions & 0 deletions CHANGELOG.rst
@@ -33,6 +33,9 @@ Changed:
* Upgrade black to 22.3.0
* For UMI workflow, post filter `gnomad_pop_freq` value is changed from `0.005` to `0.02` #919
* updated delly to 0.9.1 #920
* container base_image (align_qc, annotate, coverage_qc, varcall_cnvkit, varcall_py36) to 4.10.3-alpine #921
* update container (align_qc, annotate, coverage_qc, varcall_cnvkit,varcall_py36) bioinfo tool versions #921
* update tool versions (align_qc, annotate, coverage_qc, varcall_cnvkit) in methods and softwares docs #921

Fixed:
^^^^^^
30 changes: 15 additions & 15 deletions docs/balsamic_methods.rst
@@ -7,39 +7,39 @@ Target Genome Analysis

BALSAMIC :superscript:`1` (**version** = 8.2.10) was used to analyze the data from raw FASTQ files.
We first quality controlled FASTQ files using FastQC v0.11.9 :superscript:`2`.
Adapter sequences and low-quality bases were trimmed using fastp v0.20.1 :superscript:`3`.
Trimmed reads were mapped to the reference genome hg19 using BWA MEM v0.7.15 :superscript:`4`.
The resulted SAM files were converted to BAM files and sorted using samtools v1.12 :superscript:`5`.
Duplicated reads were marked using Picard tools MarkDuplicate v2.25.0 :superscript:`6`
Adapter sequences and low-quality bases were trimmed using fastp v0.23.2 :superscript:`3`.
Trimmed reads were mapped to the reference genome hg19 using BWA MEM v0.7.17 :superscript:`4`.
The resulted SAM files were converted to BAM files and sorted using samtools v1.15.1 :superscript:`5`.
Duplicated reads were marked using Picard tools MarkDuplicate v2.27.1 :superscript:`6`
and promptly quality controlled using CollectHsMetrics, CollectInsertSizeMetrics and CollectAlignmentSummaryMetrics functionalities.
Results of the quality controlled steps were summarized by MultiQC v1.11 :superscript:`7`.
Results of the quality controlled steps were summarized by MultiQC v1.12 :superscript:`7`.
Small somatic mutations (SNVs and INDELs) were called for each sample using VarDict v2019.06.04 :superscript:`8`.
Apart from the Vardict filters to report the variants, the called-variants were also further second filtered using the criteria
(*MQ >= 40, DP >= 100, VD >= 5, Minimum AF >= 0.007, Maximum AF < 1, GNOMADAF_popmax <= 0.005*).
Only those variants that fulfilled the filtering criteria and scored as `PASS` in the VCF file were reported.
Structural variants were called using Manta v1.6.0 :superscript:`9` and Delly v0.9.1 :superscript:`10`.
Copy number aberrations were called using CNVkit v0.9.4 :superscript:`11`.
Copy number aberrations were called using CNVkit v0.9.9 :superscript:`11`.
The variant calls from CNVkit, Manta and Delly were merged using SVDB v2.6.0 :superscript:`12`.
All variants were annotated using Ensembl VEP v100.2 :superscript:`13`. We used vcfanno v0.3.3 :superscript:`14`
All variants were annotated using Ensembl VEP v104.3 :superscript:`13`. We used vcfanno v0.3.3 :superscript:`14`
to annotate somatic variants for their population allele frequency from gnomAD v2.1.1 :superscript:`18`.

Whole Genome Analysis
~~~~~~~~~~~~~~~~~~~~~
BALSAMIC :superscript:`1` (**version** = 8.2.10) was used to analyze the data from raw FASTQ files.
We first quality controlled FASTQ files using FastQC v0.11.9 :superscript:`2`.
Adapter sequences and low-quality bases were trimmed using fastp v0.20.1 :superscript:`3`.
Adapter sequences and low-quality bases were trimmed using fastp v0.23.2 :superscript:`3`.
Trimmed reads were mapped to the reference genome hg19 using sentieon-tools :superscript:`15`.
The resulted SAM files were converted to BAM files and sorted using samtools v1.12 :superscript:`5`.
Duplicated reads were marked using Picard tools MarkDuplicate v2.25.0 :superscript:`6`
The resulted SAM files were converted to BAM files and sorted using samtools v1.15.1 :superscript:`5`.
Duplicated reads were marked using Picard tools MarkDuplicate v2.27.1 :superscript:`6`
and promptly quality controlled using CollectMultipleMetrics and CollectWgsMetrics functionalities.
Results of the quality controlled steps were summarized by MultiQC v1.11 :superscript:`7`.
Results of the quality controlled steps were summarized by MultiQC v1.12 :superscript:`7`.
Small somatic mutations (SNVs and INDELs) were called for each sample using Sentieon TNscope and TNhaplotyper :superscript:`16`.
The called-variants were also further second filtered using the criteria (DP(tumor,normal) >= 10; AD(tumor) >= 3; AF(tumor) >= 0.05, Maximum AF(tumor < 1; GNOMADAF_popmax <= 0.001; normalized base quality scores >= 20, read_counts of alt,ref alle > 0).
The filtered variants from TNscope and TNhaplotyper were merged using bcftools isec functionality to reduce the number of variants for tumor-only samples.
Structural variants were called using Manta v1.6.0 :superscript:`9` and Delly v0.9.1 :superscript:`10`.
Copy number aberrations were called using ascatNgs v4.5.0 :superscript:`17` for tumor-normal samples.
The structural variant calls from Manta, Delly and ascatNgs were merged using SVDB v2.6.0 :superscript:`12`
All variants were finally annotated using Ensembl VEP v100.2 :superscript:`13`. We used vcfanno v0.3.3 :superscript:`14`
All variants were finally annotated using Ensembl VEP v104.3 :superscript:`13`. We used vcfanno v0.3.3 :superscript:`14`
to annotate somatic variants for their population allele frequency from gnomAD v2.1.1 :superscript:`18`.

=============================
@@ -48,16 +48,16 @@ UMI Data Analysis

BALSAMIC :superscript:`1` (**version** = 8.2.10) was used to analyze the data from raw FASTQ files.
We first quality controlled FASTQ files using FastQC v0.11.9 :superscript:`2`.
Adapter sequences and low-quality bases were trimmed using fastp v0.20.1 :superscript:`3`.
Adapter sequences and low-quality bases were trimmed using fastp v0.23.2 :superscript:`3`.
UMI tag extraction and consensus generation were performed using Sentieon tools v202010.02 :superscript:`15`.
The alignment of UMI extracted and consensus called reads to the human reference genome (hg19) was done by bwa-mem and
samtools using Sentieon utils. Consensus reads were filtered based on the number of minimum reads supporting each UMI tag group.
We applied a criteria filter of minimum reads `3,1,1`. It means that at least three UMI tag groups should be ideally considered from both DNA strands,
where a minimum of at least one UMI tag group should exist in each single-stranded consensus read.
The filtered consensus reads were quality controlled using Picard CollectHsMetrics v2.25.0 :superscript:`5`. Results of the quality controlled steps were summarized by MultiQC v1.11 :superscript:`6`.
The filtered consensus reads were quality controlled using Picard CollectHsMetrics v2.27.1 :superscript:`5`. Results of the quality controlled steps were summarized by MultiQC v1.12 :superscript:`6`.
For each sample, somatic mutations were called using Sentieon TNscope :superscript:`16`, with non-default parameters for passing the final list of variants
(--min_tumor_allele_frac 0.0005, --filter_t_alt_frac 0.0005, --min_init_tumor_lod 0.5, min_tumor_lod 4, --max_error_per_read 5 --pcr_indel_model NONE, GNOMADAF_popmax <= 0.001).
All variants were finally annotated using Ensembl VEP v100.2 :superscript:`7`. We used vcfanno v0.3.3 :superscript:`8` to annotate somatic variants for their population allele frequency from gnomAD v2.1.1 :superscript:`18`.
All variants were finally annotated using Ensembl VEP v104.3 :superscript:`7`. We used vcfanno v0.3.3 :superscript:`8` to annotate somatic variants for their population allele frequency from gnomAD v2.1.1 :superscript:`18`.
For exact parameters used for each software, please refer to https://github.com/Clinical-Genomics/BALSAMIC.
We used three commercially available products from SeraCare [Material numbers: 0710-067110 :superscript:`19`, 0710-067211 :superscript:`20`, 0710-067312 :superscript:`21`] for validating the efficiency of the UMI workflow in identifying 14 mutation sites at known allelic frequencies.
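The second-pass filter quoted in the Target Genome Analysis text of this file (MQ >= 40, DP >= 100, VD >= 5, 0.007 <= AF < 1, GNOMADAF_popmax <= 0.005) can be read as a simple predicate. The sketch below is an illustration of those thresholds only and not the pipeline's actual filtering code; the function name and argument names are hypothetical, and treating a variant with no gnomAD annotation as passing is an assumption:

```python
from typing import Optional


def passes_tga_filter(
    mq: float, dp: int, vd: int, af: float, gnomad_popmax: Optional[float]
) -> bool:
    """Apply the thresholds quoted in the Target Genome Analysis text."""
    if gnomad_popmax is None:
        # Assumption: a variant absent from gnomAD counts as rare.
        gnomad_popmax = 0.0
    return (
        mq >= 40
        and dp >= 100
        and vd >= 5
        and 0.007 <= af < 1
        and gnomad_popmax <= 0.005
    )


kept = passes_tga_filter(mq=60, dp=250, vd=12, af=0.03, gnomad_popmax=0.0001)
dropped = passes_tga_filter(mq=60, dp=250, vd=4, af=0.03, gnomad_popmax=None)
```

Here `kept` is true, while `dropped` is false because only four reads support the variant.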

28 changes: 11 additions & 17 deletions docs/bioinfo_softwares.rst
@@ -16,7 +16,7 @@ bcftools
~~~~~~~~
:Source code: `GitHub` `<https://github.com/samtools/bcftools>`_
:Article: `Bioinformatics` `<https://pubmed.ncbi.nlm.nih.gov/21903627/>`_
:Version: `>1.9`
:Version: `>=1.10`

bedtools
~~~~~~~~
@@ -28,19 +28,13 @@ bwa
~~~
:Source code: `GitHub` `<https://github.com/lh3/bwa>`_
:Article: `Bioinformatics` `<https://arxiv.org/abs/1303.3997>`_
:Version: `0.7.15`
:Version: `0.7.17`

cnvkit
~~~~~~
:Source code: `GitHub` `<https://github.com/etal/cnvkit>`_
:Article: `PLOS Computational Biology` `<http://dx.doi.org/10.1371/journal.pcbi.1004873>`_
:Version: `0.9.4`

csvkit
~~~~~~
:Source code: `GitHub` `<https://github.com/wireservice/csvkit>`_
:Article: `-`
:Version: `1.0.4`
:Version: `0.9.9`

delly
~~~~~~~
@@ -52,13 +46,13 @@ ensembl-vep
~~~~~~~~~~~
:Source code: `GitHub` `<https://github.com/Ensembl/ensembl-vep>`_
:Article: `Genome Biology` `<https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-0974-4>`_
:Version: `100.2`
:Version: `104.3`

fastp
~~~~~
:Source code: `GitHub` `<https://github.com/OpenGene/fastp>`_
:Article: `Bioinformatics` `<https://doi.org/10.1093/bioinformatics/bty560>`_
:Version: `0.20.1`
:Version: `0.23.2`

fastqc
~~~~~~
@@ -82,31 +76,31 @@ multiqc
~~~~~~~
:Source code: `GitHub` `<https://github.com/ewels/MultiQC>`_
:Article: `Bioinformatics` `<http://dx.doi.org/10.1093/bioinformatics/btw354>`_
:Version: `1.11`
:Version: `1.12`

mosdepth
~~~~~~~~
:Source code: `GitHub` `<https://github.com/brentp/mosdepth>`_
:Article: `Bioinformatics` `<https://academic.oup.com/bioinformatics/article/34/5/867/4583630?login=true>`_
:Version: `0.2.9`
:Version: `0.3.3`

picard
~~~~~~
:Source code: `GitHub` `<https://github.com/broadinstitute/picard>`_
:Article: `-`
:Version: `2.25.0`
:Version: `2.27.1`

sambamba
~~~~~~~~
:Source code: `GitHub` `<https://github.com/biod/sambamba>`_
:Article: `Bioinformatics` `<https://pubmed.ncbi.nlm.nih.gov/25697820/>`_
:Version: `0.6.6`
:Version: `0.8.2`

samtools
~~~~~~~~
:Source code: `GitHub` `<https://github.com/samtools/samtools>`_
:Article: `Bioinformatics` `<https://pubmed.ncbi.nlm.nih.gov/19505943/>`_
:Version: `1.12`
:Version: `>1.11`

sentieon-tools
~~~~~~~~~~~~~~
@@ -124,7 +118,7 @@ tabix
~~~~~
:Source code: `GitHub` `<https://github.com/samtools/tabix>`_
:Article: `Bioinformatics` `<https://academic.oup.com/bioinformatics/article/27/5/718/262743>`_
:Version: `0.2.6`
:Version: `1.11`

vardict
~~~~~~~
2 changes: 1 addition & 1 deletion tests/utils/test_utils.py
@@ -187,7 +187,7 @@ def test_get_bioinfo_tools_version():

# THEN assert it is a dictionary and versions are correct
assert isinstance(bioinfo_tools_dict, dict)
assert set(observed_versions).issubset(set(["1.12", "1.11", "1.9"]))
assert set(observed_versions).issubset(set(["1.15.1", "1.12", "1.11", "1.9"]))


def test_get_delivery_id():
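The changed assertion is a subset check: every version the helper reports must be one of the allowed values, and `1.15.1` joins the set because align_qc now ships samtools 1.15.1. A minimal standalone illustration follows; the observed list is made up for the example (1.15.1 from align_qc, 1.11 from varcall_py36) and `versions_are_allowed` is a hypothetical helper, not a function from the test module:

```python
ALLOWED_SAMTOOLS_VERSIONS = {"1.15.1", "1.12", "1.11", "1.9"}


def versions_are_allowed(observed, allowed=ALLOWED_SAMTOOLS_VERSIONS):
    """True when every observed version is in the allowed set."""
    return set(observed).issubset(allowed)


observed_after_commit = ["1.15.1", "1.11"]  # illustrative values only

ok_with_new_set = versions_are_allowed(observed_after_commit)
ok_with_old_set = versions_are_allowed(observed_after_commit, {"1.12", "1.11", "1.9"})
```

Against the old set the check fails, which is why the test had to change together with the container.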