Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Test GA behaviour #1403

Closed
wants to merge 32 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions .github/workflows/pytest_and_coveralls.yml
Original file line number Diff line number Diff line change
Expand Up @@ -14,16 +14,16 @@ on:
jobs:
pytest_coveralls:
name: run PyTest
runs-on: ubuntu-22.04
runs-on: ubuntu-latest
steps:
# Checkout BALSAMIC
- name: Git checkout
id: git_checkout
uses: actions/checkout@v3
uses: actions/checkout@v4.1.1
# Conda env create
- name: setup conda
id: setup_conda
uses: conda-incubator/setup-miniconda@v2
uses: conda-incubator/setup-miniconda@v3.0.2
with:
activate-environment: balsamic
environment-file: BALSAMIC/conda/balsamic.yaml
Expand Down Expand Up @@ -55,7 +55,7 @@ jobs:
SENTIEON_INSTALL_DIR: dummy_install_dir
# Run Codecov
- name: Upload coverage to Codecov
uses: codecov/codecov-action@v3
uses: codecov/codecov-action@v4.0.1
with:
token: ${{ secrets.CODECOV_TOKEN }}
file: ./coverage.xml
Expand Down
90 changes: 90 additions & 0 deletions BALSAMIC/assets/scripts/igh_dux4_detection.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
#!/bin/bash

# Check if at least 3 arguments are provided
if [ "$#" -lt 3 ]; then
echo "Usage: $0 <genome_version> <output_vcf> <tumor_bam> [normal_bam]"
exit 1
fi

# Assign variables
genome_version="$1"
output_vcf="$2"
tumor_bam="$3"
normal_bam="$4"
output_vcf_tmp="$output_vcf.tmp"

# Print given arguments
echo "genome_version: $genome_version"
echo "output vcf: $output_vcf"
echo "tumor bam: $tumor_bam"
echo "normal bam: $normal_bam"

# Set chr positions depending on the genome version
if [ "$genome_version" = "hg19" ]; then
igh_chr="14"
igh_pos="106032614"
dux4_chr="4"
dux4_pos="190988100"
elif [ "$genome_version" = "hg38" ]; then
igh_chr="14"
igh_pos="105586437"
dux4_chr="4"
dux4_pos="190173000"
else
echo "Invalid genome version. Accepted values: hg19, hg38. Given: $genome_version"
exit 1
fi


# Define functions
get_supporting_reads() {
# Get number of supporting reads for IGH::DUX4 rearrangement in a given BAM file
local bam="$1"
if [ "$genome_version" = "hg19" ]; then
local supporting_reads=$(samtools view -F 1024 -c \
-e '(rnext == "4" && pnext > 190988100 && pnext < 191007000) || (rnext == "10" && pnext > 135477000 && pnext < 135500000) || (rnext == "GL000228.1" && pnext > 70000 && pnext < 115000) || ([SA] =~ "10,1354[789][0-9]{4}") || ([SA] =~ "4,19(09[8-9][0-9]|100[0-7])[0-9]{3}" || [SA] =~ "GL000228.1,([7-9][0-9]{4}|1[0-1][0-5][0-9]{3})")' \
{input.bamT} 14:106032614-107288051 )
elif [ "$genome_version" = "hg38" ]; then
local supporting_reads=$(samtools view -F 1024 -c \
-e '(rnext == "4" && pnext > 190173000 && pnext < 190176000) || ([SA] =~ "4,19017[345][0-9]{3}")' \
{input.bamT} chr14:105586437-106879844 )
fi
echo $supporting_reads
}

# Set information for tumor and normal
supporting_reads_tumor=$(get_supporting_reads $tumor_bam)
samples_header="TUMOR"
samples_field="${supporting_reads_tumor}"
if [ -n "$normal_bam" ]; then
supporting_reads_normal=$(get_supporting_reads $normal_bam)
samples_header="NORMAL\tTUMOR"
samples_field="${supporting_reads_normal}\t${supporting_reads_tumor}"
fi


# If supporting reads are found in the tumor, set filter to PASS. Otherwise add: no_supporting_reads
if [ "$supporting_reads_tumor" -gt 0 ]; then
vcf_filter="PASS"
else
vcf_filter="no_supporting_reads"
fi

echo "supporting reads tumor: $supporting_reads_tumor"
echo "supporting reads normal: $supporting_reads_normal"
echo "vcf filter: $vcf_filter"

# Write vcf entry
{
echo '##fileformat=VCFv4.2'
echo '##ALT=<ID=BND,Description="Break end">'
echo '##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">'
echo '##INFO=<ID=IMPRECISE,Number=0,Type=Flag,Description="Imprecise structural variation">'
echo '##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of paired-ends that support the event">'
echo -e "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t${samples_header}"
echo -e "${igh_chr}\t${igh_pos}\tsamtools_igh_dux4\tN\tN[${dux4_chr}:${dux4_pos}[\t.\t${vcf_filter}\tSVTYPE=BND;IMPRECISE;\tDV\t${samples_field}"
} >> $output_vcf_tmp

bgzip $output_vcf_tmp > $output_vcf
tabix -p vcf $output_vcf
rm $output_vcf_tmp
4 changes: 4 additions & 0 deletions BALSAMIC/constants/cluster_analysis.json
Original file line number Diff line number Diff line change
Expand Up @@ -411,5 +411,9 @@
"samtools_qc": {
"time": "04:00:00",
"n": 16
},
"igh_dux4_detection": {
"time": "02:00:00",
"n": 1
}
}
7 changes: 7 additions & 0 deletions BALSAMIC/constants/workflow_params.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,13 @@
"sequencing_type": ["wgs"],
"workflow_solution": ["BALSAMIC"],
},
"igh_dux4": {
"mutation": "somatic",
"mutation_type": "SV",
"analysis_type": ["single", "paired"],
"sequencing_type": ["wgs"],
"workflow_solution": ["BALSAMIC"],
},
"svdb": {
"mutation": "somatic",
"mutation_type": "SV",
Expand Down
1 change: 1 addition & 0 deletions BALSAMIC/models/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,7 @@ class VCFModel(BaseModel):
dellycnv: VarcallerAttribute
tiddit: VarcallerAttribute
cnvpytor: VarcallerAttribute
igh_dux4: VarcallerAttribute
svdb: VarcallerAttribute


Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -402,6 +402,33 @@ else:
rm {input.ascat_cnv};
"""


rule igh_dux4_detection_tumor_normal:
input:
fa = config["reference"]["reference_genome"],
bamT = config_model.get_final_bam_name(bam_dir = bam_dir, sample_name = tumor_sample),
bamN = config_model.get_final_bam_name(bam_dir = bam_dir, sample_name=normal_sample),
output:
vcf = vcf_dir + "SV.somatic." + config["analysis"]["case_id"] + ".igh_dux4.vcf.gz",
benchmark:
benchmark_dir + 'igh_dux4_detection_tumor_normal_' + config["analysis"]["case_id"] + ".tsv"
singularity:
Path(singularity_image, config["bioinfo_tools"].get("samtools") + ".sif").as_posix()
params:
genome_version = config["reference"]["genome_version"],
custom_sv_detection_script = get_script_path("igh_dux4_detection.sh"),
case_name = config["analysis"]["case_id"],
threads:
get_threads(cluster_config, "igh_dux4_detection")
message:
"Detecting IGH::DUX4 rearrangement for {params.case_name} using samtools."
shell:
"""
bash {params.custom_sv_detection_script} {params.genome_version} {output.vcf} {input.bamT} {input.bamN}
"""



rule svdb_merge_tumor_normal:
input:
vcf = expand(
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -288,6 +288,31 @@ tabix -p vcf -f {output.delly_sv};
tabix -p vcf -f {output.delly_cnv};
"""


rule igh_dux4_detection_tumor_only:
input:
fa = config["reference"]["reference_genome"],
bamT = config_model.get_final_bam_name(bam_dir = bam_dir, sample_name = tumor_sample)
output:
vcf = vcf_dir + "SV.somatic." + config["analysis"]["case_id"] + ".igh_dux4.vcf.gz",
benchmark:
benchmark_dir + 'igh_dux4_detection_tumor_only_' + config["analysis"]["case_id"] + ".tsv"
singularity:
Path(singularity_image, config["bioinfo_tools"].get("samtools") + ".sif").as_posix()
params:
genome_version = config["reference"]["genome_version"],
custom_sv_detection_script = get_script_path("igh_dux4_detection.sh"),
case_name = config["analysis"]["case_id"],
threads:
get_threads(cluster_config, "igh_dux4_detection")
message:
"Detecting IGH::DUX4 rearrangement for {params.case_name} using samtools."
shell:
"""
bash {params.custom_sv_detection_script} {params.genome_version} {output.vcf} {input.bamT}
"""


rule svdb_merge_tumor_only:
input:
vcf = expand(
Expand Down
8 changes: 8 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
@@ -1,3 +1,11 @@
[X.X.X]
-------

Added:
^^^^^^
* custom samtools script for the detection of IGH::DUX4 rearrangements https://github.com/Clinical-Genomics/BALSAMIC/pull/1397


[14.0.0]
-------

Expand Down
16 changes: 15 additions & 1 deletion docs/balsamic_sv_cnv.rst
Original file line number Diff line number Diff line change
Expand Up @@ -44,11 +44,21 @@ Depending on the sequencing type, BALSAMIC is currently running the following st
- tumor-only
- somatic
- CNV
* - igh_dux4 (see note below)
- WGS
- tumor-normal, tumor-only
- somatic
- SV

Further details about a specific caller can be found in the links for the repositories containing the documentation for SV and CNV callers along with the links for the articles are listed in `bioinfo softwares <https://balsamic.readthedocs.io/en/latest/bioinfo_softwares.html>`_.

Note that igh_dux4 is not a variant caller itself. This is a custom script that uses samtools to detect read pairs supporting IGH::DUX4 rearrangements. In short, the command identifies discordant reads mapping to the IGH region and to either DUX4 or its homologous DUX4-like regions (see references for details). The inclusion of this feature aims to alleviate the failure of callers to detect this rearrangement. It is important to note, however, that the reported breakpoints are fixed to the IGH and DUX4 coordinates and are, therefore, imprecise and uncertain. Therefore, we advise caution when interpreting this information.


It is mandatory to provide the gender of the sample from BALSAMIC version >= 10.0.0 For CNV analysis.



**Pre-merge Filtrations**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Expand Down Expand Up @@ -82,6 +92,9 @@ Manta calls are filtered using bcftools to only keep variants that have evidence
* - Manta
- low_pr_sr_count
- SUM(FORMAT/PR[0:1]+FORMAT/SR[0:1]) < 4.0
* - igh_dux4
- samtools_igh_dux4
- DV < 1


Further information regarding the TIDDIT tumor normal filtration: As translocation variants are represented by 2 BNDs in the VCF which allows for mixed assignment of soft-filters, a requirement for assigning soft-filters to translocations is that neither BND is PASS.
Expand Down Expand Up @@ -117,12 +130,13 @@ Further information regarding the TIDDIT tumor normal filtration: As translocati
| 3. ascat
| 4. dellycnv
| 5. tiddit
| 6. igh_dux4
- | 1. manta
| 2. dellysv
| 3. dellycnv
| 4. tiddit
| 5. cnvpytor

| 6. igh_dux4


The merged `SNV.somatic.<CASE_ID>.svdb.vcf.gz` file retains all the information for the variants from the caller in which the variants are identified, which are then annotated using `ensembl-vep`.
Expand Down
2 changes: 2 additions & 0 deletions docs/resources.rst
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@ Relevant publications
#. **Two case studies and a pipeline (unpublished)**: Noll, A. C., Miller, N. A., Smith, L. D., Yoo, B., Fiedler, S., Cooley, L. D., … Kingsmore, S. F. (2016). Clinical detection of deletion structural variants in whole-genome sequences. Npj Genomic Medicine, 1(1), 16026. https://doi.org/10.1038/npjgenmed.2016.26
#. **Review on driver gene methods**: Tokheim, C. J., Papadopoulos, N., Kinzler, K. W., Vogelstein, B., & Karchin, R. (2016). Evaluating the evaluation of cancer driver genes. Proceedings of the National Academy of Sciences, 113(50), 14330–14335. https://doi.org/10.1073/pnas.1616440113

#. **Detection of IGH::DUX4 rearrangement**: Rezayee, F., Eisfeldt, J., Skaftason, A., Öfverholm, I., Sayyab, S., Syvänen, A. C., … & Barbany, G. (2023). Feasibility to use whole-genome sequencing as a sole diagnostic method to detect genomic aberrations in pediatric B-cell acute lymphoblastic leukemia. Frontiers in Oncology, 13, 1217712. https://doi.org/10.3389/fonc.2023.1217712

*Resource, or general notable papers including resource and KB papers related to cancer genomics*

#. **GIAB**: Zook, J. M., Catoe, D., McDaniel, J., Vang, L., Spies, N., Sidow, A., … Salit, M. (2016). Extensive sequencing of seven human genomes to characterize benchmark reference materials. Scientific Data, 3, 160025. https://doi.org/10.1038/sdata.2016.25
Expand Down
14 changes: 14 additions & 0 deletions tests/test_data/config.json
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,20 @@
"BALSAMIC"
]
},
"igh_dux4": {
"mutation": "somatic",
"mutation_type": "SV",
"analysis_type": [
"single",
"paired"
],
"sequencing_type": [
"wgs"
],
"workflow_solution": [
"BALSAMIC"
]
},
"svdb": {
"mutation": "somatic",
"mutation_type": "SV",
Expand Down
Loading
Loading