- MSIsensor-pro container #1444
- MSI analysis to the tumor-normal workflow #1454
- Sentieon install directory path to case config arguments #1461
- QC threshold for lymphoma_MRD panel #1479
- MSI tumor-normal analysis to housekeeper storage #1483
- UMI extraction and deduplication to TGA workflow #1358
- GENS input files for TGA #1448
- Padding of bed-regions for CNVkit to minimum 100 bases #1469
- Added min mapq 20 to CNVkit PON workflow #1465
- CNVkit PONs for Exome comprehensive 10.2, GMSsolid 15.2, GMCKsolid 4.2 #1465
- Merged VarDict with TNscope in all TGA workflows #1475
- New filter for VarDict for tumor in normal contamination #1475
- Export TMP environment variables to rules that lack them #1475
- Added genmod ranked VCFs to be delivered #1475
- Added family-id to genmod in order to get ranked variants to Scout #1475
- Added Raw TNscope calls and unfiltered research-annotated SNVs to delivery #1475
- Argument for SNV Artefact LoqusDB to all workflows #1481
- TNscope tag to variant info-field for TGA workflow #1497
- Cluster scheduler script for immediate submit #1372
- SLEEP_BEFORE_START to 600s #1372
- Updated Multiqc to version 1.22.3 #1441
- Upgrade vcf2cytosure version to 0.9.1 and remove hardcoded versions #1456
- Create new PONs for GMCKSolid v4.1, GMSMyeloid v5.3, and GMSlymphoid v7.3 #1465
- Refactored CNVkit rules #1465
- Refactored BCFtools filter rules #1475
- Renamed final UMI bamfile to ensure hsmetrics is picked up by multiqc #1475
- Changed ranking model VCF from research to clinical #1475
- Lowered minimum AF for TGA from 0.007 to 0.005 #1475
- Lowered maximal SOR for TNscope in TGA tumor only cases from 3 to 2.7 #1475
- Fixed TNscope research VCF filters to either PASS or triallelic site #1475
- Increased maximal amount of redirects for lychee test following links in docs to 10 #1488
- Updated readthedocs tools versions #1489
- Renamed UMI consensusfiltered bamfile to be picked up by multiqc #1490
- GATK3 #1432
- gatk_contest rule #1432
- SGE (qsub) support #1372
- Fastq quality and UMI trimming command-line options #1358
- ML model for TNscope #1475
- All code associated with TNhaplotyper #1475
- Removed research.filtered.pass files from delivery #1475
- Removed VarDict germline filter, replaced by relative normal af / tumor af filter #1497
- Corrected tool name in deduplication metrics #1441
- MSI table #1459
- Pin numpy version in CNVkit container #1457
- CNVkit incorrect version in the documentation #1457
- MSIsensor-pro container and updated msisensor to version 1.3.0 #1486
- Somalier container and updated somalier to version 0.2.19 #1487
- Vardict memory and tmpdir allocation #1492
- Vardict tumor only allocates dynamic number of cores #1495
- CLI option for the minimum raw reads supporting each UMI group filter
- high_normal_tumor_af_frac filter in bcftools for TNscope T+N filtering out more than 30% TINC #1289
- New option for exome samples --exome with modified bcftools filters compared to standard targeted workflow #1414
- Custom samtools script for the detection of IGH::DUX4 rearrangements #1397
- Reduced stringency of minimum MQ for all TGA to 30 from 40 #1414
- Removed -u flag from VarDict T+N and T only rules to remove calling only in reverse reads of overlapping mates #1414
- Removed -U flag to VarDict T+N rule to start calling SVs #1414
- alt_allele_in_normal filter from TNscope T+N workflows #1289
- initial filter keeping only PASS or triallelic-site from T+N bcftools quality filter rule has been removed #1424
- PureCN fail due to bash strict mode #1406
- Corrected name of CNVkit container in the CNVkit PON creation workflow #1412
- bcftools filters for PR:SR evidence in Manta calls #1371
- --exome argument to Manta runs in TGA cases #1371
- MultiQC intermediate files to deliverables #1388
- Extra bcftools filters that allows MaxDepth filtered variants in the final SV VCF #1371
- Unused arguments from delivery.py #1388
- ASCAT-Ngs container #1395
- bcftools in manta_tumor_normal uses correct column for tumor read filtering #1400
- Sleep rule before start to fix key_error #1311
- Missing __init__.py in snakemake_rules folders #1383
- Fastq concatenation #1069
- CADD SNV references #1126
- CADD SNV annotation #1150
- Samtools stats, flagstat, idxstat to WGS workflow #1176
- Functionality for dynamically assigning fastq-info to sample dict in config from input fastq-dir #1176
- Annotate SNVs with cancer germline SNV observations from Loqusdb #1178
- Annotate SNVs with somatic SNV observations from Loqusdb #1187
- Tests for Annotation with Cancer germline, somatic and clinical observations, and swegen frequencies https://github.com/Clinical-Genomics/BALSAMIC/pull/1190
- Annotate SVs with somatic SV observations from Loqusdb #1194
- Support singularity bind paths with different destination directories https://github.com/Clinical-Genomics/BALSAMIC/pull/1211
- Added --rerun-trigger mtime option to Snakemake command #1217
- CADD container #1222
- Container etiquette to ReadtheDocs #1232
- htslib (samtools, bcftools, tabix) container #1234
- Release version support for cache generation #1231
- CADD scores for INDELs #1238
- CADD reference to tests https://github.com/Clinical-Genomics/BALSAMIC/pull/1241
- Add cache version option to config case #1244
- cnvkit container #1252
- PureCN container #1255
- GATK container #1266
- Resolved FASTQ paths to sample dictionary (balsamic logging) #1275
- Picard HsMetrics and CollectGcBiasMetrics for WGS #1288
- LOH to TGA workflow #1278
- CNVs from PureCN to TGA workflow #1278
- Command-line arguments and rules for creation of GENS files #1279
- Somatic and germline Loqusdb annotation to ReadtheDocs #1317
- Postprocess step before VarDict in TGA #1332
- CNV report for TGA workflow #1339
- wkhtmltopdf to system requirements #1339
- Store WGS CNV report plots #1347
- Changed CN header field in cnvpytor in cnvpytor_tumor_only to be Float instead of Integer #1182
- Changed samples in case_config.json from being a dict to a list of dicts #1176
- Updated snakemake version to 7.25.0 #1099
- Updated cryptography version to 41.0.1 #1173
- Refactor bam and fastq inputs in snakemake to call pydantic model functions #1176
- Standardised alignment workflows to WGS-workflow #1176
- Implemented parallel trimming and alignment in all workflows per lane #1176
- All bam-QC tools take the final dedup.realign bamfile as input #1176
- Validation of pydantic models done both during config and run #1176
- Refactored fastp rules, and changed order of UMI-trimming and quality trimming #1176
- Fix pydantic version (<2.0) #1191
- Refactor constants #1174
- Move models to their own folder #1176
- Balsamic init workflow refactoring #1188
- Updated cryptography version to 41.0.2 #1205
- Refactor snakemake executable command generation https://github.com/Clinical-Genomics/BALSAMIC/pull/1211
- Updated Python version to 3.11 and its dependencies #1216
- Tools versions in doc https://github.com/Clinical-Genomics/BALSAMIC/pull/1239
- Reuse common Balsamic CLI options #1242
- Update reference.json file to use relative paths #1251
- Update pydantic to v2 while maintaining support for v1 models #1253
- PCT_PF_READS_IMPROPER_PAIRS QC threshold lowered to 5% #1265
- Migrate Metrics models to pydantic v2 #1270
- Migrate Snakemake models to pydantic v2 #1268
- Migrate Cache models to pydantic v2 #1274
- Made BALSAMIC compatible with multiple PON creation workflows #1279
- Use StrEnum from python enum #1303
- Renamed final cram bamfile to format <tumor/normal>.<LIMS_ID>.cram #1307
- Updated snakemake version to 7.32.4 #1308
- Migrate analysis models to pydantic v2 #1306
- Split analysis model into config and params models #1306
- Renamed name in sample column of final clinical vcfs #1310
- Update Gens HK tags #1319
- Increased memory and threads for VarDict #1332
- Updated ReadtheDocs with GENS and structural pipeline changes #1327
- Migrate WGS CNV report generation to pypdf & pdfkit #1346
- vcf2cytosure container #1159
- Link external fastqs to case folder & create case directory #1195
- vcf2cytosure container missing constants #1198
- Bash commands in vep_somatic_clinical_snv #1200
- Fix SVDB annotation intermediate rule #1218
- Broken documentation links #1226
- Updated contributors in main README #1237
- CNVpytor container #1246
- Restored balsamic container in UMI concatenation rule #1261
- CNVpytor container, fixing numpy version #1273
- QC workflow store #1295
- MultiQC rule missing input files #1321
- gens_preprocessing rule missing python directive #1322
- CADD annotations container path and code smells #1323
- Sonarcloud reported issues #1348
- Loqusdb SV annotation somatic fields #1354
- Config folder #1175
- Quality trimming of fastqs for UMI workflow #1176
- Balsamic container #1230
- Plugin CLI #1245
- Realignment step for TGA workflow #1272
- Archived/outdated workflows and scripts #1296
- Sed command to convert CNVpytor integer to float, deprecated by updated CNVpytor version #1310
- Removed max AF 1 filter from bcftools #1338
- Extra samtools sort command from WGS cases #1334
- Missing Number in VCF header for SVs #1203
- Fix cyvcf2 to version 0.30.22 #1206
- Fix pydantic version (<2.0) #1206
- Update varcall-cnvkit container versions #1207
- WGS QC criteria for PCT_PF_READS_IMPROPER_PAIRS (condition: <= 0.1) #1164
- Logged version of Delly (changing it to v1.0.3) #1170
- PIP specific missing tools to config #1096
- Filtering script to remove normal variants from TIDDIT #1120
- Store TMB files in HK #1144
- Fixed all conda container dependencies #1096
- Changed --max_sv_size in VEP params to the size of chr1 for hg19 #1124
- Increased time-limit for sambamba_exon_depth and picard_markduplicates to 6 hours #1143
- Update cosmicdb to v97 #1147
- Updated read the docs with the changes relevant to mention #1153
- Update cryptography version (39.0.1) due to security alert #1087
- Bump cryptography to v40.0.2 and gsutil to v5.23 #1154
- Pytest file saved in balsamic directory #1093
- Fix varcall_py3 container bcftools dependency error #1097
- AscatNgs container #1155
- Number of variants are increased with triallelic_site #1089
- Added somalier integration and relatedness check #1017
- Cluster resources for CNVPytor tumor only #1083
- triallelic_site in quality filter for SNV #1052
- Compression of SNV, research and clinical, VCF files #1060
- test_write_json failing locally #1063
- Container build and push via github actions by setting buildx provenance flag to false #1071
- Added buildx to the submodule workflow #1072
- Change user in somalier container to defaultuser #1080
- Reference files for hg38 #1081
- Code owners #1050
- MaxDepth in quality filter for SV #1051
- Incorrect raw TNscope VCF delivered #1042
- Use of PON reference, if exists for CNVkit tumor-normal analysis #982
- Added PON version to CLI and config.json #983
- cnvpytor to varcallpy3 container #991
- cnvpytor for tumor only workflow #994
- R packages to cnvkit container #996
- Missing R packages to cnvkit container #997
- add rlang to cnvkit container #998
- AnnotSV and bedtools to annotate container #1005
- cosmicdb to TNscope for tumor only and tumor normal workflows #1006
- loqusDB dump files to the config through the balsamic config case CLI #992
- Pre-annotation quality filters for SNVs and added research to output files #1007
- Annotation of snv_clinical_observations for somatic snv #1012
- Annotation of sv_clinical_observations for somatic sv and SV CNV filter rules #1013
- Swegen SNV and SV frequency database for WGS #1014
- triallelic_sites and variants with MaxDepth to the VCFs #1021
- Clinical VCF for TGA workflow #1024
- CNVpytor plots into the CNV PDF report #1023
- Research and clinical housekeeper tags #1023
- Cluster configuration for rules #1028
- Variant filtering using loqusDB and Swegen annotations #1029
- Annotation resources to ReadtheDocs #1031
- Delly CNV rules for TGA workflow #103
- cnvpytor container and removed cnvpytor from varcallpy3 #1037
- Added version number to the PON reference filename (.cnn) #982
- Update TIDDIT to v3.3.0, SVDB to v2.6.4, delly to v1.1.3, vcf2cytosure to v0.8 #987
- toml config file for vcfanno #1012
- Split vep_germline rule into tumor and normal #1018
- Extract number of variants from clinical files #1022
- Reverted pandas version (from 1.3.5 to 1.1.5) #1018
- Mate in realigned bam file #1019
- samtools command in merge bam and names in toml for vcfanno #1020
- If statement in vep_somatic_clinical_snv rule #1022
- Invalid flag second of pair validation error #1025
- Invalid flag second of pair validation error using picardtools #1027
- Samtools command for mergetype tumor #1030
- varcall_py3 container building #1036
- Picard and fastp commands params and cluster config for umi workflow #1032
- Set channels in varcall_py3 container #1035
- Delly command for tumor-normal analysis #1039
- tabix command in bcftools_quality_filter_TNscope_umi_tumor_only rule #1040
- case ID from the PON .cnn output file #983
- TNhaplotyper for paired WGS analysis #988
- TNhaplotyper for tumor only WGS analysis #1006
- TNhaplotyper for TGS #1022
- Update vcf2cytosure version to v0.8 #1010
- Update GitHub action images to ubuntu-20.04 #1010
- Update GitHub actions to their latest versions #1010
- Increase sambamba_exon_depth rule run time #1001
- Input VCF files for cnvkit rules, cnvkit command and container #995
- TIDDIT delivery rule names (undo rule name changes made in Balsamic 10.0.1) #977
- BALSAMIC readthedocs CLI documentation generation #965
- Time allocation in cluster configuration for SV rules #973
- New option analysis-workflow to balsamic config case CLI #932
- New python script to edit INFO tags in vardict and tnscope_umi VCF files #948
- Added cyvcf2 and click tools to the varcallpy3 container #948
- Delly TIDDIT and vcf2cytosure for WGS #947
- Delly TIDDIT vcf2cytosure and method to process SVs and CNVs for WGS #947
- SV and CNV analysis and TIDDIT to balsamic ReadtheDocs #951
- Gender to config.json #955
- Provided gender as input for vcf2cytosure #955
- SV CNV doc to balsamic READTHEDOCS #960
- Germline normal SNV VCF file header renaming to be compatible with genotype uploads #882
- Add tabix and gzip to vcf2cytosure container #969
- UMI-workflow for panel cases to be run only with balsamic-umi flag #896
- Update codecov action version to @v2 #941
- QC-workflow for panel cases to be run only with balsamic-qc #942
- get_snakefile function takes the argument analysis_workflow to trigger the QC workflow when necessary #942
- bcftools_counts input depending on analysis_workflow #942
- UMI output filename TNscope_umi is changed to tnscope_umi #948
- Update delly to v1.0.3 #950
- Update versions of delly in ReadtheDocs #951
- Provided gender as input for ascat and cnvkit #955
- Update QC criteria for panel and wgs analysis according to https://github.com/Clinical-Genomics/project-planning/issues/338#issuecomment-1132643330. #952
- For uploads to scout, increasing the number of variants failing threshold from 10000 to 50000 #952
- GENOME_VERSION set to the different genome_version options and replaced with config["reference"]["genome_version"] #942
- run_validate.sh script #952
- Somatic SV tumor normal rules #959
- Missing genderChr flag for ascat_tumor_normal rule #963
- Command in vcf2cytosure rule and updated ReadtheDocs #966
- Missing name analysis_dir in QC.smk #970
- Remove sample_type wildcard from the vcfheader_rename_germline rule and change genotype file name #971
- Removed qc_panel config in favor of standard config #942
- Removed cli --analysis_type for balsamic report deliver command and balsamic run analysis #942
- Removed analysis_type: qc_panel and replace the trigger for QC workflow by analysis_workflow: balsamic-qc #942
- Outdated balsamic report files (balsamic_report.html & balsamic_report.md) #952
- Snakemake workflow to create canfam3 reference #843
- Call umi variants using TNscope in bed defined regions #821
- UMI duplication metrics to report in multiqc_picard_dups.json #844
- Option to use PON reference in cnv calling for TGA tumor-only cases #851
- QC default validation conditions (for not defined capture kits) #855
- SVdb to the varcall_py36 container #872
- SVdb to WGS workflow #873
- Docker container for vcf2cytosure #869
- Snakemake rule for creating .cgh files from CNVkit outputs #880
- SVdb to TGA workflow #879
- SVdb merge SV and CNV #886
- Readthedocs for BALSAMIC method descriptions #906
- Readthedocs for BALSAMIC variant filters for WGS somatic callers #906
- bcftools counts to varcall filter rules #899
- Additional WGS metrics to be stored in <case>_metrics_deliverables.yaml #907
- ascatNGS copynumber file #914
- ReadtheDocs for BALSAMIC annotation resources #916
- Delly CNV for tumor only workflow #923
- Delly CNV Read-depth profiles for tumor only workflows #924
- New metric to be extracted and validated: NUMBER_OF_SITES (bcftools counts) #925
- Merge QC metric extraction workflows #833
- Changed the base-image for balsamic container to 4.10.3-alpine #869
- Updated SVdb to 2.6.0 #901
- Upgrade black to 22.3.0
- For UMI workflow, post filter gnomad_pop_freq value is changed from 0.005 to 0.02 #919
- updated delly to 0.9.1 #920
- container base_image (align_qc, annotate, coverage_qc, varcall_cnvkit, varcall_py36) to 4.10.3-alpine #921
- update container (align_qc, annotate, coverage_qc, varcall_cnvkit,varcall_py36) bioinfo tool versions #921
- update tool versions (align_qc, annotate, coverage_qc, varcall_cnvkit) in methods and softwares docs #921
- Updated the list of files to be stored and delivered #915
- Moved collect_custom_qc_metrics rule from multiqc.rule #925
- Automate balsamic version for readthedocs install page #888
- collect_qc_metrics.py failing for WGS cases with empty capture_kit argument #850
- QC metric validation for different panel bed version #855
- Fixed development version of fpdf2 to 2.4.6 #878
- Added missing svdb index file #848
- --qc-metrics/--no-qc-metrics flag from the balsamic report deliver command #833
- Unused pon option for SNV calling with TNhaplotyper tumor-only #851
- SV and CNV callers from annotation and filtering #889
- vcfanno and COSMIC from SV annotation #891
- Removed MSK_impact and MSK_impact_noStrelka json files from config #903
- Cleanup of strelka, pindel , mutect2 variables from BALSAMIC #903
- bcftools_stats from vep #898
- QC delivery report workflow (generating the <case>_qc_report.html file) #878
- --sample-id-map and --case-id-map flags from the balsamic report deliver command #878
- Removed gatk_haplotypecaller for reporting panel germline variants #918
- libopenblas=0.3.20 dependency to annotate container for fixing bcftools #909
- bcftools version locked at 1.10 #909
- base image of balsamic container to 4.10.3-alpine #909
- Replaced annotate container tests with new code #909
- Removed failed vcf2cytosure installation from annotate container #909
- Added slurm qos tag express #885
- Included more text about UMI-workflow variant calling settings to the readthedocs #888
- Extend QCModel to include n_base_limit which outputs in config json QC dict
- Automate balsamic version for readthedocs install page #888
- Upgrade black to 22.3.0
- fastp default setting of n_base_limit is changed to 50 from 5
- Added the readthedocs page for BALSAMIC variant-calling filters #867
- Project requirements (setup.py) to build the docs #874
- Generate cram from umi-consensus called bam files #865
- Updated the bioinfo tools version numbers in BALSAMIC readthedocs #867
- Sphinx version fixed to <0.18 #874
- Sphinx GitHub action triggers only on master branch PRs
- VAF filter for reporting somatic variants (Vardict) is minimised to 0.7% from 1% #876
- cyvcf2 mock import for READTHEDOCS environment #874
- Fixes fastqc timeout issues for wgs cases #861
- Fix cluster configuration for vep and vcfanno #857
- Set right qos in scheduler command #856
- balsamic.sif container installation during cache generation #841
- Execution of create_pdf python script inside the balsamic container #841
- --hgvsg annotation to VEP #830
- ascatNgs PDF delivery (plots & statistics) #828
- Add default for gender if purecn captures dual gender values #824
- Updated purecn and its dependencies to latest versions
- ascatNGS tumor normal delivery #810
- QC metrics delivery tag #820
- Refactor tmb rule that contains redundant line #817
- cnvkit gender comparison operator bug #819
- Added various basic filters to all variant callers regardless of their delivery status #750
- BALSAMIC container #728
- BALSAMIC reference generation via cluster submission for both reference and container #686
- Container specific tests #770
- BALSAMIC quality control metrics extraction and validation #754
- Delly is added as a submodule and removed from rest of the conda environments #787
- Store research VCFs for all filtered and annotated VCF files
- Added .,PASS to all structural variant filter rules to resolve the issues with missing calls in filtered file
- Handling of QC metrics validation errors #783
- Github Action workflow that builds the docs using Sphinx #809
- Zenodo integration to create citable link #813
- Panel BED specific QC conditions #800
- Metric extraction to a YAML file for Vogue #802
- refactored main workflow with more readable organization #614
- refactored conda envs within container to be on base and container definition is uncoupled #759
- renamed umi output file names to fix issue with picard HSmetrics #804
- locked requirements for graphviz io 0.16 #811
- QC metric validation is performed across all metrics of each of the samples #800
- The option of running umiworkflow independently with balsamic command-line option "-a umi"
- Removed source activate from reference and pon workflows #764
- Pip installation failure inside balsamic container #758
- Fixed issue #768 with missing vep_install command in container
- Fixed issue #765 with correct input bam files for SV rules
- Continuation of CNVkit even if PURECN fails and fix PureCN conda paths #774 #775
- Locked version for cryptography package
- Bumped version for bcftools in cnvkit container
- Fixed issues #776 and #777 with correct install paths for gatk and manta
- Fixed issue #782 for missing AF in the vcf INFO field
- Fixed issues #748 #749 with correct sample names
- Fixed issue #767 for ascatngs hardcoded values
- Fixed missing output option in bcftools filters for tnhaplotyper #793
- Fixed issue #795 with increasing resources for vep and filter SV prior to vep
- Building wheel for cryptography bug inside BALSAMIC container #801
- Fixed badge for docker container master and develop status
- ReadtheDocs building failure due to dependencies, fixed by locking versions #773
- Dev requirements installation for Sphinx docs (Github Action) #812
- Changed path for main Dockerfile version in .bumpversion.cfg
- Workflow to check PR titles to make it easier to tell PR intents #724
- bcftools stats to calculate Ti/Tv for all post annotate germline and somatic calls #93
- Added reference download date to reference.json #726
- ascatngs hg38 references to constants #683
- Added ClinVar as a source to download and to be annotated with VCFAnno #737
- Updated docs for git FAQs #731
- Rename panel of normal filename Clinical-Genomics/cgp-cancer-cnvcall#10
- Fixed bug with using varcall_py36 container with VarDict #739
- Fixed a bug with VEP module in MultiQC by excluding #746
- Fixed a bug with bcftools stats results failing in MultiQC #744
- Fixed breaking shell command for VEP annotation rules #734
- Fixed context for Dockerfile for release content #720
- samtools flagstats and stats to workflow and MultiQC
- delly v0.8.7 somatic SV caller #644
- delly container #644
- bcftools v1.12 to delly container #644
- tabix v0.2.6 to delly container #644
- Passed SV calls from Manta to clinical delivery
- An extra filter to VarDict tumor-normal to remove variants with STATUS=Germline, all other will still be around
- Added vcf2cytosure to annotate container
- git to the container definition
- prepare_delly_exclusion rule
- Installation of PureCN R package in cnvkit container
- Calculate tumor-purity and ploidy using PureCN for cnvkit call
- ascatngs as a submodule #672
- GitHub action to build and test ascatngs container
- Reference section to docs/FAQ.rst
- ascatngs download references from reference_file repository #672
- delly tumor only rule #644
- ascatngs download container #672
- Documentation update on setting sentieon env variables in bashrc
- ascatngs tumor normal rule for wgs cases #672
- Individual rules (i.e. ngs filters) for cnv and sv callers. Only Manta will be delivered and added to the list of output files. #708
- Added "targeted" and "wgs" tags to variant callers to provide another layer of separation. #708
- manta convert inversion #709
- Sentieon version to bioinformatic tool version parsing #685
- added CITATION.cff to cite BALSAMIC
- Upgrade to latest sentieon version 202010.02
- New name MarkDuplicates to picard_markduplicates in bwa_mem rule and cluster.json
- New name rule GATK_contest to gatk_contest
- Avoid running pytest github actions workflow on docs/** and CHANGELOG.rst changes
- Updated snakemake to v6.5.3 #501
- Update GNOMAD URL
- Split Tumor-only cnvkit batch into individual commands
- Improved TMB calculation issue #51
- Generalized ascat, delly, and manta result in workflow. #708
- Generalized workflow to eliminate duplicate entries and code. #708
- Split Tumor-Normal cnvkit batch into individual commands
- Moved params that are used in multiple rules to constants #711
- Changed the way conda and non-conda bioinfo tools version are parsed
- Python code formatter changed from Black to YAPF #619
- post-processing of the umi consensus in handling BI tags
- vcf-filtered-clinical tag files will have all variants including PASS
- Refactor snakemake annotate rules according to snakemake etiquette #636
- Refactor snakemake align rules according to snakemake etiquette #636
- Refactor snakemake fastqc, vep, contest and mosdepth rules according to snakemake etiquette #636
- Order of columns in QC and coverage report issue #601
- delly not showing in workflow at runtime #644
- ascatngs documentation links in FAQs #672
- varcall_py36 container build and push #703
- Wrong spacing in reference json issue #704
- Refactor snakemake quality control rules according to snakemake etiquette #636
- Cleaned up unused container definitions and conda environment files
- Remove cnvkit calling for WGS cases
- Removed the install.sh script
- Updated COSMIC path to use version 94
- Updated path for gnomad and 1000genomes to a working path from Google Storage
- Updated sentieon util sort in umi to use Sentieon 20201002 version
- Fixed memory issue with vcfanno in vep_somatic rule fixes #661
- An error with Sentieon for better management of memory fixes #621
- Rename Github actions to reflect their content
- Changelog reminder workflow to Github
- Snakemake workflow for created PON reference
- Balsamic cli config command(pon) for creating json for PON analysis
- tumor lod option for passing tnscope-umi final variants
- Git guide to make balsamic release in FAQ docs
- Expanded multiqc result search dir to whole analysis dir
- Simple test for docker container
- Correctly version bump for Dockerfile
- Removed unused Dockerfile releases
- Removed redundant genome version from reference.json
- Bug in ngs_filter rule set for tumor-only WGS
- Missing delivery of tumor only WGS filter
- only pass variants are not part of delivery anymore
- delivery tag file ids are properly matched with sample_name
- tabix updated to 0.2.6
- fastp updated to 0.20.1
- samtools updated to 1.12
- bedtools updated to 2.30.0
- sentieon-dedup rule from delivery
- Removed all pre filter pass from delivery
- Target coverage (Picard HsMetrics) for UMI files is now correctly calculated.
- TNscope calculated AF values are fetched and written to AFtable.txt.
- ngs_filter_tnscope is also part of deliveries now
- rankscore is now a research tag instead of clinical
- Some typo and fixes in the coverage and constant metrics
- Delivery process is more verbose
- CNVKit output is now properly imported in the deliveries and workflow
- CSS style for qc coverage report is changed to landscape
- update download url for 1000genome WGS sites from ftp to http
- bump picard to version 2.25.0
- assets path is now added to bind path
- umi_workflow config json is set as true for panel and wgs as false.
- Rename umiconsensus bam file headers from {samplenames} to TUMOR/NORMAL.
- Documentation autobuild on RTFD
- Moved all requirements to setup.py, and added all package_data there. Clean up unused files.
- tnsnv removed from WGS analysis, both tumor-only and tumor-normal
- GATK-BaseRecalibrator is removed from all workflows
- Fixed issue 577 with missing tumor.merged.bam and normal.merged.bam
- Issue 448 with lingering tmp_dir. It is not deleted after analysis is properly finished.
- All variant calling rules use proper tumor.merged.bam or normal.merged.bam as inputs
- Updated docs with FAQ for UMI workflow
- fix job scheduling bug for benchmarking
- rankscore's output is now a proper vcf.gz file
- Manta rules now properly make a sample_name file
- github action workflow to autobuild release containers
- balsamic init to download reference and related containers done in PRs #464 #538
- balsamic config case now only takes a cache path instead of container and reference #538
- UMI workflow added to main workflow in series of PRs #469 #477 #483 #498 #503 #514 #517
- DRAGEN for WGS applications in PR #488
- A framework for QC check PR #401
- --quiet option for run analysis PR #491
- Benchmark SLURM jobs after the analysis is finished PR #534
- One container per conda environment (i.e. decouple containers) PR #511 #525 #522
- --disable-variant-caller command for report deliver PR #439
- Added genmod and rankscore in series of two PRs #531 and #533
- Variant filtering to Tumor-Normal in PR #534
- Split SNV/InDels and SVs from TNScope variant caller PR #540
- WGS Tumor only variant filters added in PR #548
- Update Manta to 1.6.0 PR #470
- Update FastQC to 0.11.9 PR #532
- Update BCFTools to 1.11 PR #537
- Update Samtools to 1.11 PR #537
- Increase resources and runtime for various workflows in PRs #482
- Python package dependencies versions fixed in PR #480
- QoL changes to workflow in series of PR #471
- Series of documentation updates in PRs #489 #553
- QoL changes to scheduler script PR #491
- QoL changes to how temporary directories are handled PR #516
- TNScope model apply rule merged with TNScope variant calling for tumor-normal in WGS #540
- Decoupled fastp rule into two rules to make it possible to use it for UMI runs #570
- A bug in Manta variant calling rules that didn't name samples properly to TUMOR/NORMAL in the VCF file #572
- Changed hk delivery tag for coverage-qc-report
- No UMI trimming for WGS applications #486
- Fixed a bug where BALSAMIC was checking for sacct/jobid file in local mode PR #497
- readlink command in vep_germline, vep_somatic, split_bed, and GATK_popVCF #533
- Fix various bugs for memory handling of Picardtools and its executable in PR #534
- Fixed various issues with gsutils in PR #550
- gatk-register command removed from installing GATK PR #496
- Fixed a bug with missing QC templates after pip install
- CLI option to expand report generation for TGA and WES runs. Please see balsamic report deliver --help
- BALSAMIC now generates a custom HTML report for TGA and WES cases.
- Reduces MQ cutoff from 50 to 40 to only remove obvious artifacts PR #535
- Reduces AF cutoff from 0.02 to 0.01 PR #535
- config case subcommand now has --tumor-sample-name and --normal-sample-name
- Manta resource allocation is now properly set PR #523
- VarDict resource allocation in cluster.json increased (both core and time allocation) PR #523
- minimum memory request for GATK mutect2 and haplotypecaller is removed and max memory increased PR #523
- Document for Snakemake rule grammar PR #489
- removed gatk3-register command from Dockerfile(s) PR #508
- A secondary path for latest jobids submitted to cluster (slurm and qsub) PR #465
- UMI workflow using Sentieon tools. Analysis run available via balsamic run analysis --help command. PR #359
- VCFutils to create VCF from flat text file. This is for internal purpose to generate validation VCF. PR #349
- Download option for hg38 (not validated) PR #407
- Option to disable variant callers for WES runs. PR #417
- Missing cyvcf2 dependency, and changed conda environment for base environment PR #413
- Missing numpy dependency PR #426
- COSMIC db for hg19 updated to v90 PR #407
- Fastp trimming is now a two-pass trimming and adapter trimming is always enabled. This might affect coverage slightly PR #422
- All containers start with a clean environment #425
- All Sentieon environment variables are now added to config when workflow executes #425
- Branching model will be changed to gitflow
- Vardict-java version fixed. This is due to bad dependency and releases available on conda. Anaconda is not yet updated with vardict 1.8, but vardict-java 1.8 is there. This causes various random breaks with Vardict's TSV output. #403
- Refactored Docker files a bit, preparation for decoupling #403
- In preparation for GATK4, IndelRealigner is removed #404
- Temp directory for various rules and workflow wide temp directory #396
- Refactored tags for housekeeper delivery to make them unique #395
- Increased core requirements for mutect2 #396
- GATK3.8 related utils run via jar file instead of gatk3 #396
- Config.json and DAG graph included in Housekeeper report #372
- New output names added to cnvkit_single and cnvkit_paired #372
- New output names added to vep.rule #372
- Delivery option to CLI and what to deliver with delivery params in rules that are needed to be delivered #376
- Reference data model with validation #371
- Added container path to install script #388
- Delivery file format simplified #376
- VEP rules have "all" and "pass" as output #376
- Downloaded reference structure changed #371
- genome/refseq.flat renamed to genome/refGene.flat #371
- reverted CNVKit to version 0.9.4 #390
- Missing pygments to requirements.txt to fix travis CI #364
- Wildcard resolve for deliveries of vep_germline #374
- Missing index file from deliverables #383
- Ambiguous deliveries in vep_somatic and ngs_filters #387
- Updated documentation to match with installation #391
- Temp files removed from list of outputs in vep.rule #372
- samtools.rule and merged it with bwa_mem #375
- Models to build config case JSON. The models and descriptions of their contents can now be found in BALSAMIC/utils/models.py
- Added analysis_type to report deliver command
- Added report and delivery capability to Alignment workflow
- run_validate.sh now has -d to handle path to analysis_dir (for internal use only) #361
- Fastq files are no longer being copied as part of creation of the case config file. A symlink is now created at the destination path instead
- Config structure is no longer contained in a collection of JSON files. The config models are now built using Pydantic and are contained in BALSAMIC/utils/models.py
- Removed command line option "--fastq-prefix" from config case command
- Removed command line option "--config-path" from config case command. The config is now always saved with default name "case_id.json"
- Removed command line option "--overwrite-config" from config-case command. The command is now always executed with "--overwrite-config True" behavior
- Refactored BALSAMIC/commands/config/case.py: utility functions are moved to BALSAMIC/utils/cli.py; models for config fields can be found at BALSAMIC/utils/models.py; context aborts and logging are now contained in the pilot function; tests created to support the new architecture
- Reduce analysis directory's storage
- Report generation warnings suppressed by adding workdirectory
- Missing tag name for germline annotated calls #356
- Bind path is not added as None if analysis type is wgs #357
- Changes vardict to vardict-java #361
- pydantic to validate various models namely variant caller filters
- Variant caller filters moved into pydantic
- Install script and setup.py
- refactored install script with more log output and added a conda env suffix option
- refactored docker container and decoupled various parts of the workflow
- Added cram files for targeted sequencing runs fixes #286
- Added mosdepth to calculate coverage for whole exome and targeted sequencing
- Filter models added for tumor-only mode
- Enabling adapter trim enables pe adapter trim option for fastp
- Annotate germline variant calls
- Baitset name to picard hsmetrics
- Sambamba coverage and rules will be deprecated
- Fixed latest tag in install script
- Fixed lack of naming final annotated VCF TUMOR/NORMAL
- Increased run time for various slurm jobs fixes #314
- Enabled SV calls for VarDict tumor-only
- Updated ensembl-vep to v100.2
- Fixed sort issue with bedfiles after 100 slop
- Added Docker container definition for release and bumpversion
- Quality of life change to rtfd docs
- Fix Docker container with faulty git checkout
- Add "SENTIEON_TMPDIR" to wgs workflow
- Add docker container pull for correct version of install script
- CNV output as VCF
- Vep output for PASSed variants
- Report command with status and delivery subcommands
- Bed files are slopped 100bp for variant calling fix #262
- Disable vcfmerge
- Picard markduplicate output moved from log to output
- Vep upgraded to 99.1
- Removed SVs from vardict
- Refactored delivery plugins to produce a file with list of output files from workflow
- Updated snakemake to 5.13
- Fixed a bug where threads were not sent properly to rules
- Removed coverage annotation from mutect2
- Removed source deactivate from rules to suppress conda warning
- Removed `plugins delivery` subcommand
- Removed annotation for germline caller results
- VEP now also produces a tab delimited file
- CNVkit rules output genemetrics and gene break file
- Added reference genome to be able to calculate AT/CG dropouts by Picard
- coverage plot plugin part of issue #75
- callable regions for CNV calling of tumor-only
- Increased time for indel realigner and base recalib rules
- decoupled vep stat from vep main rule
- changed qsub command to match UGE
- scout plugin updated
- WGS qc rules - updated with correct options (picard - CollectMultipleMetrics, sentieon - CoverageMetrics)
- Log warning if WES workflow cannot find SENTIEON* env variables
- Fixes issue with cnvkit and WGS samples #268
- Fix #267 coverage issue with long deletions in vardict
- dependencies for workflow report
- sentieon variant callers germline and somatic for wes cases
- housekeeper file path changed from basename to absolute
- scout template for sample location changed from delivery_report to scout
- rule names added to benchmark files
SGE qsub support release
- `install.sh` now also downloads latest container
- Docker image for balsamic as part of ci
- Support for qsub alongside with slurm on `run analysis --profile`
- Documentation updated
- Test fastq data and test panel bed file with real but dummy data
- Various links for reference genome is updated with working URL
- Config reference command now prints correct output file
somatic vcfmerge release
- QC metrics for WGS workflow
- refGene.txt download to reference.json and reference workflow
- A new conda environment within container
- A new base container built via Docker (centos7:miniconda3_4_6_14)
- VCFmerge package as VCF merge rule (https://github.com/hassanfa/VCFmerge)
- A container for develop branch
- Benchmark rules to variant callers
- SLURM resource allocation for various variant calling rules optimized
- mergetype rule updated and only accepts one single tumor instead of multiple
- Removed unused output files from cnvkit which caused failures on targeted analysis
- Removed target file from cnvkit batch
- CNVkit single missing reference file added
- CNVkit to WGS workflow
- get_thread for runs
- Optimized resources for SLURM jobs
- Removed hsmetrics for non-mark duplicate bam files
- Fixes a bug where missing capture kit bed file error for WGS cases
- benchmark path bug issue #221
- libreadline.so.6 symlinking and proper centos version for container
- Proper tag retrieval for release
### Changed
- BALSAMIC container change to latest and version added to help line
TL;DR:
- QoL changes to WGS workflow
- Simplified installation by moving all tools to a container
- Benchmarking using psutil
- ML variant calling for WGS
- `--singularity` option to `config case` and `config reference`
- Fixed a bug with boolean values in analysis.json
- `install.sh` simplified and will be deprecated
- Singularity container updated
- Common somatic and germline variant callers are put in single file
- Variant calling workflow and analysis config files merged together
- `balsamic install` is removed
- Conda environments for py36 and py27 are removed
- Permissions on `analysis/qc` dir are 777 now
This is a major release. TL;DR:
- Major changes to CLI. See documentation for updates.
- New additions to reference generation and reference config file generation and complete overhaul
- Major changes to repository structure, conda environments.
- Creating and downloading reference files: `balsamic config reference` and `balsamic run reference`
- Container definitions for install and running BALSAMIC
- Bunch of tests, setup coveralls and travis.
- Added Multiqc, fastp to rule utilities
- Create Housekeeper and Scout files after analysis completes
- Added Sentieon tumor-normal and tumor only workflows
- Added trimming option while creating workflow
- Added multiple tumor sample QC analysis
- Added Pindel for indel variant calling
- Added Analysis finish file in the analysis directory
- Multiple fixes to snakemake rules
- Running analysis through: `balsamic run analysis`
- Cluster account and email info added to `balsamic run analysis`
- `umi` workflow through `--umi` tag. [workflow still in evaluation]
- `sample-id` replaced by `case-id`
- Plan to remove FastQC as well
- `balsamic config report` and `balsamic report`
- `sample.config` and `reference.json` from config directory
- Removed cutadapt from workflows
- picard hsmetrics now has 50000 cov max
- cnvkit single wildcard resolve bug fixed
- Various fixes to umi_single mode
- analysis_finish file does not block reruns anymore
- Added missing single_umi to analysis workflow cli
- vardict in single mode has lower AF threshold filter (0.005 -> 0.001)
- Reference to issue #141, fix for 3 other workflows
- CNVkit rule update for refflat file
- An analysis finish file is generated with date and time inside (%Y-%M-%d T%T %:z)
- picard version update to 2.18.11 github.com/hassanfa/picard
- Mutect single mode table generation fix
- Vardict single mode MVL annotation fix
- CNVkit single sample mode now in workflow
- MVL list from cheng et al. 2015 moved to assets
- Simple table for somatic variant callers for single sample mode added
- Fixes an issue with conda that unset variables threw an error issue #141
- Readme structure and example
- Mutect2's single sample output is similar to paired now
- cli path structure update
- test data and sample inputs
- A dag PDF will be generated when config is made
- umi specific variant calling
- VEP's perl module errors
- CoverageRep.R now properly takes protein_coding transcripts only
UMI single sample align and QC
- Added rules and workflows for UMI analysis: QC and alignment
Germline single sample
- Germline single sample addition
### Changed
- Minor fixes to some rules to make them compatible with tumor mode
- Various bugs with DAG to keep popvcf and splitbed depending on merge bam file
- install script fixed and help added
- Vardict, Strelka, and Manta separated from GATK best practice pipeline
- minor bugs with strelka_germline and freebayes merge
### Changed
- removed ERC from haplotypecaller
Germline patch
- Germline caller tested and added to the paired analysis workflow: Freebayes, HaplotypeCaller, Strelka, Manta
- Analysis config files updated
- Output directory structure changed
- vep rule is now a single rule
- Bunch of rule names updated and shortened, specifically in Picard and GATK
- Variant caller rules are all updated and changed
- output vcf file names are now more sensible: {SNV,SV}.{somatic,germline}.sampleId.variantCaller.vcf.gz
- Job limit increased to 300
- removed bcftools.rule for var id annotation
- Ugly and godforsaken `runSbatch.py` is now dumping sacct files with job IDs. Yikes!
- added `--fastq-prefix` option for `config sample` to set fastq prefix name. Linking is not changed.
- patched a bug for copying results for strelka and manta which was introduced in `2.5.0`
- `variant_panel` changed to `capture_kit`
- sample config file takes balsamic version
- bioinfo tool config moved bioinfotool to cli_utils from `config report`
- bioinfo tool versions is now added to analysis config file
- `balsamic run` has 3 stop points: paired variant calling, single mode variant calling, and QC/Alignment mode.
- `balsamic run [OPTIONS] -S ...` is deprecated, but it supersedes `analysis_type` mode if provided.
- CSV output for variants in each variant caller based on variant filters
- DAG image of workflow
### Changed
- Input for variant filter has a default value
- `delivery_report` is no created during config generation
- Variant reporter R script cmd updated in `balsamic report`
- Fastq files are now always linked to `fastq` directory within the analysis directory
- `balsamic config sample` now accepts individual files and paths. See README for usage.
- CollectHSmetric now runs twice, before and after markduplicate
- Sample config file now includes a list of chromosomes in the panel bed file
- Non-matching chrom won't break the splitbed rule anymore
- collectqc rules now properly parse tab delimited metric files
- Coverage plot to report
- target coverage file to report json
- post-cutadapt fastqc to collectqc
- A header to report pdf
- list of bioinfo tools used in the analysis added to report
### Changed
- VariantRep.R now accepts multiple inputs for each parameter (see help)
- AF values for MSKIMPACT config
### Fixed
- Output figure for coverageplot is now fully square :-)
- normalized coverage plot script
- fastq file IO check for config creation
- added qos option to `balsamic run`
### Fixed
- Sambamba depth coverage parameters
- bug with picard markduplicate flag
- Added qos option for setting qos to run jobs with a default value of low
- Fixed package dependencies with vep and installation
Variant reporter patch and cli update
- Added `balsamic config sample` and `balsamic config report` to generate run analysis and reporting config
- Added `VariantRep.R` script to extract information from merged variant table: variant summary, TMB, and much more
- Added a workflow for single sample mode alignment and QC only
- Added QC skimming script to qccollect to generate nicely formatted information from picard
### Changed
- Change to CLI for running and creating config
- Major overhaul to coverage report script. It's now simpler and more readable!
### Fixed
- Fixed sambamba depth to include mapping quality
- Markduplicate is now in marking mode by default, and will NOT remove duplicates
- Minor formatting and script beautification happened
- fixed a typo in MSKMVL config
- fixed a bug in strelka_simple for correct column orders
- rule for all three variant callers for paired analysis now generate a simple VCF file
- rule for all three variant callers for paired analysis to convert VCF into table format
- MVL config file and MVL annotation to VCF calls for SNV/INDEL callers
- CALLER annotation added to SNV/INDEL callers
- exome specific option for strelka paired
- create_config subcommand is now more granular, it accepts all entries from sample.json as commandline arguments
- Added tabQuery to the assets as a tool to query the tabulated output of summarized VCF
- Added MQ annotation field to Mutect2 output see #67
### Changed
- Leaner VCF output from mutect2 with coverage and MQ annotation according to #64
- variant ids are now updated from simple VCF file
### Fixed
- Fixed a bug with sambamba depth coverage reporting wrong exon and panel coverage see #68
- The json output is now properly formatted using yapf
- Strelka rule doesn't filter out PASS variants anymore fixes issue #63
Coverage report patch
- Added a new script to retrieve coverage report for a list of gene(s) and transcripts(s)
- Added sambamba exon depth rule for coverage report
- Added a new entry in reference json for exon bed file, this file generated using: https://github.com/hassanfa/GFFtoolkit
### Changed
- sambamba_depth rule changed to sambama_panel_depth
- sambamba depth now has fix-mate-overlaps parameter enabled
- sambamba string filter changed to `unmapped or mate_is_unmapped) and not duplicate and not failed_quality_control`.
- sambamba depth for both panel and exon work on picard flag (rmdup or mrkdup).
### Fixed
- Fixed sambamba panel depth rule for redundant coverage parameter
create config patch for single and paired mode
- create_config is now accepting a paired|single mode instead of analysis json template (see help for changes). It is not backward compatible
### Added
- analysis_{paired single}.json for creating config. Analysis.json is now obsolete.
### Fixed
- A bug with writing output for analysis config, and creating the path if it doesn't exist.
- A bug with manta rule to correctly set output files in config.
- A bug that strelka was still included in sample analysis.
- Markduplicate flag to analysis config
- Single mode for vardict, manta, and mutect.
- merge type for tumor only
### Changed
- Single mode variant calling now has all variant calling rules
### Fixed
- run_analysis now accepts workflows for testing purposes
- picard create bed interval rule moved into collect hsmetric
- split bed is dependent on bam merge rule
- vardict env now has specific build rather than URL download (conda doesn't support URLs anymore)
### Fixed
- new logs and scripts dirs are not re-created if they are empty
- A source altered picard to generate more quality metrics output is added to installation and rules
- report subcommand for generating a pdf report from a json input file
- Added fastqc after removing adapter
### Changed
- Markduplicate now has both REMOVE and MARK (rmdup vs mrkdup)
- CollectHSMetrics now has more steps on PCT_TARGET_BASES
- New log and script directories are now created for each re-run
### Fixed
- Picardtools' memory issue addressed for large samples
- single sample analysis mode
- alignment and insert size metrics are added to the workflow
### Changed
- collectqc and contest have their own rule for paired (tumor vs normal) and single (tumor only) sample.
- bed file for panel analysis is now mandatory to create analysis config
- vep execution path
- working directory for snakemake
- sbatch submitter and cluster config now has a mail field
### Changed
- `create_config` now only requires sample and output json. The rest are optional
- snakefile and cluster config in run analysis are now optional with a default value
- vardict installation was failing without conda-forge channel
- gatk installation was failing without correct jar file
- gatk-register tmp directory
- create config sub command added as a new feature to create input config file
- templates to generate a config file for analysis added
- code style template for YAPF input created. see: https://github.com/google/yapf
- vt conda env added
- install script changed to create an output config
- README updated with usage
- fastq location for analysis config is now fixed
- lambda rules removed from cutadapt and fastq
- Added sbatch submitter to handle it outside snakemake
### Changed
- sample config file structure changed
- coding styles updated
- Added vt environment
### Fixed
- conda envs now have D prefix instead of P (develop vs production)
- install_conda subcommand now accepts a proper conda prefix
- snakemake rules are now externally linked
- run_analysis subcommand
- Mutational Signature R script with CLI
- unittest to install_conda
- a method to semi-dynamically retrieve suitable conda env for each rule
- install.sh updated with gatk and proper log output
- conda environments updated
- vardict now has its own environment and it should not raise any more errors
- install.sh to install balsamic
- balsamic barebone cli
- subcommand to install required environments
- README.md updated with basic installation instructions
- conda environment yaml files