fix: command in vcf2cytosure rule and update ReadtheDocs #966

Merged
merged 15 commits into from
Jun 23, 2022
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,7 @@ elif config["analysis"]["sequencing_type"] == "wgs" and config["analysis"]["anal
message: "Converting CNVs from VCF to the CGH format using vcf2cytosure for {params.case_name}"
shell:
"""
grep -E "#|PASS" {input.ascat_vcf} | bgzip -l 9 -c > {output.ascat_vcf};
zgrep -E "#|PASS" {input.ascat_vcf} | bgzip -l 9 -c > {output.ascat_vcf};

vcf2cytosure --vcf {output.ascat_vcf} --coverage {input.tiddit_cov_tumor} --out {output.cgh_tumor} --sex {params.gender} --bins 20

Expand Down
Expand Up @@ -22,7 +22,7 @@ rule manta_tumor_normal:
tumor = get_sample_type(config["samples"], "tumor"),
normal = get_sample_type(config["samples"], "normal"),
case_name = config["analysis"]["case_id"],
manta_install_path = "/opt/conda/share/manta-1.6.0-1"
manta_install_path = "/opt/conda/share/manta-1.6.0-2"
threads:
get_threads(cluster_config, "manta_tumor_normal")
message:
Expand Down
Expand Up @@ -19,7 +19,7 @@ rule manta_tumor_only:
runmode = "local",
tumor = get_sample_type(config["samples"], "tumor"),
case_name = config["analysis"]["case_id"],
manta_install_path= "/opt/conda/share/manta-1.6.0-1"
manta_install_path= "/opt/conda/share/manta-1.6.0-2"
threads:
get_threads(cluster_config, "manta_tumor_only")
message:
Expand Down
1 change: 1 addition & 0 deletions CHANGELOG.rst
Expand Up @@ -37,6 +37,7 @@ Fixed:
* `run_validate.sh` script https://github.com/Clinical-Genomics/BALSAMIC/pull/952
* Somatic SV tumor normal rules https://github.com/Clinical-Genomics/BALSAMIC/pull/959
* Missing `genderChr` flag for `ascat_tumor_normal` rule https://github.com/Clinical-Genomics/BALSAMIC/pull/963
* Command in vcf2cytosure rule and updated ReadtheDocs https://github.com/Clinical-Genomics/BALSAMIC/pull/966

Removed
^^^^^^^
Expand Down
56 changes: 0 additions & 56 deletions docs/FAQs.rst
Expand Up @@ -96,59 +96,3 @@ Make a pull request to master at this point. After pull request is approved and
- Never force rebase commits into either `master` or `develop` branches.
- When merging pull requests into the `master` branch, use **Create a merge commit**, which preserves the full commit history. In contrast, when merging pull requests into the `develop` branch, use the **Squash and merge** button, which combines multiple commit messages into one commit.

**References**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**How to generate reference files for ascatNGS**

Detailed information is available from `ascatNGS <https://github.com/cancerit/ascatNgs>`_ documentation

Briefly, ascatNGS needs a gender loci file if gender information for the input sample is not available. The second file is *SnpGcCorrections.tsv*, which is prepared from the 1000 Genomes SNP panel.

1. **Gender loci file:**

GRCh37d5_Y.loci contains the following contents:

.. line-block::
Y 4546684
Y 2934912
Y 4550107
Y 4549638


2. **GC correction file:**

The first step is to download the 1000 Genomes SNP file and convert it from .vcf to .tsv. The detailed procedure for this step is available from `ascatNGS-reference-files <https://github.com/cancerit/ascatNgs/wiki/Human-reference-files-from-1000-genomes-VCFs>`_ (Human reference files from 1000 genomes VCFs).

.. code-block::

export TG_DATA=ftp://ftp.ensembl.org/pub/grch37/release-83/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz


Followed by:

.. code-block::

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/);\
next if($F[0] eq $l_c && $F[1]-1000 < $l_p); $F[7]=~m/MAF=([^;]+)/;\
next if($1 < 0.05); printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1];\
$l_c=$F[0]; $l_p=$F[1];' > SnpPositions_GRCh37_1000g.tsv


--or--

.. code-block::

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); $F[7]=~m/MAF=([^;]+)/;\
next if($1 < 0.05); next if($F[0] eq $l_c && $F[1]-1000 < $l_p);\
printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; $l_c=$F[0]; $l_p=$F[1];'\
> SnpPositions_GRCh37_1000g.tsv

The second step is to use the *SnpPositions.tsv* file to generate the *SnpGcCorrections.tsv* file. For more details, see `ascatNGS-convert-snppositions <https://github.com/cancerit/ascatNgs/wiki/Convert-SnpPositions.tsv-to-SnpGcCorrections.tsv>`_.

.. code-block::

ascatSnpPanelGcCorrections.pl genome.fa SnpPositions.tsv > SnpGcCorrections.tsv

2 changes: 1 addition & 1 deletion docs/README.rst
@@ -1,5 +1,5 @@
=========
Build Doc
Build doc
=========

The following steps explain how to build the documentation locally.
Expand Down
2 changes: 1 addition & 1 deletion docs/balsamic_annotation.rst
@@ -1,5 +1,5 @@
***********************************
Annotation Resources
Annotation resources
***********************************

BALSAMIC annotates somatic single nucleotide variants (SNVs) using ``ensembl-vep`` and ``vcfanno``. Somatic structural variants (SVs), somatic copy-number variants (CNVs) and germline single nucleotide variants are annotated using only ``ensembl-vep``. All SVs and CNVs are merged using ``SVDB`` before annotating for `Target Genome Analysis (TGA)` or `Whole Genome Sequencing (WGS)` analyses.
Expand Down
49 changes: 46 additions & 3 deletions docs/balsamic_filters.rst
@@ -1,9 +1,52 @@
***********************************
Calling and Filtering Variants
Calling and filtering variants
***********************************

In BALSAMIC, various bioinfo tools are integrated for reporting somatic and germline variants. Also, the choice of these tools differs between the type of analysis,
e.g.: `Target Genome Analysis (TGA)` or analysis of `Whole Genome Sequencing (WGS)`. Various filters (Pre-call and Post-call filtering) are applied at different levels to report high-confidence variant calls.
In BALSAMIC, various bioinformatics tools are integrated for reporting somatic and germline variants; they are summarized in the table below. The choice of tools depends on the type of analysis: `Target Genome Analysis (TGA)` or analysis of `Whole Genome Sequencing (WGS)`.


.. list-table:: SNV and small-Indel callers
:widths: 22 27 25 20 20
:header-rows: 1

* - Variant caller
- Sequencing type
- Analysis type
- Somatic/Germline
- Variant type
* - DNAscope
- WGS
- tumor-normal, tumor-only
- germline
- SNV, InDel
* - TNhaplotyper
- TGA, WES, WGS :superscript:`1`
- tumor-normal, tumor-only
- somatic
- SNV, InDel
* - TNscope :superscript:`2`
- WGS
- tumor-normal, tumor-only
- somatic
- SNV, InDel
* - TNscope_umi
- TGA, WGS
- tumor-normal, tumor-only
- somatic, germline
- SNV, InDel
* - VarDict
- TGA, WGS
- tumor-normal, tumor-only
- somatic
- SNV, InDel

:superscript:`1` For WGS cases, TNhaplotyper is only executed in tumor-only analyses.

:superscript:`2` TNscope output is merged with TNhaplotyper calls for tumor-only WGS (TO-WGS) analysis.



Various filters (Pre-call and Post-call filtering) are applied at different levels to report high-confidence variant calls.

**Pre-call filtering** is where the variant-calling tool decides not to add a variant to the VCF file because the variant did not pass the default filter criteria of the variant caller. The set of default filters differs between the various variant-calling algorithms.

Expand Down
6 changes: 3 additions & 3 deletions docs/balsamic_methods.rst
@@ -1,6 +1,6 @@
========
Methods
========
===================
Method description
===================

Target Genome Analysis
~~~~~~~~~~~~~~~~~~~~~~
Expand Down
54 changes: 52 additions & 2 deletions docs/balsamic_sv_cnv.rst
@@ -1,5 +1,5 @@
************************************
Structural and Copy Number Variants
Structural and Copy Number variants
************************************

Depending on the sequencing type, BALSAMIC is currently running the following structural and copy number variant callers:
Expand Down Expand Up @@ -42,6 +42,8 @@ Depending on the sequencing type, BALSAMIC is currently running the following st

Further details about a specific caller can be found in its repository documentation and the associated articles. Links to both are listed in `bioinfo softwares <https://github.com/Clinical-Genomics/BALSAMIC/blob/master/docs/bioinfo_softwares.rst>`_.

From BALSAMIC version >= 10.0.0, it is mandatory to provide the gender of the sample for CNV analysis.

The copy number variants identified using ascatNgs and `dellycnv` are converted to deletions and duplications before they are merged using `SVDB` with `--bnd_distance = 5000` (distance between the end points of variants from different callers) and `--overlap = 0.80` (fraction of overlapping bases between variants from different callers). `SVDB` prioritizes the merging of variants from SV and CNV callers to fetch position and genotype information, in the following order:

.. list-table:: SVDB merge caller priority order
Expand Down Expand Up @@ -81,4 +83,52 @@ The following command can be used to fetch the variants identified by a specific

::

zgrep -E "#|<Caller>" <*.svdb.vcf.gz>
zgrep -E "#|<Caller>" <*.svdb.vcf.gz>
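
For use inside a script, the same selection can be approximated in Python. This is a minimal sketch rather than anything shipped with BALSAMIC: the function name `filter_vcf_by_caller` is hypothetical, and it uses plain substring matching where `zgrep -E` would interpret `<Caller>` as a regular expression.

```python
import gzip


def filter_vcf_by_caller(vcf_gz_path, caller):
    """Return header lines and records mentioning `caller` from a gzipped VCF.

    Hypothetical helper mirroring `zgrep -E "#|<Caller>"`: like the shell
    command, it keeps any line that contains "#" or the caller name anywhere.
    """
    with gzip.open(vcf_gz_path, "rt") as handle:
        return [
            line.rstrip("\n")
            for line in handle
            if "#" in line or caller in line
        ]
```

A call such as `filter_vcf_by_caller("case.svdb.vcf.gz", "manta")` (file name assumed for illustration) returns the kept lines as a list of strings.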



**Genome Reference Files**
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**How to generate genome reference files for ascatNGS**

Detailed information is available in the `ascatNGS <https://github.com/cancerit/ascatNgs>`_ documentation.

The file *SnpGcCorrections.tsv* is prepared from the 1000 Genomes SNP panel.

**GC correction file:**

The first step is to download the 1000 Genomes SNP file and convert it from .vcf to .tsv. The detailed procedure for this step is available from `ascatNGS-reference-files <https://github.com/cancerit/ascatNgs/wiki/Human-reference-files-from-1000-genomes-VCFs>`_ (Human reference files from 1000 genomes VCFs).

.. code-block::

export TG_DATA=ftp://ftp.ensembl.org/pub/grch37/release-83/variation/vcf/homo_sapiens/1000GENOMES-phase_3.vcf.gz


Followed by:

.. code-block::

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/);\
next if($F[0] eq $l_c && $F[1]-1000 < $l_p); $F[7]=~m/MAF=([^;]+)/;\
next if($1 < 0.05); printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1];\
$l_c=$F[0]; $l_p=$F[1];' > SnpPositions_GRCh37_1000g.tsv


--or--

.. code-block::

curl -sSL $TG_DATA | zgrep -F 'E_Multiple_observations' | grep -F 'TSA=SNV' |\
perl -ane 'next if($F[0] !~ m/^\d+$/ && $F[0] !~ m/^[XY]$/); $F[7]=~m/MAF=([^;]+)/;\
next if($1 < 0.05); next if($F[0] eq $l_c && $F[1]-1000 < $l_p);\
printf "%s\t%s\t%d\n", $F[2],$F[0],$F[1]; $l_c=$F[0]; $l_p=$F[1];'\
> SnpPositions_GRCh37_1000g.tsv
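
The selection logic of the two Perl one-liners above can be sketched in Python to make the criteria explicit. This is an illustrative reading of the commands, not part of ascatNGS or BALSAMIC: the function name `select_snp_positions` and its parameters are hypothetical, and records without a `MAF=` entry are skipped here as an assumption. Both one-liners apply the same two tests (minor allele frequency of at least 0.05, and at least 1000 bp from the previously kept SNP on the same chromosome), only in a different order.

```python
import re

# Autosomes (numeric names) and X/Y only, as in the Perl chromosome test.
CHROM_RE = re.compile(r"\d+|[XY]")
MAF_RE = re.compile(r"MAF=([^;]+)")


def select_snp_positions(vcf_lines, min_maf=0.05, min_spacing=1000):
    """Return (snp_id, chrom, pos) tuples for SNPs passing all filters.

    Hypothetical sketch of the curl | zgrep | grep | perl pipeline.
    """
    selected = []
    last_chrom, last_pos = None, None
    for line in vcf_lines:
        # Equivalent of the two `grep -F` pre-filters.
        if "E_Multiple_observations" not in line or "TSA=SNV" not in line:
            continue
        fields = line.split()
        if len(fields) < 8 or not CHROM_RE.fullmatch(fields[0]):
            continue
        chrom, pos, snp_id, info = fields[0], int(fields[1]), fields[2], fields[7]
        # Skip SNPs closer than min_spacing to the last kept SNP on this chromosome.
        if chrom == last_chrom and pos - min_spacing < last_pos:
            continue
        match = MAF_RE.search(info)
        # Assumption: a record without a MAF entry is dropped.
        if match is None or float(match.group(1)) < min_maf:
            continue
        selected.append((snp_id, chrom, pos))
        last_chrom, last_pos = chrom, pos
    return selected
```

Writing each returned tuple as a tab-separated line reproduces the three-column layout of *SnpPositions_GRCh37_1000g.tsv*.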

The second step is to use the *SnpPositions.tsv* file to generate the *SnpGcCorrections.tsv* file. For more details, see `ascatNGS-convert-snppositions <https://github.com/cancerit/ascatNgs/wiki/Convert-SnpPositions.tsv-to-SnpGcCorrections.tsv>`_.

.. code-block::

ascatSnpPanelGcCorrections.pl genome.fa SnpPositions.tsv > SnpGcCorrections.tsv

2 changes: 1 addition & 1 deletion docs/bioinfo_softwares.rst
@@ -1,5 +1,5 @@
=================================
List of bioinformatics software
Tools and software
=================================

BALSAMIC ( **version** = 9.0.1 ) uses a myriad of tools and software to analyze FASTQ files. This section covers why each
Expand Down
4 changes: 2 additions & 2 deletions docs/cli_package.rst
@@ -1,7 +1,7 @@
=============
CLI reference
CLI usage
=============

.. click:: BALSAMIC.commands.base:cli
:prog: BALSAMIC
:show-nested:
:nested: full
4 changes: 2 additions & 2 deletions docs/conf.py
Expand Up @@ -13,7 +13,7 @@
import os
import sys

sys.path.insert(0, os.path.abspath("../"))
sys.path.insert(0, os.path.abspath(".."))

# -- Project information -----------------------------------------------------

Expand All @@ -31,7 +31,7 @@
"sphinx.ext.mathjax",
"sphinx.ext.viewcode",
"sphinxcontrib.napoleon",
"sphinx_click.ext",
"sphinx_click",
"sphinxarg.ext",
"recommonmark",
]
Expand Down
2 changes: 1 addition & 1 deletion docs/git_etiquette.rst
@@ -1,5 +1,5 @@
=============
Git Etiquette
Git etiquette
=============

It is recommended to follow a system to standardize the commit messages loosely. Following up from commit messages discussed on https://github.com/Clinical-Genomics/development/pull/97 , the format below is recommended for commit messages:
Expand Down
2 changes: 1 addition & 1 deletion docs/history.rst
@@ -1,4 +1,4 @@
CHANGELOG
Changelog
=========

.. include:: ../CHANGELOG.rst
24 changes: 5 additions & 19 deletions docs/index.rst
Expand Up @@ -8,38 +8,23 @@

install
user_guide
cli_package


.. toctree::
:caption: Resources
:name: resources
:caption: Detailed documentation
:name: detailed documentation
:hidden:
:maxdepth: 1

balsamic_filters
balsamic_sv_cnv
balsamic_annotation
balsamic_methods
history
bioinfo_softwares


.. toctree::
:caption: CLI reference
:name: api_cli_reference
:hidden:
:maxdepth: 1

cli_package

.. toctree::
:caption: Other Info
:name: other_info
:hidden:
:maxdepth: 1

history
resources

.. toctree::
:caption: Development guide
:name: development_guide
Expand All @@ -51,3 +36,4 @@
README
semver
FAQs
resources
2 changes: 1 addition & 1 deletion docs/requirements.txt
Expand Up @@ -4,7 +4,7 @@ docutils>=0.14,<0.18
recommonmark==0.7.1
sphinx==4.2.0
sphinx-argparse==0.3.1
sphinx-click==3.0.1
sphinx-click==3.0.2
sphinx_rtd_theme==1.0.0
sphinxcontrib-napoleon==0.7
furo==2021.10.9
7 changes: 2 additions & 5 deletions docs/resources.rst
@@ -1,11 +1,8 @@
===============
================
Other resources
===============
================


Resources
---------

*Main resources including knowledge base and databases necessary for pipeline development*


Expand Down
2 changes: 1 addition & 1 deletion docs/snakemake_etiquette.rst
@@ -1,5 +1,5 @@
===================
Snakemake Etiquette
Snakemake etiquette
===================

The bioinformatics core analysis in BALSAMIC is defined by a set of rules written as Snakemake rules (``*.rule``) and Snakemake
Expand Down