Merge pull request #185 from TheJacksonLaboratory/release-v1.0.0-RC2
Release v1.0.0 rc2
Daniel Danis authored Jul 12, 2021
2 parents f03263b + 4f8530e commit 162e0f1
Showing 149 changed files with 1,202 additions and 6,308 deletions.
15 changes: 14 additions & 1 deletion CHANGELOG.rst
@@ -2,11 +2,24 @@
Changelog
=========


----------
v1.0.0-RC2
----------

- Implement VCF output format
- Remove obsolete code from the repository
- Improve documentation & test coverage
- Bug fixes

  - remove null pointer in ``GeneService``
  - do not run coverage filter if the coverage data is missing for a variant


----------
v1.0.0-RC1
----------

- Rename `annotate` CLI command to `prioritize`
- Rename ``annotate`` CLI command to ``prioritize``
- Multiple minor adjustments


85 changes: 8 additions & 77 deletions README.md
@@ -1,83 +1,14 @@
# SvAnna
# SvAnna - Structural Variant Annotation and Analysis

![Java CI with Maven](https://github.com/TheJacksonLaboratory/SvAnna/workflows/Java%20CI%20with%20Maven/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/squirls/badge/?version=latest)](https://svanna.readthedocs.io/en/latest/?badge=latest)

![Java CI with Maven](https://github.com/TheJacksonLaboratory/SvAnna/workflows/Java%20CI%20with%20Maven/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/svanna/badge/?version=latest)](https://svanna.readthedocs.io/en/latest/?badge=latest)

Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

Most users should download the latest SvAnna distribution ZIP file from
the [Releases page](https://github.com/TheJacksonLaboratory/SvAnna/releases).

Please consult the Read the docs site for detailed documentation - TODO - setup RTD.

## Attic

**The text below is out of sync, and the most useful parts of the text will be moved to *Read the docs*.**

**The documentation needs to be completed.**

### Creating the Jannovar transcript file
[Jannovar](https://github.com/charite/jannovar) is a Java app/library for annotating
VCF files. Its main use case is for small variants and their intersection with
protein coding sequences. We will use it here to extract the positions of genes and
SVs, but it may be easier just to start with a gencode GFF file in the future.

Jannovar downloads various files and creates a transcript file that it uses for VCF annotation.
At present, NCBI etc has changed the location of some files so that only the develop branch
of Jannovar works. Enter the following commands to create the transcript file

```
git clone https://github.com/charite/jannovar.git
cd jannovar
git checkout develop
mvn package
java [-Xmx8g] -jar jannovar-cli-0.36-SNAPSHOT.jar download -d hg38/refseq_curated
```
This command downloads various files and generates `data/hg38_refseq_curated.ser`. Either move
this file to the `data` subdirectory of this project or symlink it (from `data`, enter `ln -s <path>`).
Thus, for now, this project expects the path `data/data/refseq_curated.ser`.

## Running svann

Enter the following command to see options. The LIRICAL file is the
LIRICAL TSV output file. The enhancers file is created by the
https://github.com/pnrobinson/tspec app. To use the enhancers file
it is required to also use an HPO term with the major phenotypic abnormality,
e.g., [Abnormality of the immune system](https://hpo.jax.org/app/browse/term/HP:0002715).

```
$ java -jar target/svann.jar annotate -h
Usage: svann annotate [-hV] [-e=<enhancerFile>] [-g=<geneCodePath>]
[-j=<jannovarPath>] [-t=<hpoTermIdList>] -v=<vcfFile>
[-x=<outprefix>]
annotate VCF file
-e, --enhancer=<enhancerFile>
tspec enhancer file
-g, --gencode=<geneCodePath>
-h, --help Show this help message and exit.
-j, --jannovar=<jannovarPath>
prefix for output files (default:
data/data/hg38_refseq_curated.ser )
-t, --term=<hpoTermIdList> HPO term IDs (comma-separated list)
-v, --vcf=<vcfFile>
-V, --version Print version information and exit.
-x, --prefix=<outprefix> prefix for output files (default: L2O )
```




# Documentation

Generate the read the docs documentation locally by going to the ``docs`` subdirectory.
First generate a virtual environment and install the required sphinx packages. ::

virtualenv p38
source p38/bin/activate
pip install sphinx sphinx-rtd-theme

To create the documentation, ensure you are using the ``p38`` environment and enter the following command. ::

source p38/bin/activate
make html

This will generate HTML pages under ``_build/html``.
Please consult the Read the docs site for [detailed documentation](https://svanna.readthedocs.io/en/latest).
5 changes: 2 additions & 3 deletions docs/Makefile
@@ -3,11 +3,10 @@

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = python -msphinx
SPHINXPROJ = svann
SPHINXBUILD = sphinx-build
SPHINXPROJ = SvAnna
SOURCEDIR = .
BUILDDIR = _build
html_static_path = ['..']

# Put it first so that "make" without argument is like "make help".
help:
6 changes: 3 additions & 3 deletions docs/conf.py
@@ -46,7 +46,7 @@

# General information about the project.
project = u'SvAnna'
copyright = u'2021'
copyright = u'2021, Daniel Danis, Peter N Robinson'
author = u'Daniel Danis, Peter Robinson'

# The version info for the project you're documenting, acts as replacement for
@@ -56,7 +56,7 @@
# The short X.Y version.
version = u'1.0'
# The full version, including alpha/beta/rc tags.
release = u'1.0.0-RC1'
release = u'1.0.0-RC2'

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -142,7 +142,7 @@
# author, documentclass [howto, manual, or own class]).
latex_documents = [
(master_doc, 'SvAnna.tex', u'svann Documentation',
u'Peter Robinson', 'manual'),
u'Daniel Danis, Peter N Robinson', 'manual'),
]


20 changes: 10 additions & 10 deletions docs/index.rst
@@ -1,22 +1,22 @@
SvAnna: Annotation of Structural Variants in VCF files
=====================================================
SvAnna:
=======

Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing

SvAnna
~~~~~~

This application annotates structural variants in VCF files, focussing specifically on long-read WGS analysis
SvAnna performs phenotype-driven prioritization of structural variants in VCF files, focusing specifically on long-read WGS analysis
of germline variants.



.. toctree::
:maxdepth: 2
:caption: Contents:

quickstart
setup
enhancers
running
BND<bndannotations>
structuralvariation
outputformats


.. structuralvariation
.. enhancers
.. BND<bndannotations>
70 changes: 70 additions & 0 deletions docs/outputformats.rst
@@ -0,0 +1,70 @@
.. _rstoutputformats:

==============
Output formats
==============

SvAnna supports storing results in four output formats: *HTML*, *VCF*, *CSV*, and *TSV*. Use the ``--output-format`` option
to select one or more of the desired output formats (e.g. ``--output-format html,vcf``).

HTML output format
^^^^^^^^^^^^^^^^^^

SvAnna creates an *HTML* file with the analysis summary and with variants sorted by the :math:`TAD_{SV}` score
in descending order.
By default, the top 100 variants are included in the report. The number of reported variants can be adjusted with
the ``--report-top-variants`` option.

The report consists of several parts:

* *Analysis summary* - Details of HPO terms of the proband, paths of the input files, and the analysis parameters.
* *Variant counts* - Breakdown of the number of the variant types of the different categories.
* *Prioritized SVs* - Visualizations of the prioritized variants.

.. TODO - write more about the HTML report
.. note::
   Only the variants that passed all the filters are visualized in the *Prioritized SVs* section.

The ``--no-breakends`` option excludes breakend/translocation variants from the report.

VCF output format
^^^^^^^^^^^^^^^^^
When ``vcf`` is included in the ``--output-format`` option, a VCF file with all input variants is created.
The prioritization adds a new *INFO* field to each variant:

* ``TADSV`` - an *INFO* field containing the :math:`TAD_{SV}` score for the variant.

.. note::
   * The ``--report-top-variants`` option has no effect on the *VCF* output format.
   * Add the ``--uncompressed-output`` flag to get an uncompressed VCF file.
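
The ``TADSV`` field can be read back with a few lines of Python. The following is a minimal sketch, not part of SvAnna: it assumes standard tab-separated VCF records with *INFO* in the eighth column, and it assumes that a compressed output file has a name ending in ``.gz``. The function names are hypothetical.

```python
import gzip


def tadsv_from_vcf_line(line):
    """Return the TADSV value of one VCF data line, or None if the field is absent."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")
    # INFO is the 8th column: semicolon-separated KEY=VALUE entries and bare flags.
    for entry in fields[7].split(";"):
        key, sep, value = entry.partition("=")
        if key == "TADSV" and sep:
            return float(value)
    return None


def iter_tadsv(path):
    """Yield (variant ID, TADSV) pairs from a plain or gzip-compressed VCF file."""
    # Assumption: compressed files are recognizable by a ".gz" suffix.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):  # skip meta-information and header lines
                continue
            yield line.split("\t")[2], tadsv_from_vcf_line(line)
```

Variants without a ``TADSV`` entry yield ``None``, so the caller can decide how to treat them.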


CSV/TSV output format
^^^^^^^^^^^^^^^^^^^^^
To write the *n* most deleterious variants into a *CSV* (or *TSV*) file, use ``csv`` (or ``tsv``) in the ``--output-format`` option.

The results are written into a tabular file with the following columns:

* *contig* - name of the contig/chromosome (e.g. ``1``, ``2``, ``X``)
* *start* - 0-based start coordinate (excluded) of the variant on positive strand
* *end* - 0-based end coordinate (included) of the variant on positive strand
* *id* - variant ID as it was present in the input VCF file
* *vtype* - variant type, one of {``DEL``, ``DUP``, ``INV``, ``INS``, ``BND``, ``CNV``}
* *failed_filters* - the names of the filters that the variant failed to pass, separated by semicolons (``;``)

  * ``filter`` - the variant failed previous VCF filters: at least one filter flag other than ``PASS`` is present in the variant VCF line.
  * ``coverage`` - the variant is supported by fewer reads than specified by the ``--min-read-support`` option
* *tadsv* - the :math:`TAD_{SV}` score value

.. table:: Tabular output

   ======== ========= ========== ====== ======= ================= =====================
   contig   start     end        id     vtype   failed_filters    tadsv
   ======== ========= ========== ====== ======= ================= =====================
   11       31130456  31671718   abcd   DEL                       109.75766900764305
   18       46962113  46969912   efgh   DUP     filter;coverage   3.2
   ...      ...       ...        ...    ...     ...               ...
   ======== ========= ========== ====== ======= ================= =====================

.. note::
   Add the ``--uncompressed-output`` flag to get an uncompressed tabular file.
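
The tabular output lends itself to simple post-processing. The sketch below is an illustration rather than part of SvAnna: it assumes the file starts with a header row carrying the column names listed above, and it derives the variant length as ``end - start``, which follows from the stated convention of an excluded 0-based start and an included end. The function names are hypothetical.

```python
import csv
import io


def passing_variants(handle, delimiter=","):
    """Return rows with an empty failed_filters column, highest tadsv first."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = [row for row in reader if not row["failed_filters"]]
    rows.sort(key=lambda row: float(row["tadsv"]), reverse=True)
    return rows


def variant_length(row):
    """Length in bases, assuming start is excluded and end is included."""
    return int(row["end"]) - int(row["start"])


# Usage with the two example rows from the table above, written as CSV.
example = io.StringIO(
    "contig,start,end,id,vtype,failed_filters,tadsv\n"
    "11,31130456,31671718,abcd,DEL,,109.75766900764305\n"
    "18,46962113,46969912,efgh,DUP,filter;coverage,3.2\n"
)
for row in passing_variants(example):
    print(row["id"], row["vtype"], variant_length(row), row["tadsv"])
```

For a *TSV* file, pass ``delimiter="\t"``. A compressed file would first need to be opened with ``gzip.open(path, "rt")``.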
101 changes: 101 additions & 0 deletions docs/quickstart.rst
@@ -0,0 +1,101 @@
.. _rstquickstart:

==========
Quickstart
==========

This document is intended for impatient users who want to quickly set up SvAnna and prioritize variants.

Prerequisites
^^^^^^^^^^^^^

SvAnna is written in Java 11 and needs Java 11+ to be present in the runtime environment. Please verify that you are
using Java 11+ by running::

java -version
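
The version check can also be scripted. The following minimal sketch parses the text printed by ``java -version`` and extracts the major version. It assumes the usual ``version "..."`` banner, including the legacy ``1.x`` numbering used by Java 8 and older. The function names are hypothetical.

```python
import re
import subprocess


def java_major_version(version_output):
    """Return the major Java version parsed from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    major = int(match.group(1))
    # Legacy scheme: Java 8 and older report themselves as "1.8", "1.7", ...
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version` (which prints to stderr) and parse the result."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

A setup script could then stop early when ``installed_java_major() < 11``.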


SvAnna setup
^^^^^^^^^^^^

SvAnna is installed by running the following three steps.

1. Download SvAnna distribution ZIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Download and extract the SvAnna distribution ZIP archive from the `releases page <https://github.com/TheJacksonLaboratory/SvAnna/releases>`_.
Expand the *Assets* menu and download ``svanna-cli-{version}-distribution.zip``. Choose the latest stable version,
or a release candidate (RC).

After unzipping the distribution archive, run the following command to display the help message::

java -jar svanna-cli-1.0.0-RC1.jar --help

.. note::
If things went OK, the command above will print the following help message::

Structural variant prioritization
Usage: svanna-cli.jar [-hV] [COMMAND]
-h, --help Show this help message and exit.
-V, --version Print version information and exit.
Commands:
generate-config, G Generate a configuration YAML file
prioritize, P Prioritize the variants
See the full documentation at `https://github.com/TheJacksonLaboratory/SvAnna`

2. Download SvAnna database files
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Run the following::

wget https://svanna.s3.amazonaws.com/svanna.zip && unzip svanna.zip
wget https://squirls.s3.amazonaws.com/jannovar_v0.35.zip && unzip jannovar_v0.35.zip


3. Generate & fill the configuration file
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Generate the configuration file::

java -jar `pwd`/svanna/svanna-cli-1.0.0-RC1.jar generate-config svanna-config.yml

Now open the generated file in your favorite text editor and provide absolute paths to the following two resources:

* ``dataDirectory`` - the absolute path to the folder where the SvAnna database files were extracted
* ``jannovarCachePath`` - the absolute path to the selected Jannovar ``*.ser`` file, e.g. ``/path/to/hg38_refseq.ser``

.. tip::
   YAML syntax requires a space after the colon in each *key*: *value* pair.

Note the location of the configuration file, as the path to the configuration file must be provided for all SvAnna runs.
Having completed the steps above, you are good to prioritize variants in a VCF file.
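
Before the first run, it can help to check that both paths were actually filled in. The sketch below is a deliberately simplistic check, not a YAML parser and not part of SvAnna: it assumes the two keys appear on plain ``key: value`` lines and that the paths are POSIX-style. The function names are hypothetical.

```python
from pathlib import PurePosixPath

REQUIRED_KEYS = ("dataDirectory", "jannovarCachePath")


def read_flat_yaml(text):
    """Collect `key: value` lines, ignoring nesting; not a full YAML parser."""
    entries = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Lines without "colon + space" are skipped, which also catches
        # entries that lack the space required by YAML.
        if not stripped or stripped.startswith("#") or ": " not in stripped:
            continue
        key, value = stripped.split(": ", 1)
        entries[key.strip()] = value.strip().strip("'\"")
    return entries


def config_problems(text):
    """Return a list of human-readable problems; an empty list means OK."""
    entries = read_flat_yaml(text)
    problems = []
    for key in REQUIRED_KEYS:
        value = entries.get(key, "")
        if not value:
            problems.append(f"{key}: missing or empty")
        elif not PurePosixPath(value).is_absolute():
            problems.append(f"{key}: not an absolute path: {value}")
    return problems
```

Calling ``config_problems(open("svanna-config.yml").read())`` returns an empty list when both entries hold absolute paths.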

Prioritize structural variants in VCF file
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Let's run SvAnna on a toy VCF file containing eight SVs reported in the SvAnna manuscript.

First, let's download the VCF file::

  wget https://raw.githubusercontent.com/TheJacksonLaboratory/SvAnna/master/svanna-cli/src/examples/example.vcf

The variants were sourced from published clinical case reports and each variant led to a Mendelian disease.

For the purpose of this test run, let's assume that the VCF file contains SVs identified in a short- or long-read
sequencing run of a patient presenting with the following clinical symptoms:

* *HP:0011890* - Prolonged bleeding following procedure
* *HP:0000978* - Bruising susceptibility
* *HP:0012147* - Reduced quantity of Von Willebrand factor

Now, let's prioritize the variants::

java -jar svanna/svanna-cli-1.0.0-RC1.jar prioritize --config svanna-config.yml --output-format html,csv,vcf --vcf example.vcf --term HP:0011890 --term HP:0000978 --term HP:0012147

The variant with ID ``Othman-2010-20696945-VWF-index-FigS7``, which disrupts a promoter of the *von Willebrand factor*
(*VWF*) gene (`Othman et al., 2010 <https://pubmed.ncbi.nlm.nih.gov/20696945>`_),
receives the highest :math:`TAD_{SV}` score of 25.61 and is ranked first.

SvAnna stores prioritization results in *HTML*, *CSV*, and *VCF* output formats next to the input VCF file.
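
When many samples need to be processed, the command line can be assembled programmatically. The sketch below only builds the argument list from the options used in this quickstart (``--config``, ``--output-format``, ``--vcf``, and a repeated ``--term``). The helper name ``prioritize_command`` is hypothetical, and no other SvAnna options are assumed.

```python
import shlex


def prioritize_command(jar, config, vcf, hpo_terms, output_formats=("html",)):
    """Build the argument list for one `prioritize` run."""
    argv = [
        "java", "-jar", jar, "prioritize",
        "--config", config,
        "--output-format", ",".join(output_formats),
        "--vcf", vcf,
    ]
    # Each HPO term is passed with its own --term option, as in the example run.
    for term in hpo_terms:
        argv += ["--term", term]
    return argv


argv = prioritize_command(
    jar="svanna/svanna-cli-1.0.0-RC1.jar",
    config="svanna-config.yml",
    vcf="example.vcf",
    hpo_terms=["HP:0011890", "HP:0000978", "HP:0012147"],
    output_formats=("html", "csv", "vcf"),
)
print(shlex.join(argv))
```

The list can be handed to ``subprocess.run(argv, check=True)`` to start the analysis. Passing a list rather than a single string avoids shell quoting problems with unusual file names.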

Read the :ref:`rstsetup` and :ref:`rstrunning` sections to learn all details regarding setting up and running SvAnna.