Merge pull request #185 from TheJacksonLaboratory/release-v1.0.0-RC2
Release v1.0.0 rc2
Showing 149 changed files with 1,202 additions and 6,308 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,83 +1,14 @@ | ||
# SvAnna | ||
# SvAnna - Structural Variant Annotation and Analysis | ||
|
||
![Java CI with Maven](https://github.com/TheJacksonLaboratory/SvAnna/workflows/Java%20CI%20with%20Maven/badge.svg) | ||
[![Documentation Status](https://readthedocs.org/projects/squirls/badge/?version=latest)](https://svanna.readthedocs.io/en/latest/?badge=latest) | ||
|
||
![Java CI with Maven](https://github.com/TheJacksonLaboratory/SvAnna/workflows/Java%20CI%20with%20Maven/badge.svg) | ||
[![Documentation Status](https://readthedocs.org/projects/svanna/badge/?version=latest)](https://svanna.readthedocs.io/en/latest/?badge=latest) | ||
|
||
Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing | ||
|
||
Most users should download the latest SvAnna distribution ZIP file from | ||
the [Releases page](https://github.com/TheJacksonLaboratory/SvAnna/releases). | ||
|
||
Please consult the Read the docs site for detailed documentation - TODO - setup RTD. | ||
|
||
## Attic | ||
|
||
**The text below is out of sync, and the most useful parts of the text will be moved to *Read the docs*.** | ||
|
||
**The documentation needs to be completed.** | ||
|
||
### Creating the Jannovar transcript file | ||
[Jannovar](https://github.com/charite/jannovar) is a Java app/library for annotating | ||
VCF files. Its main use case is for small variants and their intersection with | ||
protein coding sequences. We will use it here to extract the positions of genes and | ||
SVs, but it may be easier just to start with a gencode GFF file in the future. | ||
|
||
Jannovar downloads various files and creates a transcript file that it uses for VCF annotation. | ||
At present, NCBI and other providers have changed the location of some files, so only the develop branch | ||
of Jannovar works. Enter the following commands to create the transcript file: | ||
|
||
``` | ||
git clone https://github.com/charite/jannovar.git | ||
cd jannovar | ||
git checkout develop | ||
mvn package | ||
java [-Xmx8g] -jar jannovar-cli-0.36-SNAPSHOT.jar download -d hg38/refseq_curated | ||
``` | ||
This command downloads various files and generates `data/hg38_refseq_curated.ser`. Either move | ||
this file to the `data` subdirectory of this project or symlink it (from `data`, enter `ln -s <path>`). | ||
For now, this project expects the path `data/data/refseq_curated.ser`. | ||
|
||
## Running svann | ||
|
||
Enter the following command to see options. The LIRICAL file is the | ||
LIRICAL TSV output file. The enhancers file is created by the | ||
https://github.com/pnrobinson/tspec app. To use the enhancers file | ||
it is required to also use an HPO term with the major phenotypic abnormality, | ||
e.g., [Abnormality of the immune system](https://hpo.jax.org/app/browse/term/HP:0002715). | ||
|
||
``` | ||
$ java -jar target/svann.jar annotate -h | ||
Usage: svann annotate [-hV] [-e=<enhancerFile>] [-g=<geneCodePath>] | ||
[-j=<jannovarPath>] [-t=<hpoTermIdList>] -v=<vcfFile> | ||
[-x=<outprefix>] | ||
annotate VCF file | ||
-e, --enhancer=<enhancerFile> | ||
tspec enhancer file | ||
-g, --gencode=<geneCodePath> | ||
-h, --help Show this help message and exit. | ||
-j, --jannovar=<jannovarPath> | ||
prefix for output files (default: | ||
data/data/hg38_refseq_curated.ser ) | ||
-t, --term=<hpoTermIdList> HPO term IDs (comma-separated list) | ||
-v, --vcf=<vcfFile> | ||
-V, --version Print version information and exit. | ||
-x, --prefix=<outprefix> prefix for output files (default: L2O ) | ||
``` | ||
|
||
|
||
|
||
|
||
# Documentation | ||
|
||
Generate the read the docs documentation locally by going to the ``docs`` subdirectory. | ||
First generate a virtual environment and install the required sphinx packages. :: | ||
|
||
virtualenv p38 | ||
source p38/bin/activate | ||
pip install sphinx sphinx-rtd-theme | ||
|
||
To create the documentation, ensure you are using the ``p38`` environment and enter the following command. :: | ||
|
||
source p38/bin/activate | ||
make html | ||
|
||
This will generate HTML pages under ``_build/html``. | ||
Please consult the Read the docs site for [detailed documentation](https://svanna.readthedocs.io/en/latest). |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,22 @@ | ||
SvAnna: Annotation of Structural Variants in VCF files | ||
===================================================== | ||
SvAnna: | ||
======= | ||
|
||
Efficient and accurate pathogenicity prediction for coding and regulatory structural variants in long-read genome sequencing | ||
|
||
SvAnna | ||
~~~~~~ | ||
|
||
This application annotates structural variants in VCF files, focussing specifically on long-read WGS analysis | ||
SvAnna performs phenotype-driven prioritization of structural variants in VCF files, focusing specifically on long-read WGS analysis | ||
of germline variants. | ||
|
||
|
||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:caption: Contents: | ||
|
||
quickstart | ||
setup | ||
enhancers | ||
running | ||
BND<bndannotations> | ||
structuralvariation | ||
outputformats | ||
|
||
|
||
.. structuralvariation | ||
.. enhancers | ||
.. BND<bndannotations> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
.. _rstoutputformats: | ||
|
||
============== | ||
Output formats | ||
============== | ||
|
||
SvAnna can store results in four output formats: *HTML*, *VCF*, *CSV*, and *TSV*. Use the ``--output-format`` option | ||
to select one or more output formats (e.g. ``--output-format html,vcf``). | ||
|
||
HTML output format | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
SvAnna creates an *HTML* file with the analysis summary and with variants sorted by the :math:`TAD_{SV}` score | ||
in descending order. | ||
By default, the top 100 variants are included in the report. The number of reported variants can be adjusted with | ||
the ``--report-top-variants`` option. | ||
|
||
The report consists of several parts: | ||
|
||
* *Analysis summary* - Details of HPO terms of the proband, paths of the input files, and the analysis parameters. | ||
* *Variant counts* - Breakdown of the variant counts by variant type and category. | ||
* *Prioritized SVs* - Visualizations of the prioritized variants. | ||
|
||
.. TODO - write more about the HTML report | ||
.. note:: | ||
Only the variants that passed all the filters are visualized in the *Prioritized SVs* section | ||
|
||
The ``--no-breakends`` option excludes breakend/translocation variants from the report. | ||
|
||
VCF output format | ||
^^^^^^^^^^^^^^^^^ | ||
When ``vcf`` is included in the ``--output-format`` option, SvAnna creates a VCF file with all input variants. | ||
The prioritization adds a new *INFO* field to each variant: | ||
|
||
* ``TADSV`` - an *INFO* field containing :math:`TAD_{SV}` score for the variant. | ||
|
||
.. note:: | ||
* The ``--report-top-variants`` option has no effect on the *VCF* output format. | ||
* Add the ``--uncompressed-output`` flag to get an uncompressed VCF file. | ||
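
A downstream script can read the score back from the annotated file. The following is a minimal sketch and not part of SvAnna: it assumes standard tab-separated VCF records and that the score is stored as a ``TADSV=<number>`` entry in the *INFO* column. The function name and the example record are hypothetical.

```python
from typing import Optional


def tadsv_from_vcf_line(line: str) -> Optional[float]:
    """Return the TADSV value from one VCF data line, or None if absent."""
    if line.startswith("#"):
        return None  # meta-information or header line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return None  # not a complete VCF record
    for entry in fields[7].split(";"):  # INFO is the 8th VCF column
        key, _, value = entry.partition("=")
        if key == "TADSV" and value not in ("", "."):
            return float(value)
    return None


# Hypothetical record for illustration only; real records will differ.
example = "1\t100\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=200;TADSV=3.2"
print(tadsv_from_vcf_line(example))
```

For a compressed VCF file, the lines would first have to be decompressed before being passed to such a function.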
|
||
|
||
CSV/TSV output format | ||
^^^^^^^^^^^^^^^^^^^^^ | ||
To write the *n* most deleterious variants into a *CSV* (or *TSV*) file, use ``csv`` (``tsv``) in the ``--output-format`` option. | ||
|
||
The results are written into a tabular file with the following columns: | ||
|
||
* *contig* - name of the contig/chromosome (e.g. ``1``, ``2``, ``X``) | ||
* *start* - 0-based start coordinate (excluded) of the variant on positive strand | ||
* *end* - 0-based end coordinate (included) of the variant on positive strand | ||
* *id* - variant ID as it was present in the input VCF file | ||
* *vtype* - variant type, one of {``DEL``, ``DUP``, ``INV``, ``INS``, ``BND``, ``CNV``} | ||
* *failed_filters* - the names of the filters that the variant failed to pass. The names are separated by a semicolon (``;``) | ||
* ``filter`` - the variant failed previous VCF filters: at least one filter flag other than ``PASS`` is present in the variant's VCF line. | ||
* ``coverage`` - the variant is supported by fewer reads than specified by the ``--min-read-support`` option | ||
* *tadsv* - the :math:`TAD_{SV}` score value | ||
|
||
.. table:: Tabular output | ||
|
||
    ======== ========= ========== ====== ======= ================= ===================== | ||
    contig   start     end        id     vtype   failed_filters    tadsv | ||
    ======== ========= ========== ====== ======= ================= ===================== | ||
    11       31130456  31671718   abcd   DEL                       109.75766900764305 | ||
    18       46962113  46969912   efgh   DUP     filter;coverage   3.2 | ||
    ...      ...       ...        ...    ...     ...               ... | ||
    ======== ========= ========== ====== ======= ================= ===================== | ||
|
||
.. note:: | ||
Add the ``--uncompressed-output`` flag to get an uncompressed tabular file. |
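
The tabular output is easy to post-process. The sketch below is an illustration rather than part of SvAnna: it assumes an uncompressed *TSV* file with a header row that uses the column names listed above, and it uses an inline sample that mirrors the example table instead of a real file. It keeps only the variants with an empty *failed_filters* column and computes ``end - start`` as the variant length, which follows from the coordinate convention described above. The function name is hypothetical.

```python
import csv
import io

# Inline sample mirroring the example table; a real run would open the
# TSV file written by SvAnna instead (assumed uncompressed, with a header row).
SAMPLE = (
    "contig\tstart\tend\tid\tvtype\tfailed_filters\ttadsv\n"
    "11\t31130456\t31671718\tabcd\tDEL\t\t109.75766900764305\n"
    "18\t46962113\t46969912\tefgh\tDUP\tfilter;coverage\t3.2\n"
)


def passing_variants(handle):
    """Yield (id, vtype, length, tadsv) for rows with no failed filters."""
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["failed_filters"]:
            continue  # at least one filter name is listed for this variant
        length = int(row["end"]) - int(row["start"])
        yield row["id"], row["vtype"], length, float(row["tadsv"])


results = list(passing_variants(io.StringIO(SAMPLE)))
print(results)
```

The computed length is only meaningful for variant types that span an interval on one contig; breakends and insertions would need separate handling.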
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
.. _rstquickstart: | ||
|
||
========== | ||
Quickstart | ||
========== | ||
|
||
This document is intended for impatient users who want to quickly set up SvAnna and prioritize variants. | ||
|
||
Prerequisites | ||
^^^^^^^^^^^^^ | ||
|
||
SvAnna is written in Java 11 and needs Java 11+ to be present in the runtime environment. Please verify that you are | ||
using Java 11+ by running:: | ||
|
||
java -version | ||
|
||
|
||
SvAnna setup | ||
^^^^^^^^^^^^ | ||
|
||
SvAnna is installed in the following three steps. | ||
|
||
1. Download SvAnna distribution ZIP | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Download and extract the SvAnna distribution ZIP archive from the `releases page <https://github.com/TheJacksonLaboratory/SvAnna/releases>`_. | ||
Expand the *Assets* menu and download ``svanna-cli-{version}-distribution.zip``. Choose the latest stable version | ||
or a release candidate (RC). | ||
|
||
After unzipping the distribution archive, run the following command to display the help message:: | ||
|
||
java -jar svanna-cli-1.0.0-RC1.jar --help | ||
|
||
.. note:: | ||
If things went OK, the command above will print the following help message:: | ||
|
||
Structural variant prioritization | ||
Usage: svanna-cli.jar [-hV] [COMMAND] | ||
-h, --help Show this help message and exit. | ||
-V, --version Print version information and exit. | ||
Commands: | ||
generate-config, G Generate a configuration YAML file | ||
prioritize, P Prioritize the variants | ||
See the full documentation at `https://github.com/TheJacksonLaboratory/SvAnna` | ||
|
||
2. Download SvAnna database files | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Run the following:: | ||
|
||
wget https://svanna.s3.amazonaws.com/svanna.zip && unzip svanna.zip | ||
wget https://squirls.s3.amazonaws.com/jannovar_v0.35.zip && unzip jannovar_v0.35.zip | ||
|
||
|
||
3. Generate & fill the configuration file | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
Generate the configuration file:: | ||
|
||
java -jar `pwd`/svanna/svanna-cli-1.0.0-RC1.jar generate-config svanna-config.yml | ||
|
||
Now open the generated file in your favorite text editor and provide absolute paths to the following two resources: | ||
|
||
* ``dataDirectory`` - the absolute path to the folder where the SvAnna database files were extracted | ||
* ``jannovarCachePath`` - the absolute path to the selected Jannovar ``*.ser`` file, e.g. ``/path/to/hg38_refseq.ser`` | ||
|
||
.. tip:: | ||
YAML syntax requires a space after the colon in each *key*: *value* pair. | ||
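
The two requirements above (absolute paths, and a space after the colon) can be checked before the first run. The following is a rough sketch under stated assumptions, not an SvAnna feature: it scans the file line by line instead of using a YAML parser, assumes each key sits on its own ``key: value`` line, and uses hypothetical paths in the examples. The function name is also hypothetical.

```python
import os

REQUIRED_KEYS = ("dataDirectory", "jannovarCachePath")


def check_config_paths(text, keys=REQUIRED_KEYS):
    """Return {key: problem} for keys that are missing or look wrong."""
    problems = {key: "key not found" for key in keys}
    for line in text.splitlines():
        key, sep, raw = line.strip().partition(":")
        if not sep or key not in keys:
            continue
        value = raw.strip().strip("'\"")
        if raw and not raw[0].isspace():
            problems[key] = "no space after the colon"
        elif not value:
            problems[key] = "missing value"
        elif not os.path.isabs(value):
            problems[key] = "not an absolute path"
        else:
            problems.pop(key, None)
    return problems


# Hypothetical file contents for illustration only.
good = "dataDirectory: /opt/svanna-data\njannovarCachePath: /opt/jannovar/hg38_refseq.ser\n"
bad = "dataDirectory:/opt/svanna-data\njannovarCachePath: jannovar/hg38_refseq.ser\n"
print(check_config_paths(good))
print(check_config_paths(bad))
```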
|
||
Note the location of the configuration file, as its path must be provided for all SvAnna runs. | ||
Having completed the steps above, you are ready to prioritize variants in a VCF file. | ||
|
||
Prioritize structural variants in VCF file | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
Let's annotate a toy VCF file containing eight SVs reported in the SvAnna manuscript. | ||
|
||
First, let's download the VCF file:: | ||
|
||
wget https://github.com/TheJacksonLaboratory/SvAnna/raw/master/svanna-cli/src/examples/example.vcf | ||
|
||
The variants were sourced from published clinical case reports and each variant led to a Mendelian disease. | ||
|
||
For the purpose of this test run, let's assume that the VCF file contains SVs identified in a short/long read | ||
sequencing run of a patient presenting with the following clinical symptoms: | ||
|
||
* *HP:0011890* - Prolonged bleeding following procedure | ||
* *HP:0000978* - Bruising susceptibility | ||
* *HP:0012147* - Reduced quantity of Von Willebrand factor | ||
|
||
Now, let's prioritize the variants:: | ||
|
||
java -jar svanna/svanna-cli-1.0.0-RC1.jar prioritize --config svanna-config.yml --output-format html,csv,vcf --vcf example.vcf --term HP:0011890 --term HP:0000978 --term HP:0012147 | ||
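
When the HPO terms come from another program, the same command line can be assembled in a script. The sketch below is only an illustration: it uses the options shown in the command above, repeats ``--term`` once per HPO term, and builds the argument list without running it. The helper name ``build_prioritize_command`` is hypothetical.

```python
import shlex


def build_prioritize_command(jar, config, vcf, hpo_terms, formats=("html",)):
    """Build the argument list for a `prioritize` run; nothing is executed."""
    cmd = [
        "java", "-jar", jar, "prioritize",
        "--config", config,
        "--output-format", ",".join(formats),
        "--vcf", vcf,
    ]
    for term in hpo_terms:
        cmd += ["--term", term]  # one --term option per HPO term
    return cmd


cmd = build_prioritize_command(
    jar="svanna/svanna-cli-1.0.0-RC1.jar",
    config="svanna-config.yml",
    vcf="example.vcf",
    hpo_terms=["HP:0011890", "HP:0000978", "HP:0012147"],
    formats=("html", "csv", "vcf"),
)
print(shlex.join(cmd))
```

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` to start the analysis.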
|
||
The variant with ID ``Othman-2010-20696945-VWF-index-FigS7``, which disrupts a promoter of the *von Willebrand factor* | ||
(*VWF*) gene (`Othman et al., 2010 <https://pubmed.ncbi.nlm.nih.gov/20696945>`_), | ||
receives the highest :math:`TAD_{SV}` score of 25.61 and is ranked first. | ||
|
||
SvAnna stores prioritization results in *HTML*, *CSV*, and *VCF* output formats next to the input VCF file. | ||
|
||
Read the :ref:`rstsetup` and :ref:`rstrunning` sections to learn all details regarding setting up and running SvAnna. |