Commit
Merge pull request #339 from bcgsc/release/v3.1.0
Release/v3.1.0
creisle authored Nov 15, 2022
2 parents 479fdeb + 60817e7 commit dbbd9a3
Showing 47 changed files with 2,795 additions and 1,839 deletions.
6 changes: 4 additions & 2 deletions .github/workflows/build.yml
Expand Up @@ -20,7 +20,9 @@ jobs:
steps:
- uses: actions/checkout@v2
- name: install machine dependencies
run: sudo apt-get install -y libcurl4-openssl-dev
run: |
sudo apt-get update
sudo apt-get install -y libcurl4-openssl-dev
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
Expand Down Expand Up @@ -93,7 +95,7 @@ jobs:
- name: Install workflow dependencies
run: |
python -m pip install --upgrade pip setuptools wheel
pip install mavis_config pandas snakemake
pip install mavis_config pandas
- uses: eWaterCycle/setup-singularity@v6
with:
singularity-version: 3.6.4
2 changes: 1 addition & 1 deletion README.md
Expand Up @@ -43,7 +43,7 @@ The simplest way to use MAVIS is via Singularity. The MAVIS docker container use
by singularity will take care of installing the aligner as well.

```bash
pip install -U setuptools pip
pip install -U setuptools pip wheel
pip install mavis_config # also installs snakemake
```

10 changes: 10 additions & 0 deletions docs/background/citations.md
Expand Up @@ -23,6 +23,11 @@ Chen,X. et al. (2016) Manta: rapid detection of structural variants
and indels for germline and cancer sequencing applications.
Bioinformatics, 32, 1220--1222.

## Chiu-2021

Chiu,R. et al. (2021) Straglr: discovering and genotyping tandem repeat
expansions using whole genome long-read sequences. Genome Biol., 22, 224.

## Haas-2017

Haas,B et al. (2017) STAR-Fusion: Fast and Accurate Fusion
Expand Down Expand Up @@ -62,6 +67,11 @@ Saunders,C.T. et al. (2012) Strelka: accurate somatic small-variant
calling from sequenced tumor--normal sample pairs. Bioinformatics,
28, 1811--1817.

## Uhrig-2021

Uhrig,S. et al. (2021) Accurate and efficient detection of gene
fusions from RNA sequencing data. Genome Res., 31, 448--460.

## Yates-2016

Yates,A. et al. (2016) Ensembl 2016. Nucleic Acids Res., 44,
5 changes: 5 additions & 0 deletions docs/glossary.md
Expand Up @@ -127,6 +127,11 @@ install instructions.
Community-based standard of recommendations for variant notation.
See [http://varnomen.hgvs.org/](http://varnomen.hgvs.org/)

## Arriba

Arriba is an SV caller. Source for Arriba can be found
[here](https://github.com/suhrig/arriba) [Uhrig-2021](../background/citations#uhrig-2021)

## BreakDancer

BreakDancer is an SV caller. Source for BreakDancer can be found
4 changes: 4 additions & 0 deletions docs/inputs/.pages
@@ -0,0 +1,4 @@
nav:
- reference.md
- standard.md
- ...
28 changes: 28 additions & 0 deletions docs/inputs/non_python_dependencies.md
@@ -0,0 +1,28 @@
# Non-python Dependencies

MAVIS integrates with
[SV callers](./sv_callers.md),
[job schedulers](#job-schedulers), and
[aligners](#aligners). While some of
these dependencies are optional, all currently supported options are
detailed below. The versions column in the tables below lists all the
versions which were tested for each tool. Each version listed is known
to be compatible with MAVIS.

## Job Schedulers

MAVIS v3 uses [snakemake](https://snakemake.readthedocs.io/en/stable/) to handle job scheduling.

## Aligners

Two aligners are supported: [bwa](../../glossary/#bwa) and
[blat](../../glossary/#blat) (default). These are both included in the docker image by default.

| Name | Version(s) | Environment Setting |
| ---------------------------------------------- | ----------------------- | ------------------------- |
| [blat](../../glossary/#blat) | `36x2` `36` | `MAVIS_ALIGNER=blat` |
| [bwa mem](../../glossary/#bwa) | `0.7.15-r1140` `0.7.12` | `MAVIS_ALIGNER='bwa mem'` |

!!! note
    When setting the aligner you will also need to set the
    [aligner_reference](../../configuration/settings/#aligner_reference) to match.
16 changes: 8 additions & 8 deletions docs/inputs/reference.md
Expand Up @@ -10,16 +10,16 @@ To improve the install experience for the users, different
configurations of the MAVIS annotations file have been made available.
These files can be downloaded below, or if the required configuration is
not available,
(instructions on generating the annotations file)[/inputs/reference/#generating-the-annotations-from-ensembl] can be found below.
[instructions on generating the annotations file](/inputs/reference/#generating-the-annotations-from-ensembl) can be found below.

| File Name (Type/Format) | Environment Variable | Download |
| --------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [reference genome](../../inputs/reference/#reference-genome) ([fasta](../../glossary/#fasta)) | `MAVIS_REFERENCE_GENOME` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz) <br> [![](../images/get_app-24px.svg) GRCh38](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.tar.gz) |
| File Name (Type/Format) | Environment Variable | Download |
| --------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [reference genome](../../inputs/reference/#reference-genome) ([fasta](../../glossary/#fasta)) | `MAVIS_REFERENCE_GENOME` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz) <br> [![](../images/get_app-24px.svg) GRCh38](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.tar.gz) |
| [annotations](../../inputs/reference/#annotations) ([JSON](../../glossary/#json)) | `MAVIS_ANNOTATIONS` | [![](../images/get_app-24px.svg) GRCh37/Hg19 + Ensembl69](http://www.bcgsc.ca/downloads/mavis/v3/ensembl69_hg19_annotations.v3.json.gz) <br> [![](../images/get_app-24px.svg) GRCh38 + Ensembl79](http://www.bcgsc.ca/downloads/mavis/v3/ensembl79_hg38_annotations.v3.json.gz) |
| [masking](../../inputs/reference/#masking-file) (text/tabbed) | `MAVIS_MASKING` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://www.bcgsc.ca/downloads/mavis/hg19_masking.tab)<br>[![](../images/get_app-24px.svg) GRCh38](http://www.bcgsc.ca/downloads/mavis/GRCh38_masking.tab) |
| [template metadata](../../inputs/reference/#template-metadata) (text/tabbed) | `MAVIS_TEMPLATE_METADATA` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz)<br>[![](../images/get_app-24px.svg) GRCh38](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz) |
| [DGV annotations](../../inputs/reference/#dgv-database-of-genomic-variants) (text/tabbed) | `MAVIS_DGV_ANNOTATION` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://www.bcgsc.ca/downloads/mavis/dgv_hg19_variants.tab)<br>[![](../images/get_app-24px.svg) GRCh38](http://www.bcgsc.ca/downloads/mavis/dgv_hg38_variants.tab) |
| [aligner reference](../../inputs/reference/#aligner-reference) | `MAVIS_ALIGNER_REFERENCE` | [![](../images/get_app-24px.svg) GRCh37/Hg19 2bit (blat)](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit)<br>[![](../images/get_app-24px.svg) GRCh38 2bit (blat)](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit) |
| [masking](../../inputs/reference/#masking-file) (text/tabbed) | `MAVIS_MASKING` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://www.bcgsc.ca/downloads/mavis/hg19_masking.tab)<br>[![](../images/get_app-24px.svg) GRCh38](http://www.bcgsc.ca/downloads/mavis/GRCh38_masking.tab) |
| [template metadata](../../inputs/reference/#template-metadata) (text/tabbed) | `MAVIS_TEMPLATE_METADATA` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/cytoBand.txt.gz)<br>[![](../images/get_app-24px.svg) GRCh38](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/cytoBand.txt.gz) |
| [DGV annotations](../../inputs/reference/#dgv-database-of-genomic-variants) (text/tabbed) | `MAVIS_DGV_ANNOTATION` | [![](../images/get_app-24px.svg) GRCh37/Hg19](http://www.bcgsc.ca/downloads/mavis/dgv_hg19_variants.tab)<br>[![](../images/get_app-24px.svg) GRCh38](http://www.bcgsc.ca/downloads/mavis/dgv_hg38_variants.tab) |
| [aligner reference](../../inputs/reference/#aligner-reference) | `MAVIS_ALIGNER_REFERENCE` | [![](../images/get_app-24px.svg) GRCh37/Hg19 2bit (blat)](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit)<br>[![](../images/get_app-24px.svg) GRCh38 2bit (blat)](http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit) |

If the environment variables above are set they will be used as the
default values when any step of the pipeline script is called (including
1 change: 1 addition & 0 deletions docs/inputs/support.md
Expand Up @@ -41,6 +41,7 @@ It should be noted however that the tool tracked will only be listed as

| Name | Version(s) | MAVIS input | Publication |
| ------------------------------------------ | ---------------- | --------------------------------------------- | ----------------------------------------------------------- |
| [Arriba](../../glossary/#arriba) | `2.2.1` | `fusions.tsv` | [Uhrig-2021](../../background/citations#uhrig-2021) |
| [BreakDancer](../../glossary/#breakdancer) | `1.4.5` | `Tools main output file(s)` | [Chen-2009](../../background/citations#chen-2009) |
| [BreakSeq](../../glossary/#breakseq) | `2.2` | `work/breakseq.vcf.gz` | [Abyzov-2015](../../background/citations#abyzov-2015) |
| [Chimerascan](../../glossary/#chimerascan) | `0.4.5` | `*.bedpe` | [Iyer-2011](../../background/citations#Iyer-2011) |
158 changes: 158 additions & 0 deletions docs/inputs/sv_callers.md
@@ -0,0 +1,158 @@
# SV Callers

MAVIS supports output from a wide variety of SV callers. Assumptions are made for each tool based on interpretation of the output and the publications for each tool.

## Configuring Conversions

Adding a conversion step to your MAVIS run is as simple as adding that section to the input JSON config.

The general structure of this section is as follows:

```jsonc
{
    "convert": {
        "<ALIAS>": {
            "file_type": "<TOOL OUTPUT TYPE>",
            "name": "<TOOL NAME>", // optional field for supported tools
            "inputs": [
                "/path/to/tool/output/file"
            ]
        }
    }
}
```

A full version of the input configuration file specification can be found in the [configuration](../configuration/general.md) section.
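
As a rough illustration, the same section can be assembled programmatically. The sketch below uses only the Python standard library and reuses the `vcf` / `my_tool` values from the general VCF example in this document. The helper name `make_convert_entry` is hypothetical, and the check that at least one input is present is an assumption of this sketch rather than something taken from the configuration specification.

```python
import json


def make_convert_entry(file_type, inputs, name=None):
    """Build one entry of the "convert" section (hypothetical helper)."""
    if not inputs:
        # assumption of this sketch: an entry without input files is an error
        raise ValueError("at least one input file is required")
    entry = {"file_type": file_type, "inputs": list(inputs)}
    if name is not None:
        # "name" is the optional field in the structure shown above
        entry["name"] = name
    return entry


config = {
    "convert": {
        # the key is the alias chosen for this set of tool outputs
        "my_tool_alias": make_convert_entry(
            "vcf", ["/path/to/my_tool/output.vcf"], name="my_tool"
        )
    }
}

# serialize for inclusion in the MAVIS input JSON config
print(json.dumps(config, indent=4))
```

The printed JSON can be merged into the rest of the input configuration file.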

## Supported Tools

The tools and versions currently supported are given below. Versions listed indicate the version of the tool for which output files have been tested as input into MAVIS. MAVIS also supports a [general VCF input](#general-vcf-inputs).

| SV Caller | Version(s) Tested | Files used as MAVIS input |
| --------------------------------------------------------------------------- | ----------------- | --------------------------------------------- |
| [BreakDancer (Chen, 2009)](../../background/citations#chen-2009) | `1.4.5` | `Tool's main output file(s)` |
| [BreakSeq (Abyzov, 2015)](../../background/citations#abyzov-2015) | `2.2` | `work/breakseq.vcf.gz` |
| [Chimerascan (Iyer, 2011)](../../background/citations#iyer-2011) | `0.4.5` | `*.bedpe` |
| [CNVnator (Abyzov, 2011)](../../background/citations#abyzov-2011) | `0.3.3` | `Tool's main output file(s)` |
| [CuteSV (Jiang, 2020)](../../background/citations#jiang-2020) | `1.0.10` | `*.vcf` |
| [DeFuse (McPherson, 2011)](../../background/citations#mcpherson-2011) | `0.6.2` | `results/results.classify.tsv` |
| [DELLY (Rausch, 2012)](../../background/citations#rausch-2012) | `0.6.1` `0.7.3` | `combined.vcf` (converted from bcf) |
| [Manta (Chen, 2016)](../../background/citations#chen-2016) | `1.0.0` | `{diploidSV,somaticSV}.vcf` |
| [Pindel (Ye, 2009)](../../background/citations#ye-2009) | `0.2.5b9` | `Tool's main output file(s)` |
| [Sniffles (Sedlazeck, 2018)](../../background/citations#sedlazeck-2018) | `1.0.12b` | `*.vcf` |
| [STAR-Fusion (Haas, 2017)](../../background/citations#haas-2017) | `1.4.0` | `star-fusion.fusion_predictions.abridged.tsv` |
| [Straglr (Chiu, 2021)](../../background/citations#chiu-2021) | | |
| [Strelka (Saunders, 2012)](../../background/citations#saunders-2012) | `1.0.6` | `passed.somatic.indels.vcf` |
| [Trans-ABySS (Robertson, 2010)](../../background/citations/#robertson-2010) | `1.4.8 (custom)` | `{indels/events_novel_exons,fusions/*}.tsv` | `<output_prefix>.bed` |

!!! note
    [Trans-ABySS](../../glossary/#trans-abyss): The Trans-ABySS version
    used was an in-house dev version. However, the output columns are
    compatible with 1.4.8, as that was the version branched from.
    Additionally, although indels can be used from both genome and
    transcriptome outputs of Trans-ABySS, it is recommended to only use the
    genome indel calls, as the transcriptome indel calls (for versions
    tested) introduce a very high number of false positives. This will slow
    down validation. It is much faster to simply use the genome indels for
    both genome and transcriptome.

## [DELLY](../../glossary/#delly) Post-processing

Some post-processing on the DELLY output files is generally done prior
to input. The output BCF files are converted to a VCF file:

```bash
bcftools concat -f /path/to/file/with/vcf/list --allow-overlaps --output-type v --output combined.vcf
```

## General VCF inputs

Assuming that the tool outputting the VCF file follows standard
conventions, then it is possible to use a
[general VCF conversion](../../package/mavis/tools/vcf)
that is not tool-specific. Given the wide variety in content for VCF files,
MAVIS makes a number of assumptions and the VCF conversion may not work
for all VCFs. In general MAVIS follows the [VCF 4.2
specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf). If the
input tool you are using differs, it would be better to use a
[custom conversion script](#custom-conversions).

Using the general VCF tool with a non-standard tool can be done as follows:

```json
{
    "convert": {
        "my_tool_alias": {
            "file_type": "vcf",
            "name": "my_tool",
            "inputs": ["/path/to/my_tool/output.vcf"]
        }
    }
}
```

### Assumptions on non-standard INFO fields

- If `PRECISE` is given, any confidence intervals are ignored in favour of exact breakpoint calls, using `POS` and `END` as the breakpoint positions.
- `CT` values, if given, are representative of the breakpoint orientations.
- `CHR2` is given for all interchromosomal events.
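
The `PRECISE` assumption can be sketched roughly as follows. This is not the MAVIS implementation: the function name `breakpoint_intervals` and the representation of the INFO column as a plain dict of tuples are assumptions made for illustration. `CIPOS` and `CIEND` are the standard VCF INFO keys for confidence intervals around `POS` and `END`.

```python
def breakpoint_intervals(pos, end, info):
    """Return ((start1, end1), (start2, end2)) for one VCF record.

    `info` is assumed to be a dict such as
    {"PRECISE": True, "CIPOS": (-5, 5), "CIEND": (-3, 3)}.
    """
    ci_pos = info.get("CIPOS", (0, 0))
    ci_end = info.get("CIEND", (0, 0))
    if info.get("PRECISE", False):
        # exact call: ignore any confidence intervals that were given
        ci_pos = ci_end = (0, 0)
    return (
        (pos + ci_pos[0], pos + ci_pos[1]),
        (end + ci_end[0], end + ci_end[1]),
    )
```

For an imprecise record the intervals widen by the `CIPOS`/`CIEND` offsets; for a precise record both breakpoints collapse to single positions.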

### Translating BND type Alt fields

There are four possible configurations for the alt field of a BND type structural variant
based on the VCF specification. These correspond 1-1 to the orientation types for MAVIS
translocation structural variants.

```text
r = reference base/seq
u = untemplated sequence/alternate sequence
p = chromosome:position
```

| alt format | orients |
| ---------- | ------- |
| `ru[p[` | LR |
| `[p[ur` | RR |
| `]p]ur` | RL |
| `ru]p]` | LL |
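
The table above can be turned into a small parser. The following is a minimal sketch using only the standard library; the function name `parse_bnd_alt` and the regular expressions are illustrative choices and not the code MAVIS itself uses.

```python
import re

# One pattern per row of the table: r = reference, u = untemplated, p = chr:pos.
# Each entry is (pattern, orientation of first breakpoint, orientation of second).
_BND_PATTERNS = [
    (re.compile(r"^[A-Za-z.]+\[([^\[\]]+)\[$"), "L", "R"),  # ru[p[
    (re.compile(r"^\[([^\[\]]+)\[[A-Za-z.]+$"), "R", "R"),  # [p[ur
    (re.compile(r"^\]([^\[\]]+)\][A-Za-z.]+$"), "R", "L"),  # ]p]ur
    (re.compile(r"^[A-Za-z.]+\]([^\[\]]+)\]$"), "L", "L"),  # ru]p]
]


def parse_bnd_alt(alt):
    """Return (mate chromosome, mate position, orient1, orient2) for a BND alt."""
    for pattern, orient1, orient2 in _BND_PATTERNS:
        match = pattern.match(alt)
        if match:
            # split on the last colon so contig names containing ':' survive
            chrom, pos = match.group(1).rsplit(":", 1)
            return chrom, int(pos), orient1, orient2
    raise ValueError(f"unrecognized BND alt field: {alt!r}")
```

Any alt string that does not match one of the four layouts raises a `ValueError` rather than guessing an orientation.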

## Custom Conversions

If there is a tool that is not yet supported by MAVIS and you would like it to be, you can either add a [feature request](https://github.com/bcgsc/mavis/issues) to our GitHub page or tackle writing the conversion script yourself. Either way, there are a few things you will need:

- A sample output from the tool in question
- Tool metadata for the citation, version, etc

### Logic Example - [Chimerascan](../../glossary/#chimerascan)

The following is a description of how the conversion script for
[Chimerascan](../../background/citations/#iyer-2011) was generated.
While this is a built-in conversion command now, the logic could also
have been put in an external script. As mentioned above, there are a
number of assumptions that had to be made about the tool's output to
convert it to the
[standard mavis format](../../inputs/standard/). Assumptions were then verified by reviewing a series of
called events in [IGV](../../glossary/#igv). In the current
example, [Chimerascan](../../background/citations/#iyer-2011) output
has six columns of interest that were used in the conversion:

- start3p
- end3p
- strand3p
- start5p
- end5p
- strand5p

The above columns describe two segments which are joined. MAVIS requires
the position of the join. It was assumed that the segments are always
joined as a [sense fusion](../../glossary/#sense-fusion). Using this
assumption there are four logical cases to determine the position of the
breakpoints.

For example, in the first case: if both strands are positive, then the end
of the five-prime segment (end5p) is the first breakpoint and the start
of the three-prime segment (start3p) is the second breakpoint.
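
The four cases can be sketched as below. Only the first case (both strands positive) is stated in the text; the other three are derived here by symmetry from the sense-fusion assumption and should be read as an interpretation, not as the definitive MAVIS logic. The function name `chimerascan_breakpoints`, the row-as-dict input, and the `L`/`R` orientation labels are assumptions of this sketch, and no coordinate-system adjustment is applied.

```python
def chimerascan_breakpoints(row):
    """Return ((position1, orient1), (position2, orient2)) for one output row.

    `row` is assumed to be a dict holding the six columns listed above.
    """
    if row["strand5p"] == "+":
        # 5' segment is read left to right, so the join is at its end
        break1, orient1 = row["end5p"], "L"
    else:
        # 5' segment is read right to left, so the join is at its start
        break1, orient1 = row["start5p"], "R"

    if row["strand3p"] == "+":
        # 3' segment continues rightwards from its start
        break2, orient2 = row["start3p"], "R"
    else:
        # 3' segment continues leftwards from its end
        break2, orient2 = row["end3p"], "L"

    return (break1, orient1), (break2, orient2)
```

Verifying a handful of converted calls against the alignments in IGV, as described above, is a reasonable way to confirm that the derived cases match the tool's actual output.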

### Calling a Custom Conversion Script

Since MAVIS v3+ is run using [snakemake](https://snakemake.readthedocs.io/en/stable/), the simplest way to incorporate your custom conversion scripts is to modify the Snakefile and add them as rules.
4 changes: 2 additions & 2 deletions setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = mavis
version = 3.0.0
version = 3.1.0
url = https://github.com/bcgsc/mavis.git
download_url = https://github.com/bcgsc/mavis/archive/v2.2.10.tar.gz
description = A Structural Variant Post-Processing Package
Expand Down Expand Up @@ -37,7 +37,7 @@ install_requires =
braceexpand==0.1.2
colour
Distance>=0.1.3
mavis_config>=1.1.0, <2.0.0
mavis_config>=1.2.2, <2.0.0
networkx>=2.5,<3
numpy>=1.13.1
pandas>=1.1, <2
2 changes: 1 addition & 1 deletion src/mavis/bam/read.py
Expand Up @@ -424,7 +424,7 @@ def sequenced_strand(read: pysam.AlignedSegment, strand_determining_read: int =
else:
strand = STRAND.NEG if not read.is_reverse else STRAND.POS
elif strand_determining_read == 2:
if read.is_read2:
if not read.is_read1:
strand = STRAND.NEG if read.is_reverse else STRAND.POS
else:
strand = STRAND.NEG if not read.is_reverse else STRAND.POS
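
To make the effect of this one-line change easier to see, the branch logic can be restated as a self-contained sketch. `FakeRead` is a hypothetical stand-in for `pysam.AlignedSegment`, plain `"+"`/`"-"` strings stand in for `STRAND.POS`/`STRAND.NEG`, and the first branch is inferred from the visible context lines, so treat this as an approximation of the function rather than a copy of it.

```python
from dataclasses import dataclass


@dataclass
class FakeRead:
    """Hypothetical stand-in exposing the two flags used by the hunk."""

    is_read1: bool
    is_reverse: bool


def sequenced_strand(read, strand_determining_read):
    """Sketch of the strand logic after the change in this hunk."""
    if strand_determining_read == 1:
        determining = read.is_read1
    elif strand_determining_read == 2:
        # after the change: any read that is not flagged as read 1
        determining = not read.is_read1
    else:
        raise ValueError("strand_determining_read must be 1 or 2")

    if determining:
        return "-" if read.is_reverse else "+"
    return "+" if read.is_reverse else "-"
```

One visible consequence is that a read with neither mate flag set is now handled by the same branch as a second-in-pair read when `strand_determining_read` is 2.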