SpliceNouveau

Description

SpliceNouveau is a Python algorithm that helps users generate vectors containing introns and splice sites. At its core, it is an 'in silico directed evolution' algorithm: given a set of user-defined sequences (amino acid and/or nucleotide sequences) and constraints (splice-site strengths and types), it attempts to identify synonymous and non-coding mutations that make the SpliceAI splicing predictions match those requested by the user.
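The general idea of mutating, scoring, and keeping the best candidate can be sketched in a few lines. This is a toy illustration only, not the SpliceNouveau implementation: the codon table is deliberately partial, `toy_score` is a hypothetical stand-in for a SpliceAI-based score, and all function names are assumptions made for this example.

```python
import random

# Partial codon table (assumption: a handful of amino acids, for illustration).
SYNONYMS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in SYNONYMS.items() for codon in codons}


def translate(cds):
    """Translate a coding sequence using the partial table above."""
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def mutate_synonymous(cds, rng):
    """Swap one randomly chosen codon for a codon encoding the same amino acid."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    idx = rng.randrange(len(codons))
    codons[idx] = rng.choice(SYNONYMS[CODON_TO_AA[codons[idx]]])
    return "".join(codons)


def toy_score(seq):
    """Hypothetical stand-in scorer (NOT SpliceAI): penalise 'GT' dinucleotides."""
    return -seq.count("GT")


def evolve(cds, n_rounds=200, n_children=5, seed=0):
    """Greedy loop: propose synonymous mutants, keep the best if it improves."""
    rng = random.Random(seed)
    best, best_score = cds, toy_score(cds)
    for _ in range(n_rounds):
        children = [mutate_synonymous(best, rng) for _ in range(n_children)]
        candidate = max(children, key=toy_score)
        if toy_score(candidate) > best_score:
            best, best_score = candidate, toy_score(candidate)
    return best, best_score


start = "ATGGTTGCTGGTAAAGTG"  # encodes M V A G K V
evolved, score = evolve(start)
print(translate(start) == translate(evolved), score)
```

Because only synonymous codons are swapped, the encoded protein is unchanged while the nucleotide sequence drifts toward a better score. In the real tool the score would come from SpliceAI predictions rather than a dinucleotide count.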

Features

SpliceNouveau can generate several different types of splicing events:

Cassette exons
Alternative 3' splice sites
Alternative 5' splice sites
Intron retention

In each case, it attempts to alter the sequence to set the SpliceAI predictions at the defined splice sites to the user-requested level, while minimizing the presence of off-target splice sites which could cause mis-splicing.

Applications

In theory, SpliceNouveau can be used to create constitutively spliced vectors. However, if the user specifies an enrichment of motifs that bind a given splicing regulator (or that are known to influence splicing in, e.g., a tissue-specific manner) near a given splice site, SpliceNouveau can help generate alternatively spliced vectors. We have used this algorithm extensively to generate vectors that undergo alternative splicing in response to loss of TDP-43 nuclear function.

Getting Started with SpliceNouveau

SpliceNouveau is a Python algorithm designed to help users generate vectors containing introns and splice sites. It is an 'in silico directed evolution' algorithm that identifies synonymous and non-coding mutations to match the SpliceAI splicing predictions requested by the user.

Prerequisites

Python >=3.7
Required third-party Python packages: numpy, pandas, keras, tensorflow, spliceai
Standard-library modules used (no installation needed): gzip, random, argparse, os, sys
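Before a first run, it can be useful to check which third-party packages are missing from the current environment. The sketch below is a hypothetical helper, not part of SpliceNouveau, and it assumes each package's import name matches the name listed above.

```python
import importlib.util

# Third-party packages from the list above; gzip, random, argparse, os and sys
# ship with Python and need no check.
THIRD_PARTY = ["numpy", "pandas", "keras", "tensorflow", "spliceai"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(THIRD_PARTY)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only looks the module up without importing it, so the check stays fast even for heavy packages such as tensorflow.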

Installation

Clone the repository or download the source code.
Install the required Python packages using pip:

pip install numpy pandas

Install SpliceAI by following the instructions on the SpliceAI GitHub page. Although a CUDA-enabled GPU significantly increases performance, it is feasible to run SpliceAI/SpliceNouveau on a CPU.

Usage

The SpliceNouveau tool is executed from the command line with various arguments. Here's an example usage:

python3 SpliceNouveau.py --initial_cds ATGGCGAGAACAATGGTTGCTATGGTGTCCAAAGGTGAGGCAGTCATAAAG... \
                         --initial_intron1 GTAAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTGTGTGTGTG... \
                         --ce_start 43 \
                         --ce_end 159 \
                         --ce_mut_chance 1 \
                         --five_utr CGGCCGCTTCTTGGTGCCAGCTTATCAT... \
                         --three_utr TGATAAACAAATGGTAAGGAAGGGCACAT... \
                         --ignore_end 470 \
                         --output mscar/aars1_inspired_closer_aim_0p5.csv \
                         --target_cryptic_donor 0.5 \
                         --target_cryptic_acc 0.5 \
                         -a 5 \
                         --intron1_mut_chance 0.5 \
                         --intron2_mut_chance 0.5 \
                         -n 2000 \
                         --cds_mut_end_trim 569

This example command specifies the initial CDS, intron sequences, cryptic exon start and end positions, UTR sequences, splice site target scores, and other parameters for the algorithm.
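For scripted runs, such as repeating the design at several target scores, the command line can be assembled in Python. This is a minimal sketch: `build_command` is a hypothetical helper, the sequence values are placeholders to be replaced with full-length sequences, the sweep values and output names are illustrative, and only flags that appear in the example above are used.

```python
import shlex


def build_command(params, script="SpliceNouveau.py"):
    """Turn a dict of option -> value into an argv list for subprocess.run."""
    argv = ["python3", script]
    for flag, value in params.items():
        argv += [flag, str(value)]
    return argv


# Placeholders: substitute your own full-length sequences here.
base = {
    "--initial_cds": "<CDS_SEQUENCE>",
    "--initial_intron1": "<INTRON1_SEQUENCE>",
    "--ce_start": 43,
    "--ce_end": 159,
    "-n": 2000,
}

commands = []
for target in (0.25, 0.5, 0.75):  # illustrative target scores
    params = dict(base)
    params["--target_cryptic_donor"] = target
    params["--target_cryptic_acc"] = target
    params["--output"] = f"sweep_target_{target}.csv"
    commands.append(build_command(params))

# Show the first command as a shell-quoted string.
print(" ".join(shlex.quote(arg) for arg in commands[0]))
```

Each entry in `commands` can be passed to `subprocess.run(argv, check=True)`. Passing an argument list rather than a single shell string avoids quoting problems with long sequences.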

For a complete list of available arguments and their descriptions, run:

python3 SpliceNouveau.py -h

Note that, for historical reasons, the 'fitness score' assigned to each sequence is 2 - sum(deviations from desired splicing). In the paper we omit this offset of 2 to avoid confusion. A high-performing sequence is therefore expected to have a score close to 2 (not zero).
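The score formula can be illustrated with a small worked example. This sketch assumes that each deviation is the absolute difference between the predicted and requested score at a site; the function name, the site labels, and the choice of which sites are summed are assumptions for illustration and may differ from the actual implementation.

```python
def fitness(predicted, target):
    """Return 2 minus the summed absolute deviation from the target scores."""
    if predicted.keys() != target.keys():
        raise ValueError("predicted and target must cover the same sites")
    deviation = sum(abs(predicted[site] - target[site]) for site in target)
    return 2 - deviation


target = {"cryptic_donor": 0.5, "cryptic_acceptor": 0.5}

# A sequence whose predictions match the targets exactly scores 2.
print(fitness({"cryptic_donor": 0.5, "cryptic_acceptor": 0.5}, target))    # 2.0

# Each site is off by 0.25, so the score drops by 0.5 in total.
print(fitness({"cryptic_donor": 0.75, "cryptic_acceptor": 0.25}, target))  # 1.5
```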

Output

The SpliceNouveau tool generates two output files:

<output_filename>.csv: Contains the attempt number, score, sequence, cryptic exon length, and frameshift information for each attempt.
<output_filename>.predictions.csv: Contains the attempt number, position, donor probability, and acceptor probability for each position in the sequence.

If the --track_splice_scores option is used, an additional file <output_filename>.tracked_scores.csv is generated, which tracks the splice scores for each iteration.
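To pick the best attempt from the main output file, the CSV can be scanned for the highest score. This is a sketch under stated assumptions: the exact column headers are not given here, so the column name `score` and the demo rows below are invented for illustration; inspect the header of your own file and adjust `score_column` accordingly.

```python
import csv
import io


def best_attempt(csv_file, score_column="score"):
    """Return the row (as a dict) with the highest value in score_column."""
    rows = list(csv.DictReader(csv_file))
    if not rows:
        raise ValueError("no rows found")
    return max(rows, key=lambda row: float(row[score_column]))


# Made-up rows for demonstration only; real column names may differ.
demo = io.StringIO(
    "attempt,score,sequence\n"
    "0,1.62,ATGGCG\n"
    "1,1.91,ATGGCC\n"
    "2,1.78,ATGGCA\n"
)
print(best_attempt(demo)["attempt"])  # prints 1
```

With a real file, open it first, for example `with open("results.csv", newline="") as handle: best_attempt(handle)`, where `results.csv` stands for whatever path was passed to --output.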

frattalab/SpliceNouveau

Folders and files

Latest commit

History

Repository files navigation

SpliceNouveau

Description

Features

Applications

Getting Started with SpliceNouveau

Prerequisites

Installation

Usage

Output

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages