The impact of sex on alternative splicing

This repository documents the analysis performed for the study "The impact of sex on alternative splicing". A manuscript describing a modified version of this analysis has been submitted. Reproducing the analysis involves the following steps.

Get access to the Genotype-Tissue Expression (GTEx) RNA-seq data (this requires an application to dbGaP for access to the dataset phs000424.v8.v2).
Align each RNA-seq sample with hisat2 and use rMATS to generate a matrix of counts for each of several types of splicing event. rMATS was run from a nextflow script. The script may be modified to run on any platform; the results for this study were generated on the cloudOS/lifebit platform.
Run the Jupyter notebooks from this repository to perform the individual analyses.

This repository documents the interactive analysis of the results produced by the rmats-nf pipeline.

1. Get access

The RNA-seq samples analyzed in this project are under restricted access (dbGaP phs000424.v8.v2). See the database of Genotypes and Phenotypes (dbGaP) for details.

2. Processing the RNA-seq samples

See the manuscript for details of the methods. In brief, we ran the nextflow script at https://github.com/lifebit-ai/rmats-nf to align the RNA-seq samples with hisat2 and to characterize splicing events with rMATS. Results from individual samples are summarized in 'matrix' files. To run the Jupyter notebooks in the next section, you will need to place these files in a results bucket (if you are using the cloudos system) or in some other defined location.
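
Before opening the notebooks it can help to confirm that the matrix files are where you expect them. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that the files sit directly inside one local directory and that their names contain the string "matrix" (the exact names produced by rmats-nf are not stated here).

```shell
#!/bin/sh
# Sketch only: the "*matrix*" name pattern and the flat directory layout
# are assumptions for illustration, not something rmats-nf guarantees.

# Print the number of regular files directly inside DIR ($1) whose
# names contain "matrix".
count_matrix_files() {
    find "$1" -maxdepth 1 -type f -name '*matrix*' | wc -l | tr -d '[:space:]'
}

# Succeed (exit status 0) if at least one such file is present.
have_matrix_files() {
    [ "$(count_matrix_files "$1")" -gt 0 ]
}
```

For example, `have_matrix_files ./results || echo "no matrix files found" >&2` would warn if the (assumed) `./results` directory is empty.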

3. Running the notebooks

Each of the results described in the manuscript was generated by one or more Jupyter notebooks in this repository. Several R packages must be installed before running the notebooks. The process for the cloudos environment is described in this document. If you are running the notebooks in another environment, simply run the setup scripts.

3.1 Summarizing events

Most of the notebooks require that the raw rMATS files are first processed to generate summary files. This is done by the notebook countGenesAndEvents.ipynb. Additionally, two notebooks perform the differential gene expression (DGE) and differential alternative splicing (DAS) analyses. These three notebooks should be run first.

differentialGeneExpressionAnalysis.ipynb. Perform differential gene expression analysis with voom.
differentialSplicingJunctionAnalysis.ipynb. Regression analysis to characterize sex-biased alternative splicing events.
countGenesAndEvents.ipynb. Set up the overall analysis. Write various files to the data subdirectory that will be used by other scripts.

The remaining notebooks can be run in any order. Most of them generate a figure, a table, or a result that is described in the manuscript.

expressionHeatplot.ipynb. Generate a heatplot representing expression across tissues.
totalDGEByTissue.ipynb. Generate a plot representing counts of expression events across tissues.
alternativeSplicingHeatplot.ipynb. Generate a heatplot representing alternative splicing across tissues.
totalAlternativeSplicingByTissue.ipynb. Generate a plot representing counts of alternative splicing across tissues.
XchromosomalEscape.ipynb. Investigate the overlap of alternative splicing and genes on the X chromosome that escape inactivation.
splicingIndex.ipynb. Calculate the splicing index for each chromosome.
spliceTypeByChromosome.ipynb. Calculate the distribution of the 5 types of alternative splicing event analyzed in this manuscript for each chromosome.
altSplicing_events_per_gene.ipynb. Create a plot showing genes that display alternative splicing in many tissues.
tissue_piechart.ipynb. Create a piechart showing distribution of genes according to number of tissues showing differential alternative splicing.
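
The ordering rule above (three notebooks first, the rest in any order) can be sketched as a small shell helper. This is an illustration under stated assumptions, not the repository's own tooling: the function names are hypothetical, the notebooks are assumed to sit together in one directory, the order within the first group is a guess, and `run_notebooks` assumes the `papermill` command-line tool is installed.

```shell
#!/bin/sh
# Notebooks the README says should be run first. The order within this
# group is an assumption; adjust it if one of them depends on another.
PREREQ_NOTEBOOKS="countGenesAndEvents.ipynb differentialGeneExpressionAnalysis.ipynb differentialSplicingJunctionAnalysis.ipynb"

# Succeed if the file name in $1 is one of the prerequisite notebooks.
is_prereq() (
    for pre in $PREREQ_NOTEBOOKS; do
        [ "$1" = "$pre" ] && exit 0
    done
    exit 1
)

# Print one notebook path per line: the prerequisites first (whether or
# not they exist in DIR), then every other *.ipynb directly inside DIR.
notebook_order() (
    dir="$1"
    for pre in $PREREQ_NOTEBOOKS; do
        printf '%s\n' "$dir/$pre"
    done
    for nb in "$dir"/*.ipynb; do
        [ -e "$nb" ] || continue
        if ! is_prereq "$(basename "$nb")"; then
            printf '%s\n' "$nb"
        fi
    done
)

# Execute the notebooks in that order with papermill, writing the
# executed copies to OUT and stopping at the first failure.
run_notebooks() (
    dir="$1"
    out="$2"
    mkdir -p "$out" || exit 1
    notebook_order "$dir" | while IFS= read -r nb; do
        papermill "$nb" "$out/$(basename "$nb")" </dev/null || exit 1
    done
)
```

Calling `notebook_order jupyter` only prints the plan, which makes it easy to check the order before anything runs; `run_notebooks jupyter executed` would then execute it (both directory names are assumptions).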

4. Reproducibility note: How can I reproduce the Jupyter Notebooks analysis?

To facilitate reproducing the results from the secondary analysis that generates all the plots and tables of the publication, we have created a helper bash script that can be run to perform the following:

Prepare the environment by installing dependencies.
Retrieve the data that we have made available via Zenodo (DOI 10.5281/zenodo.5524975).
Execute all Jupyter notebooks programmatically using the papermill library.

You can find the file at ./reproduce.sh.
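
If you want to inspect the Zenodo deposit before running the script, the DOI quoted above can be turned into a resolvable URL. A minimal sketch with hypothetical helper names; it is not taken from reproduce.sh, and how that script actually retrieves the data is not shown here.

```shell
#!/bin/sh
# Sketch only: illustrative helpers, not part of reproduce.sh.

# Print the doi.org resolver URL for the DOI given in $1.
doi_url() {
    printf 'https://doi.org/%s\n' "$1"
}

# Print the numeric record id embedded in a Zenodo DOI
# (everything after the last "zenodo.").
zenodo_record_id() {
    printf '%s\n' "${1##*zenodo.}"
}
```

For example, `doi_url 10.5281/zenodo.5524975` prints the address that resolves to the deposit's landing page.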

a) I have conda available in my system and want to reproduce the analysis

Instructions for environments with conda available

The only prerequisite in this case is a machine with conda installed.

IMPORTANT NOTE: Before executing the bash script, make sure your terminal is initialised for using conda. You can do so by running the following command, depending on your default shell:

i) for zsh

## Initialise the terminal for use of conda
conda init zsh && exec -l zsh

ii) for bash

## Initialise the terminal for use of conda
conda init bash && exec -l bash
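
If you are unsure which of the two commands applies, the choice can be derived from the `SHELL` environment variable. A minimal sketch with a hypothetical function name; it only prints the command and handles just the two shells covered above.

```shell
#!/bin/sh
# Sketch only: prints, but does not run, the conda initialisation command
# for the shell path in $1 (default: the value of $SHELL).
conda_init_command() (
    shell_name="$(basename "${1:-$SHELL}")"
    case "$shell_name" in
        zsh|bash)
            printf 'conda init %s && exec -l %s\n' "$shell_name" "$shell_name"
            ;;
        *)
            printf 'shell not covered by this README: %s\n' "$shell_name" >&2
            exit 1
            ;;
    esac
)
```

Running `conda_init_command` with no argument prints the line to copy for your login shell; any other shell produces an error message instead of a guess.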

Copy the following commands in your terminal to reproduce the Jupyter Notebooks analysis:

git clone https://github.com/TheJacksonLaboratory/sbas.git
cd sbas
git checkout adds-rendered-notebooks
conda init zsh && exec -l zsh

After this has finished, run the bash script reproduce.sh:

time bash ./reproduce.sh

b) I have docker but not conda available in my system and want to reproduce the analysis

Instructions for environments with docker but not conda available

The only prerequisite in this case is a machine with docker installed.

You can use a docker image that ships conda, for example continuumio/miniconda3. Copy the following commands in your terminal to reproduce the Jupyter Notebooks analysis:

## Run the container, mounting the current directory so that input and output data are available in PWD
docker run -v "$PWD":"$PWD" -w "$PWD" -it continuumio/miniconda3

Continue running the commands below (inside the docker container):

## Initialise the terminal for use of conda (the container's default shell is bash)
conda init bash && exec -l bash

Copy the following commands in your terminal to reproduce the Jupyter Notebooks analysis:

git clone https://github.com/TheJacksonLaboratory/sbas.git
cd sbas
git checkout adds-rendered-notebooks
conda init bash && exec -l bash

After this has finished, run the bash script reproduce.sh:

time bash ./reproduce.sh
