Compressed Transcriptomics

This repository open sources code and model weights for automated optimum compression of brain transcriptomic data using deep autoencoding, as detailed in our article.

What is this repository for?

The architecture of the brain is too complex to be intuitively surveyable without the use of compressed representations that project its variation into a compact, navigable space.
The task is especially challenging with high-dimensional data, such as transcriptomic / gene expression data (for example provided by the Allen Brain Atlas) where the joint complexity of anatomical and transcriptional patterns demands maximum compression.
The current established practice is to use standard principal component analysis (PCA), which comes at the cost of limited expressibility, which can impact quality of research findings.
As an alternative, we here provide the code and model weights for a series of trained deep auto-encoders.

Why should I use this model?

Expressivity

Even when reducing 15,633 gene expression values into merely 2 components, note the intricate variational structure across auto-encoded (AE) representations, especially in comparison to methods such as PCA, kPCA, or NMF.

Accuracy

Across a number of possible latent dimensionalities, the auto-encoder achieves superior accuracy in reconstructing original source transcriptomic data in comparison to UMAP, PCA, NMF, and kPCA.

N.B. source reconstruction evaluation is presently not possible using t-SNE.

Downstream inference

Deep auto-encoders yield superior predictive utility across signalling, microstructural, and metabolic targets in applied machine prediction using transcriptomic data.

Usage instructions

In the uploaded Jupyter Notebook we provide a tutorial that:

1. Overviews running inference to compress transcriptomic data into an auto-encoded latent space of 2,4,8,16,32,64, or 128 dimensions.
1. Provides the code and architecture to retrain the a auto-encoder for use with other data.

The model weights are open-sourced here.

The NIFTI images depicting the two component latents of all models in both 4mm and 8mm space are open-sourced here.

Use queries

Via github issue log or email to j.ruffle@ucl.ac.uk.

Citation

If using these works, please cite the following paper:

Compressed representation of brain genetic transcription. James K Ruffle, Henry Watkins, Robert J Gray, Harpreet Hyare, Michel Thiebaut de Schotten, Parashkev Nachev. Human Brain Mapping, 2024. DOI https://doi.org/10.1002/hbm.26795.

Funding

The Medical Research Council; The Wellcome Trust; UCLH NIHR Biomedical Research Centre; Guarantors of Brain; British Society of Neuroradiology.

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
assets		assets
model_weights		model_weights
niftis		niftis
LICENSE		LICENSE
README.md		README.md
Usage_Tutorial.ipynb		Usage_Tutorial.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Compressed Transcriptomics

Table of Contents

What is this repository for?

Why should I use this model?

Expressivity

Accuracy

Downstream inference

Usage instructions

Use queries

Citation

Funding

About

Releases

Packages

Languages

License

high-dimensional/compressed-transcriptomics

Folders and files

Latest commit

History

Repository files navigation

Compressed Transcriptomics

Table of Contents

What is this repository for?

Why should I use this model?

Expressivity

Accuracy

Downstream inference

Usage instructions

Use queries

Citation

Funding

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages