Smart5UTR is a novel deep generative model for designing N1-methyl-pseudouridine (m1Ψ)-modified 5' UTRs for mRNA sequences. Built on a multi-task autoencoder framework, Smart5UTR generates 5' UTR sequences by leveraging latent features learned from a large dataset. The performance of Smart5UTR-designed mRNA has been validated through in vitro and in vivo experiments. The tool streamlines the design of m1Ψ-5' UTRs, contributing to the development of more effective mRNA therapeutics.
- data: Contains the dataset files used for training and testing the models.
- fig_scripts: Contains scripts for generating figures related to the project.
- figs: Stores the figures generated by the scripts in the fig_scripts folder.
- models: Contains the source code for the Smart5UTR models.
- tutorials: Contains tutorials and code examples for reproducing the baseline, training and testing Smart5UTR models, and predicting UTR MRL values using the trained Smart5UTR models.
The required software and package dependencies are listed below:
joblib
Keras
matplotlib
pandas
numpy
scikit-learn
scipy
tensorflow-gpu
cudnn
We have provided the requirements.txt file for pip. You can use conda and pip to automatically prepare the environment.
git clone https://github.com/deepomicslab/Smart5UTR.git
cd Smart5UTR
conda create -n Smart5UTR python=3.8
conda activate Smart5UTR
python -m pip install -r requirements.txt
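After installation, a quick check that the listed packages are present in the active environment can save debugging time later. The snippet below is a minimal sketch using only the Python standard library; the `installed_versions` helper is a hypothetical name, and the distribution names are copied from the dependency list above (cudnn is omitted because it is installed through conda rather than pip).

```python
from importlib import metadata

# Distribution names copied from the dependency list above.
# "cudnn" is a conda package, not a pip distribution, so it is not checked here.
REQUIRED = ["joblib", "Keras", "matplotlib", "pandas", "numpy",
            "scikit-learn", "scipy", "tensorflow-gpu"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can then be installed individually with pip or conda.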
The model was developed and tested in an environment with the following specifications:
- CUDA 11.1
- cudatoolkit-11.1.74
- cuDNN 8.0.4
- TensorFlow 2.4.1
Please deploy the environment according to your hardware capabilities. If you have not yet configured the relevant GPU environment, you can install the appropriate cuDNN with the following conda command:
conda install -c nvidia cudnn=<version> ## Replace <version> with the cuDNN version that is compatible with your CUDA version.
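Once CUDA and cuDNN are in place, it may help to confirm that TensorFlow can actually see the GPU. The following is a small sketch, not part of the repository; `visible_gpus` is a hypothetical helper that returns an empty list when TensorFlow is not installed or no GPU is detected.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or [] if none or TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    print(f"TensorFlow sees {len(gpus)} GPU(s): {gpus}")
```

An empty list usually points to a mismatch between the installed CUDA, cuDNN, and TensorFlow versions.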
The dataset used to train Smart5UTR can be downloaded from Google Drive. We provide Smart5UTR/tutorials/train_Smart5UTR.py to show how to train a Smart5UTR model using the ribosome binding capacity of the 5' UTR as a label.
This dataset comes from the public Gene Expression Omnibus (GEO) database under accession number GSE114002 and was published in:
Sample, P.J., Wang, B., Reid, D.W. et al. Human 5′ UTR design and variant effect prediction from a massively parallel translation assay. Nat Biotechnol 37, 803–809 (2019). https://doi.org/10.1038/s41587-019-0164-5
If you wish to use data from other sources, we recommend converting it to a similar file format and placing it in the data directory.
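As an illustration of such a conversion, the sketch below rewrites arbitrary (sequence, label) records into a two-column CSV. It rests on assumptions that should be checked against the actual files in the data directory: the output column names `utr` and `rl`, the use of DNA letters (with RNA `U` mapped to `T`), and the helper names `normalize_utr` and `convert_rows` are all hypothetical.

```python
import csv
import io

VALID_BASES = set("ACGT")


def normalize_utr(seq):
    """Upper-case a sequence, map RNA 'U' to DNA 'T', and reject other symbols."""
    seq = seq.strip().upper().replace("U", "T")
    if not seq or set(seq) - VALID_BASES:
        raise ValueError(f"unexpected characters in sequence: {seq!r}")
    return seq


def convert_rows(rows, seq_key, label_key, out_file):
    """Write (sequence, label) pairs as CSV and return the number of rows written."""
    writer = csv.writer(out_file)
    writer.writerow(["utr", "rl"])  # assumed column names; verify against the real data
    count = 0
    for row in rows:
        writer.writerow([normalize_utr(row[seq_key]), float(row[label_key])])
        count += 1
    return count


if __name__ == "__main__":
    # Toy records standing in for a dataset from another source.
    records = [{"sequence": "acguacgu", "load": "5.2"}]
    buffer = io.StringIO()
    convert_rows(records, "sequence", "load", buffer)
    print(buffer.getvalue())
```

For a real dataset, `rows` could come from `csv.DictReader` and `out_file` from `open(..., "w", newline="")`.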
The trained .h5 model can be downloaded from Google Drive. Please download the model and place it in the models/Smart5UTR/ directory before running the tutorial code.
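Before opening the tutorials, it can be useful to verify that the downloaded model file is where the code expects it. The sketch below uses only the standard library; `find_h5_models` is a hypothetical helper, and the default path simply mirrors the directory named above (run it from the repository root).

```python
from pathlib import Path


def find_h5_models(model_dir="models/Smart5UTR"):
    """Return the sorted .h5 files in model_dir, raising if there are none."""
    directory = Path(model_dir)
    candidates = sorted(directory.glob("*.h5"))
    if not candidates:
        raise FileNotFoundError(
            f"no .h5 model found in {directory}; download it first"
        )
    return candidates


if __name__ == "__main__":
    for path in find_h5_models():
        print(f"found model file: {path}")
```

The tutorials remain the reference for how the model file is actually loaded.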
To predict the mean ribosome load (MRL) of any 5' UTR using Smart5UTR, please refer to the tutorial tutorials/MRL_prediction_by_Smart5UTR.ipynb.
To design the 5' UTR from any reference sequence, please refer to the tutorial tutorials/design_5UTR_by_Smart5UTR.ipynb.
If you find our work useful in your research or if you use parts of this code in your own projects, please consider citing our publication:
@article{tang2023novel,
title={A novel deep generative model for mRNA vaccine development: Designing 5ʹ UTRs with N1-methyl-pseudouridine modification},
author={Tang, Xiaoshan and Huo, Miaozhe and Chen, Yuting and Huang, Hai and Qin, Shugang and Luo, Jiaqi and Qin, Zeyi and Jiang, Xin and Liu, Yongmei and Duan, Xing and others},
journal={Acta Pharmaceutica Sinica B},
year={2023},
publisher={Elsevier}
}