CNNSplice: Robust Models for Splice Site Prediction Using Deep Convolutional Neural Networks
OluwadareLab, University of Colorado, Colorado Springs
Access Web Server: http://www.cnnsplice.online
Developers:
Algorithm and Model:
Victor Akpokiro
Department of Computer Science
University of Colorado, Colorado Springs
Email: vakpokir@uccs.edu
Web Server:
M. A. Mohit Chowdhury (hchowdhu@uccs.edu), Samuel Olowofila (solowofi@uccs.edu), and Raisa Nusrat (rnusrat@uccs.edu)
Contact:
Oluwatosin Oluwadare, PhD
Department of Computer Science
University of Colorado, Colorado Springs
Email: ooluwada@uccs.edu
CNNSplice can be run locally in a Docker-containerized environment on the user's computer. Before cloning this repository and attempting to build, install the Docker engine. If you are new to Docker, a quick Docker tutorial for beginners is a good place to start.
To install and build CNNSplice, follow these steps.
- Clone this repository locally using the command
  git clone https://github.com/OluwadareLab/CNNSplice.git
- Pull the CNNSplice Docker image from Docker Hub using the command
  docker pull oluwadarelab/cnnsplice:latest
  This may take a few minutes. Once finished, check that the image was successfully pulled using
  docker image ls
- Run the CNNSplice container and mount the present working directory to the container using
  docker run -v ${PWD}:${PWD} -p 8050:8050 -it oluwadarelab/cnnsplice
- cd to your file directory.
Exciting! You can now access CNNSplice locally.
Skip this step if you followed the Docker instructions above.
CNNSplice is developed in Python 3. All dependencies are included in the Docker environment. The attached requirement.txt file lists the dependencies. To install them locally, for example inside a virtual environment, run the following command from the current working directory:
pip install -r requirement.txt
- Our constructed dataset uses a sequence length of 400.
Usage: To train, run the following in the terminal:
python train.py -n "output_name" -m mode
For example:
python train.py -n "output_name" -m "balanced"
- Arguments:
  - output_name: A user-specified string used to name the output files
  - mode: A string specifying whether the input dataset is balanced or imbalanced, i.e. "balanced" or "imbalanced"
- Outputs:
  The outputs of training include:
  - .h5: The deepslicer model file.
  - .txt: The output file containing the evaluation metrics results, stored in the log directory.
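The command-line options described above could be handled roughly as follows. This is only a sketch and is not taken from train.py: the long option names (--name, --mode), the helper functions, and the output file naming (name + ".h5", log/name + ".txt") are assumptions made for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the -n / -m options (hypothetical helper, not the real train.py)."""
    parser = argparse.ArgumentParser(description="CNNSplice-style command line (sketch)")
    parser.add_argument("-n", "--name", required=True,
                        help="user-specified string used to name the outputs")
    parser.add_argument("-m", "--mode", required=True,
                        choices=["balanced", "imbalanced"],
                        help="type of input dataset")
    return parser.parse_args(argv)


def output_paths(name, log_dir="log"):
    """Return assumed (model_path, metrics_path) and create log_dir if missing."""
    os.makedirs(log_dir, exist_ok=True)
    return name + ".h5", os.path.join(log_dir, name + ".txt")


# Equivalent to: python train.py -n "output_name" -m "balanced"
args = parse_args(["-n", "output_name", "-m", "balanced"])
model_path, metrics_path = output_paths(args.name)
print(args.mode, model_path, metrics_path)
```

Restricting mode with choices makes argparse reject any value other than "balanced" or "imbalanced" before training starts, and creating the log directory up front avoids a failure when the metrics file is written.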
For testing, run the following in the terminal, where mode is "balanced" or "imbalanced":
python test.py -n "output_name" -m mode
For example:
python test.py -n "output_name" -m "balanced"
- Arguments:
  - output_name: A user-specified string used to name the output files
  - mode: A string specifying whether the input dataset is balanced or imbalanced, i.e. "balanced" or "imbalanced"
- Outputs:
  The outputs of testing include:
  - .txt: The output file containing the evaluation metrics results, stored in the log directory.
- Dataset sequence length is 400.
- Ensure you have a log directory for text file storage.
- Genomic sequence input data should be transformed using one-hot encoding.
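As an illustration of the one-hot encoding note above, a minimal sketch for a 400-nucleotide sequence is shown below. The channel order A, C, G, T, the all-zero row for unknown bases such as N, and the function name one_hot_encode are assumptions; the encoding actually used to build the CNNSplice dataset may differ, so check it against the training data before use.

```python
import numpy as np

SEQ_LEN = 400   # sequence length stated in this README
BASES = "ACGT"  # assumed channel order, not confirmed by the repository


def one_hot_encode(seq, seq_len=SEQ_LEN):
    """Return a (seq_len, 4) float32 array; unknown bases (e.g. N) stay all-zero."""
    seq = seq.upper()
    if len(seq) != seq_len:
        raise ValueError(f"expected {seq_len} nucleotides, got {len(seq)}")
    index = {base: i for i, base in enumerate(BASES)}
    encoded = np.zeros((seq_len, len(BASES)), dtype=np.float32)
    for pos, base in enumerate(seq):
        col = index.get(base)
        if col is not None:  # leave the row as zeros for bases outside ACGT
            encoded[pos, col] = 1.0
    return encoded


# Toy input: the 5-mer "ACGTN" repeated 80 times gives 400 nucleotides.
example = one_hot_encode("ACGTN" * 80)
print(example.shape)
```

Raising an error on a wrong length, rather than silently padding or truncating, keeps the input consistent with the fixed sequence length of 400 that the dataset uses.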