sRNARFTarget: A machine learning-based approach for fast sRNA Target Prediction

Introduction

This repository contains the nextflow pipeline sRNARFTarget to obtain transcriptome-wide bacterial sRNA target predictions, and all the code, data, results and supplementary files related to the sRNARFTarget's manuscript. In the text below, we provide instructions to run sRNARFTarget.

Requirements

Requirements for running sRNARFTarget with our Python 3.8 docker container:

Requirements for running sRNARFTarget with Python installed locally:

Nextflow v21.04.1
Python3 (tested on 3.8.10)
Python modules: pickle, biopython v1.79, pprint, scikit-bio v0.5.6, itertools, scikit-learn v0.24.1, pandas v1.2.1, numpy v1.19.5.

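The following modules are additionally required for running sRNARFTarget_SHAP and sRNARFTarget_CP:

shap v0.39, pyCeterisParibus v0.5.2 and matplotlib v3.3.4

The third-party modules can be installed using pip; pickle, pprint and itertools are part of the Python standard library and need no installation.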

Instructions to run sRNARFTarget

Clone the repository. For instructions on how to clone GitHub repositories, see GitHub's documentation.
Create the fasta files with the sRNA and mRNA nucleotide (i.e., A, C, G, T) sequences in the folder containing the sRNARFTarget.nf script (referred to from now on as the sRNARFTarget folder/directory).
Go to the sRNARFTarget folder so that it is the current working directory.
OPTION A (with our Python 3.8 docker container, recommended). After pulling the docker container, type the command below to run sRNARFTarget, replacing sRNA.fasta (--s parameter) and mRNA.fasta (--m parameter) with the filenames of the fasta files containing the sRNA and mRNA sequences, respectively. Both files should be located in the sRNARFTarget directory.

 nextflow run sRNARFTarget.nf --s sRNA.fasta --m mRNA.fasta -with-docker penacastillolab/python38env

OPTION B (with Python installed locally). Type the command below to run sRNARFTarget, replacing sRNA.fasta (--s parameter) and mRNA.fasta (--m parameter) with the filenames of the fasta files containing the sRNA and mRNA sequences, respectively. Both files should be located in the sRNARFTarget directory.

 nextflow run sRNARFTarget.nf --s sRNA.fasta --m mRNA.fasta
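Since the input files are expected to contain A, C, G, T sequences, it may help to check them before launching the pipeline. The following is a minimal sketch and not part of sRNARFTarget: the helper names `read_fasta` and `invalid_records` are hypothetical, and it assumes the sequence ID is the first whitespace-delimited token of each header line.

```python
VALID_BASES = set("ACGT")


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence ID: sequence}."""
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the ID is the first token after '>'.
            header = line[1:].split()
            current_id = header[0] if header else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequences may span several lines; join them in upper case.
            records[current_id] += line.upper()
    return records


def invalid_records(records):
    """Return {sequence ID: sorted unexpected characters} for bad records."""
    problems = {}
    for seq_id, seq in records.items():
        unexpected = set(seq) - VALID_BASES
        if unexpected:
            problems[seq_id] = sorted(unexpected)
    return problems


example = ">s1 first sRNA\nACGT\nacg\n>s2\nACGU\n"
print(invalid_records(read_fasta(example)))
```

In this example, `s2` is reported because it contains U; such a sequence would need to be written with T before being passed to the pipeline.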

Creation of all possible sRNA-mRNA pairs

sRNARFTarget creates all possible pairs from the input sRNA and mRNA sequences: each sRNA is paired with every mRNA. For example, if the input sRNA file has 5 sRNA sequences and the mRNA file has 9 mRNA sequences, sRNARFTarget will create 45 sRNA-mRNA pairs, 9 pairs per sRNA.
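The pairing described above is a Cartesian product of the two ID lists. A minimal sketch of the idea follows; the function name `all_pairs` and the placeholder IDs are illustrative and are not taken from the pipeline's code.

```python
from itertools import product


def all_pairs(srna_ids, mrna_ids):
    """Pair every sRNA with every mRNA (Cartesian product)."""
    return list(product(srna_ids, mrna_ids))


# Placeholder IDs: 5 sRNAs and 9 mRNAs, as in the example above.
srnas = [f"sRNA{i}" for i in range(1, 6)]
mrnas = [f"mRNA{j}" for j in range(1, 10)]

pairs = all_pairs(srnas, mrnas)
print(len(pairs))  # 45
```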

sRNARFTarget Results

On your terminal, you should see something like this after sRNARFTarget's execution:

N E X T F L O W  ~  version 21.04.1
Launching `sRNARFTarget.nf` [gloomy_easley] - revision: 273666007b
executor >  local (5)
[d0/aab5ef] process > createAllPossiblePairs              [100%] 1 of 1 ✔
[08/c1a4ba] process > getsRNATrinucleotidesFrequncies (1) [100%] 1 of 1 ✔
[c8/e2134c] process > getmRNATrinucleotidesFrequncies (1) [100%] 1 of 1 ✔
[eb/0b3d90] process > runRandomForestModel (1)            [100%] 1 of 1 ✔
[c8/b7f154] process > generateSortedResultFile (1)        [100%] 1 of 1 ✔

Pipeline execution summary
---------------------------
Run as : nextflow run sRNARFTarget.nf --s Multocida_sRNA_gcvb.fasta --m Multocida_mRNA.fasta
Completed at: 2021-06-07T14:28:19.852-02:30
Duration : 40.7s
Success : true
workDir : Afolder/sRNARFTarget/work
exit status : 0

sRNARFTarget's output files are saved in the folder 'sRNARFTargetResult' which is created in the working directory. This folder will contain two files: Prediction_probabilities.csv and FeatureFile.csv.

Prediction_probabilities.csv: this is the main result file. It contains the results sorted by predicted interaction probability from high to low, rounded to five decimals, in three columns: sRNA_ID, mRNA_ID and Prediction_Probability. Here are some lines from a generated Prediction_probabilities.csv file:

sRNA_ID mRNA_ID Prediction_Probability
gcvb    PM0494(+)       0.57444
gcvb    PM_RS03970(-)   0.55257
gcvb    PM_RS00560(-)   0.55193
gcvb    PM_RS06810(-)   0.54968
gcvb    PM_RS02870(+)   0.54926
gcvb    PM_RS00565(-)   0.54756
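To pick a pair of interest programmatically, the file can be filtered by sRNA ID. The sketch below uses only the standard library. The function name `top_predictions` is hypothetical, and the tab delimiter is an assumption based on how the excerpt above is laid out; pass `delimiter=","` if your file is comma-separated.

```python
import csv
import io


def top_predictions(handle, srna_id, n=3, delimiter="\t"):
    """Return the n highest-probability (mRNA_ID, probability) rows for one sRNA."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = [row for row in reader if row["sRNA_ID"] == srna_id]
    # The file is already sorted, but sorting again keeps the function safe
    # for files that were edited or concatenated.
    rows.sort(key=lambda row: float(row["Prediction_Probability"]), reverse=True)
    return [(row["mRNA_ID"], float(row["Prediction_Probability"])) for row in rows[:n]]


# In-memory stand-in for Prediction_probabilities.csv (delimiter assumed).
example = io.StringIO(
    "sRNA_ID\tmRNA_ID\tPrediction_Probability\n"
    "gcvb\tPM0494(+)\t0.57444\n"
    "gcvb\tPM_RS03970(-)\t0.55257\n"
    "gcvb\tPM_RS00560(-)\t0.55193\n"
)
print(top_predictions(example, "gcvb", n=2))
```

With a real file, open it with `open("sRNARFTargetResult/Prediction_probabilities.csv", newline="")` and pass the handle in place of `example`.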

FeatureFile.csv: this file contains the features for all the sRNA-mRNA pairs. It consists of 66 columns. The first two columns are sRNA_ID and mRNA_ID. The remaining 64 columns hold the trinucleotide frequency differences of each sRNA-mRNA pair, one column per possible trinucleotide (4^3 = 64). This file is later used by the sRNARFTarget interpretability scripts.
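To make the 64 feature columns concrete, here is a rough sketch of how a trinucleotide frequency difference could be computed for one pair. It is not the pipeline's implementation: counting overlapping windows, normalising by the number of counted windows, and subtracting mRNA from sRNA frequencies are all assumptions made for illustration, as are the function names.

```python
from itertools import product

# All 64 possible trinucleotides over the alphabet A, C, G, T.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_frequencies(seq):
    """Relative frequency of each trinucleotide, using overlapping windows."""
    seq = seq.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip windows containing other characters
            counts[kmer] += 1
            total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


def frequency_difference(srna_seq, mrna_seq):
    """Per-trinucleotide difference (sRNA minus mRNA; direction is assumed)."""
    s = trinucleotide_frequencies(srna_seq)
    m = trinucleotide_frequencies(mrna_seq)
    return {k: s[k] - m[k] for k in TRINUCLEOTIDES}


features = frequency_difference("AAAA", "CCCC")
print(len(features), features["AAA"], features["CCC"])
```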

Interpretation of sRNARFTarget Predictions

We created two Python scripts for understanding the predictions generated by sRNARFTarget: sRNARFTarget_SHAP.py and sRNARFTarget_CP.py.
You need to run sRNARFTarget first so that the Prediction_probabilities.csv and FeatureFile.csv files are generated.

Instructions to run sRNARFTarget_SHAP

Choose an sRNA-mRNA pair of interest from Prediction_probabilities.csv file.
OPTION A (with our Python 3.8 docker container). After pulling the docker container, run it to execute sRNARFTarget_SHAP.py as shown below. The -v option tells docker that the folder "/ABSOLUTE_PATH_TO/sRNARFTarget" will be referred to as "/data" in the command.

docker run -i -v /ABSOLUTE_PATH_TO/sRNARFTarget:/data --rm penacastillolab/python38env python /data/sRNARFTarget_SHAP.py '/data/sRNARFTargetResult/' 'gcvb' 'PM0494(+)'

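OPTION B (with Python installed locally). Run sRNARFTarget_SHAP using the command below. The first argument is the path to the folder that contains FeatureFile.csv and Prediction_probabilities.csv (sRNARFTargetResult/ in the example usage), not the path to a single file.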

 python sRNARFTarget_SHAP.py 'PATH_TO_FeatureTable.csv' 'sRNA_ID' 'mRNA_ID'

Example usage: python sRNARFTarget_SHAP.py 'sRNARFTargetResult/' 'omrA' 'ompT'

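Make sure to put single quotes around the IDs and write the IDs exactly as they appear in Prediction_probabilities.csv. The quotes matter because IDs such as PM0494(+) contain characters that the shell would otherwise interpret.

sRNARFTarget_SHAP will create a decisionPlot.pdf file and a ForcePlot.html file.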

Instructions to run sRNARFTarget_CP

For the same sRNA-mRNA pair that was chosen to run sRNARFTarget_SHAP, choose a feature/variable (a trinucleotide such as TTA) by looking at the plots generated by sRNARFTarget_SHAP, or pick any other variable of interest.
OPTION A (with our Python 3.8 docker container). After pulling the docker container, run it to execute sRNARFTarget_CP as shown below. The -v option tells docker that the folder "/ABSOLUTE_PATH_TO/sRNARFTarget" will be referred to as "/data" in the command.

docker run -i -v /ABSOLUTE_PATH_TO/sRNARFTarget:/data --rm penacastillolab/python38env python /data/sRNARFTarget_CP.py '/data/sRNARFTargetResult/' 'gcvb' 'PM0494(+)' 'TTA'

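OPTION B (with Python installed locally). Run sRNARFTarget_CP using the command below. As with sRNARFTarget_SHAP, the first argument is the path to the folder that contains FeatureFile.csv and Prediction_probabilities.csv, not the path to a single file.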

python sRNARFTarget_CP.py 'PATH_TO_FeatureTable.csv' 'sRNA_ID' 'mRNA_ID' 'feature_name'

Example usage: python sRNARFTarget_CP.py 'sRNARFTargetResult/' 'omrA' 'ompT' 'GCG'

Make sure to put single quotes around each parameter, and write the sRNA and mRNA IDs exactly as they appear in the Prediction_probabilities.csv file.

sRNARFTarget_CP.py will create a directory called _plots_files and will open the file plots0.html automatically in the default web browser. This file contains an interactive plot showing the predicted interaction probability as a function of the value of the feature provided.

Notes

sRNARFTarget can be run for any number of sRNAs and mRNAs at a time.
sRNARFTarget_SHAP can only be run for one sRNA-mRNA pair at a time.
sRNARFTarget_CP can only be run for a single sRNA-mRNA pair and a single feature/variable at a time.
Make sure the folder provided to sRNARFTarget_CP.py and sRNARFTarget_SHAP.py as their first argument contains the files Prediction_probabilities.csv and FeatureFile.csv generated by sRNARFTarget.nf.
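Because sRNARFTarget_SHAP handles one pair per invocation, several pairs have to be processed with repeated calls. The sketch below only builds and prints the commands. The helper `shap_command` is hypothetical, and the argument order follows the example usage shown earlier. `shlex.join` adds the single quotes that IDs such as PM0494(+) need when the command is pasted into a shell.

```python
import shlex


def shap_command(result_dir, srna_id, mrna_id):
    """Build the argument list for one sRNARFTarget_SHAP.py invocation."""
    return ["python", "sRNARFTarget_SHAP.py", result_dir, srna_id, mrna_id]


# Pairs of interest, copied from Prediction_probabilities.csv.
pairs = [("gcvb", "PM0494(+)"), ("gcvb", "PM_RS03970(-)")]

for srna_id, mrna_id in pairs:
    argv = shap_command("sRNARFTargetResult/", srna_id, mrna_id)
    # Print a shell-safe command line; IDs with parentheses get quoted.
    print(shlex.join(argv))
```

To execute the commands instead of printing them, each `argv` list can be passed to `subprocess.run(argv, check=True)`, which bypasses the shell and therefore needs no quoting. Whether successive runs overwrite decisionPlot.pdf and ForcePlot.html is not stated in these instructions, so it may be worth moving the output files between runs.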

Citation

If you use this software please cite:

Kratika Naskulwar & Lourdes Peña-Castillo (2022) sRNARFTarget: a fast machine-learning-based approach for transcriptome-wide sRNA target prediction, RNA Biology, 19:1, 44-54, DOI: 10.1080/15476286.2021.2012058
