Predict RNA-binding proteins from amino acid sequences using string kernel SVMs.
TriPepSVM was developed by the Marsico RNA bioinformatics group at the Max-Planck-Institute for Molecular Genetics in Berlin.
- Unix system
- R (>= 3.2.0)
- HMMER (3.1)
- CDHIT (4.6.4)
- Python (3 or higher) with the following packages: pandas and BioServices
You can install Python packages via pip. If you don't have sudo rights, you might want to use the --user option of pip:
pip install --user bioservices pandas
- If TriPepSVM is applied to a new taxon id, you need a stable internet connection.
- Please add the CD-HIT and HMMER directories to the PATH system variable:
- Edit the startup file (~/.bashrc)
- Modify PATH variable
- Save and close the file
For example (please adjust your path):
export PATH=$PATH:/home/Programms/cdhit-4.6.4
export PATH=$PATH:/home/Programms/hmmer-3.1b2-linux-intel-x86_64/binaries
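After editing the startup file and opening a new shell, it can help to confirm that the tools are actually reachable. The following is a minimal sketch, not part of TriPepSVM: the function name `missing_tools` is hypothetical, and the executable names (`cd-hit`, `hmmsearch`, `Rscript`, `python3`) are assumptions about what the prerequisites install, so adjust them to your setup.

```shell
# Print every given executable name that cannot be found on PATH.
# The tool names passed below are assumptions, not taken from TriPepSVM.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

missing=$(missing_tools cd-hit hmmsearch Rscript python3)
if [ -n "$missing" ]; then
    # Unquoted on purpose so that the names are printed on one line.
    echo "Not found on PATH:" $missing
else
    echo "All tools found on PATH."
fi
```

If a tool is reported as missing, check that the directory in the `export PATH=...` line really contains the executable.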
./TriPepSVM.sh [OPTION] ... -i INPUT.[fasta|fa]
-i, --input [INPUT.fasta|fa]: AA sequence in fasta format, NO DEFAULT
-o, --output : path to output folder, DEFAULT: current directory
-id, --taxon-id [9606|590|...] : Uniprot taxon id, DEFAULT: 9606 (human)
-c, --cost : change COST parameter, DEFAULT: 1
-k, --oligo-length : change k parameter, DEFAULT: 3
-pos, --pos-class : change positive class weight, DEFAULT: inversely proportional to class size
-neg, --neg-class : change negative class weight, DEFAULT: inversely proportional to class size
-thr, --threshold : change prediction threshold, DEFAULT: 0
-r, --recursive [TRUE|FALSE]: apply recursive mode, DEFAULT: FALSE
-h, --help : help text
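To give an intuition for the -k option: with the default k = 3, the features are tri-peptides, as listed in the feature weight file. The sketch below counts overlapping k-mers in a single sequence. It is only an illustration under the assumption that features are counts of overlapping k-mers; the exact kernel used by TriPepSVM is not described here, and the function name `count_kmers` is hypothetical.

```shell
# Count overlapping k-mers in one amino acid sequence.
# $1 = sequence, $2 = k (defaults to 3)
count_kmers() {
    printf '%s\n' "$1" | awk -v k="${2:-3}" '{
        # Slide a window of length k over the sequence.
        for (i = 1; i + k - 1 <= length($0); i++) counts[substr($0, i, k)]++
    }
    END { for (kmer in counts) print kmer, counts[kmer] }' | LC_ALL=C sort
}

count_kmers "MKKMKK" 3
# KKM 1
# KMK 1
# MKK 2
```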
Example 1: Salmonella
./TriPepSVM.sh -i salmonellaProteom.fasta -o Results/ -id 590 -r True -posW 1.8 -negW 0.2 -thr 0.68
Example 2: Human
./TriPepSVM.sh -i humanProteom.fasta -o Results/ -id 9606 -posW 1.8 -negW 0.2 -thr 0.28
The result folder contains two files:
- nameInputFile.TriPepSVM.pred.txt: main output file containing the predictions for the input file, with the columns:
- Identifier
- SVM score
- Classification
sp|P0CL07|GSA_SALTY -0.664768610799015 Non RNA-binding protein
sp|O68838|GSH1_SALTY -0.592678648819721 Non RNA-binding protein
sp|P43666|EPTB_SALTY -0.443698432714576 Non RNA-binding protein
sp|P36555|EPTA_SALTY -0.303451909779383 Non RNA-binding protein
...
- nameInputFile.featureWeights.txt: feature weights used by the SVM classifier, with the columns:
- Feature (tri-peptide sequences)
- Feature weight
AAA 0.518691300046882
AAC 0.10328499221261
AAD 0.0894537449099789
AAE -0.0464292430990747
...
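Both output files are plain text, so standard Unix tools are enough for a first look at the results. The sketch below makes several assumptions: one record per line, whitespace-separated columns, plain decimal numbers, and a score above the threshold indicating an RNA-binding protein. The function names `above_threshold` and `top_features` are hypothetical and not part of TriPepSVM.

```shell
# Print identifiers whose SVM score (column 2) is above a threshold.
# $1 = prediction file, $2 = threshold (defaults to 0)
above_threshold() {
    awk -v thr="${2:-0}" '$2 ~ /^-?[0-9]/ && $2 + 0 > thr + 0 { print $1 }' "$1"
}

# Print the n features with the largest weights (column 2).
# $1 = feature weight file, $2 = n (defaults to 10)
top_features() {
    awk '$2 ~ /^-?[0-9]/' "$1" | LC_ALL=C sort -k2,2nr | head -n "${2:-10}"
}

# Demo on the sample records from this README, written to temporary files.
pred=$(mktemp)
weights=$(mktemp)
cat > "$pred" <<'EOF'
sp|P0CL07|GSA_SALTY -0.664768610799015 Non RNA-binding protein
sp|O68838|GSH1_SALTY -0.592678648819721 Non RNA-binding protein
sp|P43666|EPTB_SALTY -0.443698432714576 Non RNA-binding protein
sp|P36555|EPTA_SALTY -0.303451909779383 Non RNA-binding protein
EOF
cat > "$weights" <<'EOF'
AAA 0.518691300046882
AAC 0.10328499221261
AAD 0.0894537449099789
AAE -0.0464292430990747
EOF

above_threshold "$pred" -0.5   # lists the EPTB_SALTY and EPTA_SALTY entries
top_features "$weights" 2      # lists AAA and AAC with their weights
rm -f "$pred" "$weights"
```

Lines whose second field is not numeric, such as a possible header line, are skipped by both functions.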
- Annkatrin Bressin - bressin
- Roman Schulte-Sasse - Schulte-Sasse