coconet, or CoCoNet, is short for RNA contact prediction using Coevolution and Convolutional Neural Network. It combines state-of-the-art direct coupling analysis (DCA) algorithms with a shallow convolutional neural network to enhance RNA contact prediction from multiple sequence alignments of homologous RNAs. It is implemented in Python and requires Python 3.5 or later.
coconet uses pydca to perform computations on the coevolutionary layer. You need to install the most recent version of pydca (i.e., version 1.23). By default, the command
pip install pydca
installs the required version.
The package can be manually downloaded or cloned using the command
git clone https://github.com/KIT-MBS/coconet
Once coconet is downloaded, change to the directory containing the file
setup.py
and execute on the command line
python -m coconet.main <msa_file> --verbose
where <msa_file>
denotes a FASTA-formatted multiple sequence alignment (MSA) file of an
RNA. Note that the first sequence in the MSA file should be the target/reference sequence.
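The input requirements above (FASTA format, reference sequence first) can be checked before running coconet. The following is a minimal sketch using only the standard library. The function names `read_fasta_msa` and `check_msa` and the toy sequences are illustrative assumptions, not part of coconet or pydca, and the equal-length check reflects the usual definition of an alignment rather than anything coconet is documented to enforce.

```python
def read_fasta_msa(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_msa(records):
    """Return the first record (the reference) after checking row lengths."""
    if not records:
        raise ValueError("MSA contains no sequences")
    ref_header, ref_seq = records[0]
    for header, seq in records[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(
                "{!r} has length {}, expected {}".format(header, len(seq), len(ref_seq))
            )
    return ref_header, ref_seq


# Toy alignment (made-up sequences); the first entry plays the role of the target.
example = """>target
GGCAU-CC
>homolog_1
GGC-UACC
""".splitlines()

records = read_fasta_msa(example)
reference = check_msa(records)
```

For a real file, the same functions can be applied to the lines of an opened file object instead of the in-memory example.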
The optional argument --verbose
enables logging messages to be printed on the screen.
By default, coconet uses a single 3x3 matrix. However, it is possible to specify
the matrix size on the command line using the optional argument --msize
as follows.
python -m coconet.main <msa_file> --msize 5 --verbose
The allowed values of --msize
are 3, 5, and 7.
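To make the role of the matrix size concrete, here is a minimal sketch of applying a square weight matrix to a matrix of coevolution scores. It is an assumption-laden illustration, not coconet's implementation: the function name `convolve_scores`, the zero padding at the borders, and the toy scores and weights are all hypothetical.

```python
def convolve_scores(scores, kernel):
    """Apply a square kernel to a square score matrix with zero padding.

    As in most convolutional networks, the kernel is not flipped, so this is
    strictly a cross-correlation.
    """
    n = len(scores)
    k = len(kernel)
    if k not in (3, 5, 7):
        raise ValueError("kernel size must be 3, 5, or 7")
    half = k // 2
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            total = 0.0
            for a in range(k):
                for b in range(k):
                    r, c = i + a - half, j + b - half
                    # Positions outside the matrix contribute zero.
                    if 0 <= r < n and 0 <= c < n:
                        total += kernel[a][b] * scores[r][c]
            out[i][j] = total
    return out


# Toy 4x4 symmetric score matrix with one strongly coupled pair (0, 3).
scores = [[0.0] * 4 for _ in range(4)]
scores[0][3] = scores[3][0] = 1.0

# Toy 3x3 kernel of ones: each output sums the scores in a 3x3 neighbourhood.
ones = [[1.0] * 3 for _ in range(3)]
smoothed = convolve_scores(scores, ones)
```

A larger matrix (5x5 or 7x7) widens the neighbourhood of site pairs that contributes to each output score.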
In addition, coconet can use two matrices: one for Watson-Crick nucleotide
pairs and the other for non-Watson-Crick ones. This can be achieved using the
optional argument --wc_and_nwc, as in the following example.
python -m coconet.main <msa_file> --msize 7 --wc_and_nwc --verbose
The above command executes coconet using two 7x7 matrices.
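The idea of using separate matrices for the two pair classes can be sketched as follows. This is only an illustration: the names `pair_class` and `pick_kernel` are hypothetical, and the sketch assumes that only the canonical A-U and G-C pairs count as Watson-Crick. How coconet itself classifies pairs (for example, whether G-U wobble pairs are grouped with them) is not stated here.

```python
# Assumption: only canonical pairs are treated as Watson-Crick in this sketch.
WATSON_CRICK = frozenset([("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")])


def pair_class(seq, i, j, wc_pairs=WATSON_CRICK):
    """Return "wc" if the nucleotides at positions i and j form a pair in wc_pairs."""
    pair = (seq[i].upper(), seq[j].upper())
    return "wc" if pair in wc_pairs else "nwc"


def pick_kernel(seq, i, j, wc_kernel, nwc_kernel):
    """Select the weight matrix that matches the class of the pair (i, j)."""
    return wc_kernel if pair_class(seq, i, j) == "wc" else nwc_kernel


# Toy reference sequence (made up for illustration).
reference = "GAUC"
classes = {(i, j): pair_class(reference, i, j)
           for i in range(len(reference))
           for j in range(i + 1, len(reference))}
```

Passing a different `wc_pairs` set changes the classification without touching the rest of the code.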
Convolution can also be performed on top of plmDCA. To enable this feature, use the --on_plm
optional argument.
Example:
python -m coconet.main <msa_file> --on_plm --num_threads 2 --max_iterations 5000 --verbose
The optional arguments --num_threads
and --max_iterations
control the number of threads used (if OpenMP is supported) and the number of
gradient descent iterations, respectively.
Finally, help messages are printed on the screen when the command
python -m coconet.main
is executed, i.e., by running the coconet.main
module without any additional input from
the command line.
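When coconet is run over many alignments, the invocations shown above can be assembled from Python. The helper below is a hypothetical convenience wrapper, not part of coconet. It only uses the flags documented in this section, and the file name `my_rna.fa` is a placeholder.

```python
import sys

ALLOWED_MSIZE = (3, 5, 7)


def build_coconet_command(msa_file, msize=None, wc_and_nwc=False, on_plm=False,
                          num_threads=None, max_iterations=None, verbose=False):
    """Build an argument list for `python -m coconet.main` (hypothetical helper)."""
    if msize is not None and msize not in ALLOWED_MSIZE:
        raise ValueError("msize must be one of {}".format(ALLOWED_MSIZE))
    cmd = [sys.executable, "-m", "coconet.main", msa_file]
    if msize is not None:
        cmd += ["--msize", str(msize)]
    if wc_and_nwc:
        cmd.append("--wc_and_nwc")
    if on_plm:
        cmd.append("--on_plm")
    if num_threads is not None:
        cmd += ["--num_threads", str(num_threads)]
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_coconet_command("my_rna.fa", msize=7, wc_and_nwc=True, verbose=True)
# To actually run it (requires coconet to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems with file names that contain spaces.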
The network can also be trained on the dataset using a five-fold cross-validation procedure. For example, the command
python -m coconet.train run --msize 5 --verbose
trains the network with a 5x5 weight matrix, using mean-field DCA as the coevolutionary layer. If plmDCA is desired, the --on_plm
optional argument can be provided, for instance as
python -m coconet.train run --msize 7 --on_plm --num_threads 4 --verbose
To see the arguments available for training the network, run the command
python -m coconet.train
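For readers unfamiliar with five-fold cross-validation, the sketch below shows the general procedure: the dataset is split into five folds, and each fold is held out once while the other four are used for training. The function `five_fold_splits`, the round-robin fold assignment, and the placeholder identifiers are assumptions for illustration. How coconet partitions its dataset is not described here.

```python
def five_fold_splits(items, n_folds=5):
    """Yield (train, test) lists, holding out each fold exactly once."""
    # Round-robin assignment of items to folds (an assumption of this sketch).
    folds = [items[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [x for m, fold in enumerate(folds) if m != k for x in fold]
        yield train, test


# Placeholder identifiers standing in for the RNAs of a dataset.
rnas = ["rna_{:02d}".format(i) for i in range(1, 11)]
splits = list(five_fold_splits(rnas))
```

Each RNA appears in exactly one test fold, so every prediction used for evaluation comes from weights trained without that RNA.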
Precomputed coevolutionary data for the RNA dataset and test set, obtained with CoCoNet and DCA-based algorithms, are available in the directory RAW_COEV_DATA_ALL. The average positive predictive values (PPV) from these data, e.g., for the CoCoNet cross-validation and DCA-based methods on the RNA dataset, can be computed using
python -m coconet.ppv compute --verbose
This command computes the average PPV at rank L, where L is the length of the RNA sequence. More information about computing PPV from
raw coevolutionary data can be obtained by running the help command
python -m coconet.ppv --help
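The PPV at rank L mentioned above is the fraction of true contacts among the L highest-scoring predicted pairs. The following minimal sketch shows that calculation on made-up data. The function name `ppv_at_rank` and the toy pairs are illustrative assumptions, and details such as which site pairs coconet excludes before ranking are not covered.

```python
def ppv_at_rank(ranked_pairs, true_contacts, rank):
    """Fraction of the top `rank` predicted pairs that are true contacts.

    ranked_pairs must be sorted from highest to lowest score.
    """
    top = ranked_pairs[:rank]
    if not top:
        return 0.0
    # Treat (i, j) and (j, i) as the same contact.
    true_set = {tuple(sorted(p)) for p in true_contacts}
    hits = sum(1 for p in top if tuple(sorted(p)) in true_set)
    return hits / len(top)


# Toy example: an RNA of length L = 4 with two known contacts.
L = 4
ranked = [(0, 3), (1, 2), (0, 2), (1, 3), (0, 1)]
contacts = [(0, 3), (2, 1)]
ppv = ppv_at_rank(ranked, contacts, L)

# Averaging over several RNAs gives the average PPV for a dataset.
average_ppv = sum([ppv]) / 1
```

In this toy case two of the top four pairs are true contacts, so the PPV at rank L is 0.5.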