This is a reference implementation of veRNAl, an algorithm for identifying fuzzy recurrent subgraphs in RNA 3D networks.
Please cite:
@article{vernal,
author = {Oliver, Carlos and Mallet, Vincent and Philippopoulos, Pericles and Hamilton, William L and Waldispühl, Jérôme},
title = "{VeRNAl: A Tool for Mining Fuzzy Network Motifs in RNA}",
journal = {Bioinformatics},
year = {2021},
month = {11},
issn = {1367-4803},
doi = {10.1093/bioinformatics/btab768},
url = {https://doi.org/10.1093/bioinformatics/btab768},
note = {btab768},
eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btab768/41153095/btab768.pdf},
}
See the full paper for a complete description of the algorithm.
You can browse the results from an already trained model here.
This repository has three main components:
- Preparing Data
/prepare_data
- Subgraph Embeddings
/train_embeddings
- Motif Building
/build_motifs
Each subdirectory contains a main.py file which controls the behaviour of that stage.
For full usage, run python <dir>/main.py -h
The command below will install the full list of dependencies.
The main packages we use are:
- multiset
- NetworkX
- BioPython
- PyTorch
- DGL (Deep Graph Library)
- Scikit-learn
conda env create -f environment.yml
conda activate vernal
This step loads whole PDB structures, splits them into uniformly sized chunks (chopper.py), and builds a networkx graph for each chunk.
In annotate.py, we build a rooted subgraph and graphlet hashtable for each node to speed up the similarity function computations at training time.
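To illustrate why precomputing per-node rooted subgraphs helps, here is a minimal sketch of the idea. It is not the code in annotate.py: it uses a plain adjacency dict instead of a networkx graph to stay dependency-free, and the names `rooted_subgraph`, `annotate_rooted_subgraphs`, and the `radius` parameter are hypothetical.

```python
from collections import deque


def rooted_subgraph(adj, root, radius):
    """Return the set of nodes within `radius` hops of `root` (BFS)."""
    seen = {root}
    frontier = deque([(root, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the requested radius
        for neighbor in adj[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def annotate_rooted_subgraphs(adj, radius=1):
    """Precompute one rooted subgraph per node so later lookups are dict reads."""
    return {node: rooted_subgraph(adj, node, radius) for node in adj}


# Toy undirected path graph a - b - c - d (hypothetical data, not RNA).
toy_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
cache = annotate_rooted_subgraphs(toy_graph, radius=1)
print(sorted(cache["b"]))  # ['a', 'b', 'c']
```

With such a cache, a similarity function that compares the neighbourhoods of two nodes can read both from the table instead of re-running a traversal for every pair.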
Create two directories where the data will be kept:
mkdir data/graphs
mkdir data/annotated
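Plain `mkdir` fails if the parent `data/` folder is missing or if the target folder already exists. As a hedged alternative, the sketch below creates the same two folders from Python in a way that is safe to re-run. The helper name `make_data_dirs` and its `base` argument are hypothetical and not part of the repository.

```python
import tempfile
from pathlib import Path


def make_data_dirs(base):
    """Create data/graphs and data/annotated under `base`; safe to re-run."""
    created = []
    for name in ("graphs", "annotated"):
        path = Path(base) / "data" / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demo in a temporary directory; in the repository root you would pass "." instead.
with tempfile.TemporaryDirectory() as tmp:
    dirs = make_data_dirs(tmp)
    make_data_dirs(tmp)  # second call does nothing and raises no error
    all_exist = all(d.is_dir() for d in dirs)
    print(all_exist)
```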
Building and loading the data takes some time (~1 hr). To use a pre-built dataset instead, skip the data preparation: download the dataset, move it to the data/annotated/ folder, and go to step 2.
Download RNA networks:
Save the crystal structures (first link) to the data/ folder.
Save the whole graphs (second link) to the data/graphs folder.
Build the dataset. This will take some time as it involves loading many large PDB files.
python prepare_data/main.py -n <data-id>
Once the training data is built, we train the RGCN.
python train_embeddings/main.py train -n my_model
Finally, the trained RGCN and the whole graphs are used to build motifs.
Here, you have three options:
- Build/load a new meta graph
- Use a meta graph to build motifs
- Use a meta graph to search for matches to a graph query
To build a new meta graph:
If this is the first time you build a meta-graph, create the following folder:
mkdir results/mggs
python build_motifs/main.py -r my_model --mgg_name my_metagraph
The new meta-graph will be built and dumped to the file results/mggs/my_metagraph.p
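The `.p` extension suggests that the meta-graph is stored with Python's pickle module, although that is an assumption and not something stated here. Under that assumption, a dumped file could be reloaded for inspection roughly as follows. The helper `load_metagraph` is hypothetical, and the dict used in the demo is only a stand-in, since the real object's type and fields are not described in this document.

```python
import pickle
import tempfile
from pathlib import Path


def load_metagraph(path):
    """Load a pickled object from `path` (assumes the file is a pickle)."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round-trip a stand-in object through a temporary file to show the pattern.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "my_metagraph.p"
    with open(path, "wb") as handle:
        pickle.dump({"name": "my_metagraph"}, handle)
    loaded = load_metagraph(path)
    print(type(loaded).__name__)
```

Unpickling generally requires that the classes used when the object was dumped are importable, so a script like this would likely need to run from the repository root. Only unpickle files from sources you trust.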