This is the code repository for the paper "Gene Essentiality Prediction Using Topological Features From Metabolic Networks", presented at the 7th Brazilian Conference on Intelligent Systems (BRACIS) in São Paulo, Brazil.
In this work, we evaluate the influence and contribution of network topological features for the essential gene prediction task in metabolic networks.
Selected organisms:
- Escherichia coli
- Mycoplasma genitalium
- Pseudomonas aeruginosa
- Saccharomyces cerevisiae
All metabolic networks were constructed from the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway database, while essentiality information for each organism was collected from different sources:
- Profiling of Escherichia coli Chromosome Database (PEC) for E. coli
- Database of Essential Genes (DEG) for M. genitalium and P. aeruginosa
- Saccharomyces Genome Deletion Project for S. cerevisiae
DataVisualization.ipynb shows how the data were collected for both network construction and gene essentiality information, and how these data are integrated into a graph (one per organism) for use in the feature extraction step. Some graph information is also available for all organisms, including a visualization according to node essentiality and frequency.
PreProcessing.ipynb shows how feature extraction is performed on the constructed graphs, using several topological features from different domains, which are discussed later.
MachineLearningApproach.ipynb presents both experimental scenarios, pairwise and leave-one-out, including their results (assessed by ROC curve) and runtime, as well as the selected classifiers.
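The two scenarios can be illustrated with a minimal sketch. It assumes that "pairwise" means training on one organism and testing on another, and that "leave-one-out" means training on three organisms and testing on the held-out one; this reading, like the function names below, is an assumption and is not taken from the notebook. The `roc_auc` helper computes the area under the ROC curve directly from its pairwise-ranking definition.

```python
from itertools import permutations

ORGANISMS = ["E. coli", "M. genitalium", "P. aeruginosa", "S. cerevisiae"]


def pairwise_splits(organisms):
    """Return (train, test) pairs: train on one organism, test on another (assumed reading)."""
    return [([train], test) for train, test in permutations(organisms, 2)]


def leave_one_out_splits(organisms):
    """Return (train, test) pairs: train on all organisms but one, test on the held-out one."""
    return [
        ([o for o in organisms if o != held_out], held_out)
        for held_out in organisms
    ]


def roc_auc(labels, scores):
    """Area under the ROC curve: the fraction of (essential, non-essential)
    pairs in which the essential gene gets the higher score; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


pairwise = pairwise_splits(ORGANISMS)
leave_one_out = leave_one_out_splits(ORGANISMS)
# Toy labels and scores, purely illustrative.
example_auc = roc_auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6])
```

With four organisms, this yields twelve pairwise splits and four leave-one-out splits.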
We opted to build the graphs using an undirected representation, which increases the number of topological features that can be selected for the model. These features are described below.
In graph theory, centrality measures how important a node is in a network, and each centrality measure characterizes importance by a different criterion. Centrality measures are commonly used for gene essentiality prediction in different biological networks, the most common being:
- Degree centrality, the number of neighbors a node has
- Betweenness centrality, the number of shortest paths that pass through a node
- Closeness centrality, the reciprocal of the average length of the shortest paths between a node and all other nodes
- Eigenvector centrality, the sum of the centrality values of the node's neighbors
In addition to these measures, we decided to include four other graph centralities:
- Load centrality, a variant of betweenness centrality
- Local reaching centrality, the number of nodes that can be reached from a node
- Harmonic centrality, the sum of the reciprocals of the shortest-path lengths between a node and all other nodes
- Subgraph centrality, which measures a node's participation in the subgraphs of the graph, counted through weighted closed walks that start and end at that node
Besides centrality, another topology measure is widely used in essentiality prediction in biological networks:
- Clustering coefficient, based on the number of triplets and triangles (nodes connected by two or three undirected links) in the graph that a node is part of
Since the clustering coefficient is closely related to social network analysis and information retrieval, we decided to use two link analysis measures from these domains:
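The local clustering coefficient can be sketched as the fraction of a node's neighbor pairs that are themselves connected, that is, closed triangles divided by possible triplets centered on the node. The toy graph below is hypothetical, and the function is an illustrative sketch rather than the implementation used in the notebooks.

```python
from itertools import combinations

# Hypothetical undirected toy graph: node -> set of neighbors.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def clustering_coefficient(adj, node):
    """Fraction of pairs of neighbors of `node` that are directly linked."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        # Fewer than two neighbors: no triplet can be centered on this node.
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


coefficients = {node: clustering_coefficient(adj, node) for node in adj}
```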
- HITS algorithm, which scores a node based on its incoming and outgoing links
- PageRank, which ranks nodes based on the quality of incoming links
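To make the PageRank idea concrete, here is a minimal power-iteration sketch for an undirected graph, where every edge acts as both an incoming and an outgoing link. The damping factor of 0.85, the fixed iteration count, and the toy graph are assumptions chosen for illustration. The sketch also assumes every node has at least one neighbor.

```python
# Hypothetical undirected toy graph: node -> set of neighbors.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration; assumes no isolated nodes."""
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iterations):
        new_rank = {}
        for u in adj:
            # Each neighbor v shares its current rank equally among its links.
            incoming = sum(rank[v] / len(adj[v]) for v in adj[u])
            new_rank[u] = (1 - damping) / n + damping * incoming
        rank = new_rank
    return rank


rank = pagerank(adj)
top_node = max(rank, key=rank.get)
```

A node ranks highly when it is linked by nodes that themselves rank highly and that have few other links to share their score with.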
Two more metrics were implemented, the first one from graph theory:
- Length of a random maximal independent (stable) set, where a maximal independent set is an independent set that is not a subset of any other independent set
The last metric was designed specifically for gene essentiality prediction in metabolic networks:
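A random maximal independent set can be built greedily: visit the nodes in random order and keep each node that is not adjacent to one already kept. The sketch below illustrates this under stated assumptions; the toy graph, function name, and `seed` parameter are hypothetical, and how the notebooks turn the set into a per-node feature is not shown here.

```python
import random

# Hypothetical undirected toy graph: node -> set of neighbors.
adj = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C"},
}


def random_maximal_independent_set(adj, seed=None):
    """Greedy random construction of a maximal (not necessarily maximum) independent set."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    rng.shuffle(nodes)
    independent = set()
    blocked = set()  # nodes already chosen or adjacent to a chosen node
    for u in nodes:
        if u not in blocked:
            independent.add(u)
            blocked.add(u)
            blocked.update(adj[u])
    return independent


mis = random_maximal_independent_set(adj, seed=0)
mis_length = len(mis)
```

The result is maximal because every node left out is adjacent to a chosen node, but different random orders can produce sets of different sizes.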
- Damage, proposed by Lemke et al. 2004
J. S. Nagai, H. Sousa, A. H. Aono, A. C. Lorena and R. M. Kuroshu, "Gene essentiality prediction using topological features from metabolic networks", 2018 7th Brazilian Conference on Intelligent Systems (BRACIS), Oct. 2018.
https://github.com/jsnagai/Gene-Essentiallity-Project-A-topological-approach