SigNetMS is a Python program that ranks biochemical models given a set of experiments. Each model should describe, through a set of ordinary differential equations, how the concentrations of the species of a signaling pathway change. The score given to a model is an estimate of the probability of the observed data being generated by the model, $p(D \mid M)$. This estimate follows a Bayesian approach and is based on the work of Tian-Rui Xu et al. in "Inferring Signaling Pathway Topologies from Multiple Perturbation Measurements of Specific Biochemical Species".
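The main result used to estimate this probability is the following thermodynamic integral:
$$\log p(D \mid M) = \int_0^1 \mathbb{E}_{p_t(\theta)}\left[\log p(D \mid \theta, M)\right] \, dt$$
where $D$ is the data, $M$ is the model, $\theta$ are the model parameters, and $t$ is a parameter that defines a power-posterior distribution $p_t(\theta) \propto p(D \mid \theta, M)^t \, p(\theta \mid M)$ (an intermediate distribution between the prior and the posterior distributions). To estimate this integral, we need samples from $p_t(\theta)$ (the power posterior of $\theta$ at $t$), and to obtain them we use different variations of the Metropolis-Hastings algorithm.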
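The thermodynamic integral can be approximated numerically by estimating the expected log-likelihood at several values of t and applying the trapezoidal rule. The sketch below is only an illustration under stated assumptions, not the SigNetMS implementation: it uses a toy Gaussian model (not an ODE model) whose power posteriors can be sampled exactly, and the function names, the temperature schedule and the sample sizes are all hypothetical choices.

```python
import numpy as np


def log_likelihood(theta, y, sigma):
    """Gaussian log-likelihood log p(y | theta), one value per entry of theta."""
    resid = y[None, :] - theta[:, None]
    return (-0.5 * y.size * np.log(2 * np.pi * sigma**2)
            - 0.5 * np.sum(resid**2, axis=1) / sigma**2)


def sample_power_posterior(t, y, sigma, prior_mean, prior_var, size, rng):
    """Exact draws from p_t(theta), which is Gaussian for this conjugate toy model."""
    precision = 1.0 / prior_var + t * y.size / sigma**2
    mean = (prior_mean / prior_var + t * y.sum() / sigma**2) / precision
    return rng.normal(mean, np.sqrt(1.0 / precision), size=size)


def thermodynamic_estimate(y, sigma, prior_mean, prior_var,
                           n_temps=30, n_samples=2000, seed=0):
    """Estimate log p(y) with the trapezoidal rule over a grid of t values."""
    rng = np.random.default_rng(seed)
    # Assumed schedule: points concentrated near t = 0, where the
    # expected log-likelihood usually changes fastest.
    temps = np.linspace(0.0, 1.0, n_temps) ** 5
    expected = np.array([
        log_likelihood(
            sample_power_posterior(t, y, sigma, prior_mean, prior_var,
                                   n_samples, rng),
            y, sigma).mean()
        for t in temps
    ])
    widths = np.diff(temps)
    return float(np.sum(widths * (expected[:-1] + expected[1:]) / 2.0))


def exact_log_evidence(y, sigma, prior_mean, prior_var):
    """Closed-form log p(y) for the toy model, used only as a check."""
    n = y.size
    d = y - prior_mean
    total_var = sigma**2 + n * prior_var
    log_det = (n - 1) * np.log(sigma**2) + np.log(total_var)
    quad = (np.sum(d**2) - prior_var / total_var * np.sum(d) ** 2) / sigma**2
    return float(-0.5 * (n * np.log(2 * np.pi) + log_det + quad))


y = np.array([0.8, 1.3, 0.4, 1.9, 1.1])
print(thermodynamic_estimate(y, 1.0, 0.0, 4.0))
print(exact_log_evidence(y, 1.0, 0.0, 4.0))
```

For an ODE model the power posterior cannot be sampled exactly, which is why a Metropolis-Hastings sampler takes the place of `sample_power_posterior` in practice.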
To estimate the thermodynamic integral, we need to sample from multiple power-posterior distributions. To do so, we use the Metropolis-Hastings algorithm with multivariate truncated normal jump distributions, in three sampling steps. In the first step, the proposal distribution has a diagonal covariance matrix, i.e. the parameter jumps are proposed independently. In the second and third steps, the covariance of the jump distribution is set to an estimate of the covariance of the current sample of the power posterior. In the first two steps, the sampling is independent for each power posterior; in the last step we use Population Markov chain Monte Carlo (Population MCMC), which allows us to mix samples from different power posteriors.
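The first sampling step can be sketched as follows. This is a simplified illustration, not the SigNetMS code: it assumes that every parameter is truncated to positive values, keeps the proposal scales fixed (no covariance updates), and uses a hypothetical two-parameter toy target. Because a truncated normal proposal is not symmetric, the acceptance ratio includes a Hastings correction.

```python
import math

import numpy as np


def log_norm_cdf(z):
    """Log of the standard normal CDF (only called with z > 0 here)."""
    return math.log(0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def propose(theta, scales, rng):
    """Draw each component from a normal centred at theta, truncated to (0, inf)."""
    candidate = np.empty_like(theta)
    for j, (mean, scale) in enumerate(zip(theta, scales)):
        draw = rng.normal(mean, scale)
        while draw <= 0.0:  # simple rejection step enforcing the truncation
            draw = rng.normal(mean, scale)
        candidate[j] = draw
    return candidate


def sample_power_posterior(log_likelihood, log_prior, t, theta0, scales,
                           n_iter, rng):
    """Metropolis-Hastings targeting p_t(theta) ~ likelihood^t * prior."""
    theta = np.asarray(theta0, dtype=float)
    scales = np.asarray(scales, dtype=float)
    log_target = t * log_likelihood(theta) + log_prior(theta)
    chain = np.empty((n_iter, theta.size))
    accepted = 0
    for i in range(n_iter):
        candidate = propose(theta, scales, rng)
        cand_target = t * log_likelihood(candidate) + log_prior(candidate)
        # Hastings correction: log q(theta | candidate) - log q(candidate | theta)
        correction = sum(log_norm_cdf(a / s) - log_norm_cdf(b / s)
                         for a, b, s in zip(theta, candidate, scales))
        log_ratio = cand_target - log_target + correction
        if rng.random() < math.exp(min(0.0, log_ratio)):
            theta, log_target = candidate, cand_target
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter


# Hypothetical toy target: an exponential decay with Gamma(2, 1) priors.
times = np.array([1.0, 2.0, 3.0])
observed = np.array([0.9, 0.5, 0.3])


def toy_log_prior(theta):
    return float(np.sum(np.log(theta) - theta))  # unnormalised Gamma(2, 1)


def toy_log_likelihood(theta, noise_sd=0.1):
    predicted = theta[0] * np.exp(-theta[1] * times)
    return float(-0.5 * np.sum((observed - predicted) ** 2) / noise_sd**2)


rng = np.random.default_rng(1)
chain, rate = sample_power_posterior(toy_log_likelihood, toy_log_prior, t=0.5,
                                     theta0=[1.0, 1.0], scales=[0.5, 0.5],
                                     n_iter=5000, rng=rng)
print(chain.mean(axis=0), rate)
```

Setting `t=0` makes the sampler target the prior, which gives a simple sanity check for the proposal and the correction term.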
To run SigNetMS, you need to provide some arguments that describe the problem instance and some that control the sampling algorithms used to calculate the desired estimate.
SigNetMS.py [-h] [--verbose [VERBOSE]] [--n_process [N_PROCESS]] model priors experiment first_sampling_iterations sigma_update_n second_sampling_iterations third_sampling_iterations
The arguments related to the problem instance are:
- model: an SBML file with defined kinetic laws;
- priors: an XML file with the prior distributions of the model parameters;
- experiment: an XML file with the experimental observations.
Some examples of these files are provided in the input folder.
The arguments related to the sampling algorithms are:
- first_sampling_iterations: number of iterations in the first sampling step;
- sigma_update_n: number of iterations between updates of the covariance matrix in the first step;
- second_sampling_iterations: number of iterations in the second sampling step;
- third_sampling_iterations: number of iterations in the third sampling step.
The program also has optional arguments:
- --n_process: the number of processes to be used when sampling in the first and second steps;
- --verbose: if you'd like a verbose run;
- --help: if you need help.
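To make the order of the positional arguments concrete, here is a hypothetical argparse parser that mirrors the usage line above. It is a sketch and not the program's actual source: the argument types, the defaults and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the SigNetMS.py usage line."""
    parser = argparse.ArgumentParser(prog="SigNetMS.py")
    # Problem instance
    parser.add_argument("model", help="SBML file with defined kinetic laws")
    parser.add_argument("priors", help="XML file with parameter priors")
    parser.add_argument("experiment", help="XML file with observations")
    # Sampling configuration (integer types are an assumption)
    parser.add_argument("first_sampling_iterations", type=int)
    parser.add_argument("sigma_update_n", type=int)
    parser.add_argument("second_sampling_iterations", type=int)
    parser.add_argument("third_sampling_iterations", type=int)
    # Optional arguments (defaults are assumptions)
    parser.add_argument("--verbose", nargs="?", const=True, default=False)
    parser.add_argument("--n_process", nargs="?", type=int, default=1)
    return parser


args = build_parser().parse_args([
    "input/bioinformatics/model1.xml",
    "input/bioinformatics/model.priors",
    "input/bioinformatics/experiment.data",
    "10000", "1000", "5000", "5000",
    "--n_process", "4",
])
print(args.model, args.sigma_update_n, args.n_process)
```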
The example we call bioinformatics is a model selection input presented in the work of Vyshemirsky and Girolami (2007). It is composed of four models. The model model1.xml, with parameters k1 = 0.07, k2 = 0.6, k3 = 0.05, k4 = 0.3, V = 0.017 and Km = 0.3, is used to generate a simulation to which Gaussian error is added, producing 3 noisy observations, which are stored in the experiment.data file. These observations are used as experimental data for model ranking. The other three models in this example are model2.xml, model3.xml and model4.xml; respectively, they represent a simplified version of model1, a version of model1 missing an "important" interaction, and a more complex version of model1. The prior distribution of the parameters is available in the model.priors file.
A sample command to run SigNetMS on the bioinformatics dataset:
python SigNetMS.py input/bioinformatics/model[1-4].xml input/bioinformatics/model.priors input/bioinformatics/experiment.data 10000 1000 5000 5000
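The bracket notation model[1-4].xml appears to stand for one model file per run. This is an assumption: a shell would expand the pattern to four file names at once, which does not match the single model argument in the usage line. Under that reading, a small helper script can build one command per model. The function names below are hypothetical and the script only prints the commands; `run_all` is left for you to call from the repository root.

```python
import subprocess


def build_commands(model_files, prior_files, experiment_file, iterations):
    """Return one SigNetMS command (as an argument list) per model."""
    commands = []
    for model, priors in zip(model_files, prior_files):
        commands.append(
            ["python", "SigNetMS.py", model, priors, experiment_file]
            + [str(n) for n in iterations]
        )
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


folder = "input/bioinformatics"
models = [f"{folder}/model{i}.xml" for i in range(1, 5)]
priors = [f"{folder}/model.priors"] * len(models)  # one shared priors file
commands = build_commands(models, priors, f"{folder}/experiment.data",
                          (10000, 1000, 5000, 5000))

for command in commands:
    print(" ".join(command))
```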
The smallest example was created by Gustavo Estrela and Marcelo Reis and is also composed of four models. The model model1.xml was used, with parameters k1 = 1.7e-4, k2 = 0.4, kcat3 = 2, K3m = 1.43e3, V4 = 1 and K4m = 1.07e2, to generate 3 artificial observations of a pathway. The other three models in this example are model2.xml, model3.xml and model4.xml; respectively, they represent a simplification of model1, a more complex version of model1, and a copy of model1 that has two interactions inverted. The prior distribution for each model is available in the corresponding model*_gamma.priors file.
A sample command to run SigNetMS on the smallest dataset:
python SigNetMS.py input/smallest/model[1-4].xml input/smallest/model[1-4]_gamma.priors input/smallest/experiment.data 10000 1000 5000 5000
We use Python's standard testing framework, unittest. To run the tests, first change to the test directory and then type commands such as:
- python -m unittest: to run all tests;
- python -m unittest discover test_distributions: to run all tests in the test_distributions directory.
SigNetMS is meant to be run with Python 3 or later. The list of Python packages needed to use SigNetMS is available in requirements.txt. The simplest way to install them is with pip (for Python 3):
pip install -r requirements.txt --user