-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
128 lines (104 loc) · 4.28 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
---
title: "ZIPBN: Zero-Inflated Poisson Bayesian Networks for Zero-Inflated Count Data"
author: "Junsouk Choi"
output:
github_document:
toc: true
toc_depth: 4
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
The R package `ZIPBN` implements a Markov chain Monte Carlo (MCMC) to fit zero-inflated poisson Bayesian networks for zero-inflated count data such as scRNA-seq data.
It is based on my current reseach on scalable Bayesian networks (also known as directed acyclic graph) for scRNA-seq data which comprise of counts with excessive zeros.
A Bayesian network factorizes the joint distribution of random variables into a set of local distributions.
To deal with an excess of zero counts in data, we model each local distribution to be a zero-inflated poisson model.
We further formulate a hierarchical model in Bayesian perspective by specifing spike-and-slab priors for edge selection to determine graph structure.
The MCMC algorithm basically uses Metropolis-Within-Gibbs sampling to sample parameters from the posterior distributions.
## Installation
To install the latest version from Github, use
```{r install, tidy = 'formatR', eval = FALSE, echo = TRUE}
library(devtools)
devtools::install_github("junsoukchoi/ZIPBN", build_vignettes = TRUE)
```
## Usage
The following example describes how to use package `ZIPBN` to conduct analysis on zero-inflated cound data.
Function`mcmc_ZIPBN` takes zero-inflated count data with some arguments needed for MCMC sampler, and returns MCMC samples from posterior distributions of the parameters defined by ZIPBN models.
It also gives Metropolis sampling acceptance rates, which indicates whether our MCMC sampler goes well.
For more information on `ZIPBN` package, please access the package documentation or [vignettes](https://github.com/junsoukchoi/ZIPBN/blob/master/vignettes/ZIPBN.pdf).
Please feel free to contact the author.
```{r example, tidy='formatR', eval = FALSE, echo = TRUE}
library(ZIPBN)
## Example data
set.seed(7)
# generate a simple graph: X1 -> X2 -> X3
p = 3
A = matrix(0, p, p)
A[3, 2] = A[2, 1] = 1
# parameters of the ZIPBN model, given graph A
alpha = matrix(0, p, p)
alpha[A == 1] = 0.3
beta = matrix(0, p, p)
beta[A == 1] = 0.2
delta = rep(1, p)
gamma = rep(1.5, p)
# generate data from the ZIPBN model
n = 200
x = matrix(0, n, p)
for (j in 1 : p)
{
# calculate pi_j
pi = exp(x %*% alpha[j, ] + delta[j])
pi = pi / (1 + pi)
# calculate mu_j
mu = exp(x %*% beta[j, ] + gamma[j])
# generate data for X_j
x[ , j] = rpois(n, mu) * (1 - rbinom(n, 1, pi))
}
## fit ZIPBN models
# create starting value list
m = colMeans(x)
v = apply(x, 2, var)
starting = list(alpha = matrix(0, p, p),
beta = matrix(0, p, p),
delta = log((v - m) / (m * m)),
gamma = log((v - m + m * m) / m),
A = matrix(0, p, p),
tau = c(10, 10, 1, 1),
rho = 0.1)
# create tuning value list
tuning = list(phi_alpha = c(1e+8, 20),
phi_beta = c(1e+8, 100),
phi_delta = 5,
phi_gamma = 50,
phi_A = c(1e+10, 10, 10, 1, 10))
# create priors list
priors = list(nu = 10000^2,
tau_alpha = c(0.01, 0.01),
tau_beta = c(0.01, 0.01),
tau_delta = c(0.01, 0.01),
tau_gamma = c(0.01, 0.01),
rho = c(0.5, 0.5))
# run mcmc_ZIPBN function
n_sample = 2000
n_burnin = 1000
out = mcmc_ZIPBN(x, starting, tuning, priors, n_sample, n_burnin)
## posterior inference via ZIPBN models
# report Metropolis sampling acceptance percents
out$acceptance
# recover garph structure
cutoff = 0.5
A_est = 1 * (apply(out$samples$A, c(1, 2), mean) > cutoff)
# calculate the posterior mean of each parameter, given the recovered graph
subset = apply(out$samples$A == array(A_est, dim = c(p, p, n_sample - n_burnin)), 3, all)
alpha_est = apply(out$samples$alpha[ , , subset], c(1, 2), mean)
beta_est = apply(out$samples$beta[ , , subset], c(1, 2), mean)
delta_est = rowMeans(out$samples$delta[ , subset])
gamma_est = rowMeans(out$samples$gamma[ , subset])
# report the posterior mean of each parameter with the recoverd graph
A_est
round(alpha_est, digits = 2)
round(beta_est, digits = 2)
round(delta_est, digits = 2)
round(gamma_est, digits = 2)
```