Skip to content

junsoukchoi/project_stat_689_602

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

71 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

ZIPBN: Zero-Inflated Poisson Bayesian Networks for Zero-Inflated Count Data

Junsouk Choi

The R package ZIPBN implements a Markov chain Monte Carlo (MCMC) to fit zero-inflated poisson Bayesian networks for zero-inflated count data such as scRNA-seq data. It is based on my current reseach on scalable Bayesian networks (also known as directed acyclic graph) for scRNA-seq data which comprise of counts with excessive zeros. A Bayesian network factorizes the joint distribution of random variables into a set of local distributions. To deal with an excess of zero counts in data, we model each local distribution to be a zero-inflated poisson model. We further formulate a hierarchical model in Bayesian perspective by specifing spike-and-slab priors for edge selection to determine graph structure. The MCMC algorithm basically uses Metropolis-Within-Gibbs sampling to sample parameters from the posterior distributions.

Installation

To install the latest version from Github, use

library(devtools)
devtools::install_github("junsoukchoi/ZIPBN", build_vignettes = TRUE)

Usage

The following example describes how to use package ZIPBN to conduct analysis on zero-inflated cound data. Functionmcmc_ZIPBN takes zero-inflated count data with some arguments needed for MCMC sampler, and returns MCMC samples from posterior distributions of the parameters defined by ZIPBN models. It also gives Metropolis sampling acceptance rates, which indicates whether our MCMC sampler goes well. For more information on ZIPBN package, please access the package documentation or vignettes. Please feel free to contact the author.

library(ZIPBN)


## Example data
set.seed(7)

# generate a simple graph: X1 -> X2 -> X3
p = 3
A = matrix(0, p, p)
A[3, 2] = A[2, 1] = 1

# parameters of the ZIPBN model, given graph A
alpha = matrix(0, p, p)
alpha[A == 1] = 0.3
beta = matrix(0, p, p)
beta[A == 1] = 0.2
delta = rep(1, p)
gamma = rep(1.5, p)

# generate data from the ZIPBN model
n = 200
x = matrix(0, n, p)
for (j in 1:p) {
    # calculate pi_j
    pi = exp(x %*% alpha[j, ] + delta[j])
    pi = pi/(1 + pi)
    # calculate mu_j
    mu = exp(x %*% beta[j, ] + gamma[j])
    # generate data for X_j
    x[, j] = rpois(n, mu) * (1 - rbinom(n, 1, pi))
}


## fit ZIPBN models create starting value list
m = colMeans(x)
v = apply(x, 2, var)
starting = list(alpha = matrix(0, p, p), beta = matrix(0, p, p), delta = log((v - 
    m)/(m * m)), gamma = log((v - m + m * m)/m), A = matrix(0, p, p), tau = c(10, 
    10, 1, 1), rho = 0.1)

# create tuning value list
tuning = list(phi_alpha = c(1e+08, 20), phi_beta = c(1e+08, 100), phi_delta = 5, 
    phi_gamma = 50, phi_A = c(1e+10, 10, 10, 1, 10))

# create priors list
priors = list(nu = 10000^2, tau_alpha = c(0.01, 0.01), tau_beta = c(0.01, 0.01), 
    tau_delta = c(0.01, 0.01), tau_gamma = c(0.01, 0.01), rho = c(0.5, 0.5))

# run mcmc_ZIPBN function
n_sample = 2000
n_burnin = 1000
out = mcmc_ZIPBN(x, starting, tuning, priors, n_sample, n_burnin)


## posterior inference via ZIPBN models report Metropolis sampling acceptance
## percents
out$acceptance

# recover garph structure
cutoff = 0.5
A_est = 1 * (apply(out$samples$A, c(1, 2), mean) > cutoff)

# calculate the posterior mean of each parameter, given the recovered graph
subset = apply(out$samples$A == array(A_est, dim = c(p, p, n_sample - n_burnin)), 
    3, all)
alpha_est = apply(out$samples$alpha[, , subset], c(1, 2), mean)
beta_est = apply(out$samples$beta[, , subset], c(1, 2), mean)
delta_est = rowMeans(out$samples$delta[, subset])
gamma_est = rowMeans(out$samples$gamma[, subset])

# report the posterior mean of each parameter with the recoverd graph
A_est
round(alpha_est, digits = 2)
round(beta_est, digits = 2)
round(delta_est, digits = 2)
round(gamma_est, digits = 2)

About

Final project for STAT 689-602

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published