Reference implementation of the deconfounder

This folder contains the code for:

the empirical study about smoking (Section 6.1)
the empirical study about genome-wide association studies (GWAS) (Section 6.2)
the empirical study about movies (Section 6.3)

Acknowledgments

We thank Justin Grimmer, Dean Knox, and Brandon Stewart for pointing out concerns with a previous version of the code.

Smoking study:
- There was a bug with how the code handled heldout data in the Stan file for fitting factor models. This bug has been fixed.
- Regarding the definition of bias, variance, and MSE metrics: We now have added an explanation of these metrics in utils.R. These metrics are defined as average per-simulation posterior bias/variance/MSE, following Korattikara et al. (2014); Chen et al. (2015); Gustafson (2015).
- Regarding the number of posterior samples drawn to compute the Monte Carlo estimate of posterior bias/variance/MSE: We have increased the default number of posterior samples to 10. (This number can be set as a hyperparameter and further increased.)
Movie study:
- Grimmer et al. (2020) points out that conditioning on observed covariates can be important for causal estimation. Using the movie data, they illustrate how the deconfounder without conditioning on observed covariates can produce unreasonable causal estimates.
  
  The exploratory results on actors in Wang and Blei (2019) do not condition on covariates. The "intervened test set" analyses in the supplement of Wang and Blei (2019) includes such conditioning.
  
  For users that are interested in exploring different analyses, the causal estimation section of movie_py/movie-actor-py2-causalest.ipynb includes a block of code that conditions on observed covariates in addition to the substitute confounder.

References

Chen, C., Ding, N., & Carin, L. (2015). On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In Advances in Neural Information Processing Systems (pp. 2278-2286).

Grimmer, J., Knox, D., & Stewart, B. (2020). Naive regression requires weaker assumptions than factor models to adjust for multiple cause confounding [link]

Gustafson, P. (2015). Bayesian inference for partially identified models: Exploring the limits of limited data (Vol. 140). CRC Press.

Korattikara, A., Chen, Y., & Welling, M. (2014). Austerity in MCMC land: Cutting the Metropolis-Hastings budget. In International Conference on Machine Learning (pp. 181-189).

Wang, Y. and Blei, D.M. (2019) The Blessings of Multiple Causes. Journal of American Statistical Association, 114:528, 1574-1596. [link]

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
gene_tf		gene_tf
movie_py		movie_py
smoking_R		smoking_R
.gitignore		.gitignore
LICENSE.md		LICENSE.md
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Reference implementation of the deconfounder

Acknowledgments

References

About

Releases

Packages

Languages

License

blei-lab/deconfounder_public

Folders and files

Latest commit

History

Repository files navigation

Reference implementation of the deconfounder

Acknowledgments

References

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages