Management and comparison of MCMC runs from multiple packages
Markov chain Monte Carlo (MCMC) is an important family of algorithms for generating samples from complicated posterior distributions in Bayesian analysis. There are now numerous software systems that provide MCMC for generally specified models, including JAGS, WinBUGS, OpenBUGS, Stan, and NIMBLE, among others. NIMBLE is an R package, and there are R packages to call the other MCMC packages from R. The different MCMC systems provide different levels of algorithm customization, from modules in JAGS to writing and combining new samplers in NIMBLE. In addition, the different packages have different systems for writing models, from variants of the BUGS language to Stan's separate language.
There is a need to automate the comparison of different MCMC packages, and different customizations within each package, for purposes of choosing the best package for a particular problem and improving packages by revealing their strengths and weaknesses. This project would be to develop a new R package for that purpose.
NIMBLE includes a first version of the desired functionality, but we propose that it be completely refactored and packaged separately to make it more generally useful. Examples of the results of automated comparisons can be found here. Documentation can be found in ?MCMCsuite and ?compareMCMCs in the nimble package.
The NIMBLE implementation of this idea grew out of a narrow goal: comparing multiple MCMCs created within NIMBLE for the same model. It was then expanded to call MCMCs from other packages and to generate comparison pages such as the one linked above. However, the processing flow that evolved from this history limits flexibility. It would be desirable to easily mix and match information from different runs into new comparisons, to modularize the choice of comparison metrics and allow multiple metrics, and to allow general mappings between different variable names and between alternative formulations of equivalent models used in different cases. In addition, the NIMBLE package is already large, and its MCMC comparison features are a sideshow distinct from its core goals. These are the motivations for rewriting a system for managing and comparing MCMC runs within and across packages and distributing it as an independent package.
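To illustrate what "modularized metrics" and "mappings between variable names" could mean in practice, here is a minimal R sketch. It assumes that the samples from one run are stored as a matrix with one column per variable. The names `metrics`, `harmonizeNames`, and `computeMetrics`, the long data-frame layout, and the example variable names `b0`/`beta0` are all hypothetical illustrations, not part of nimble or of the planned package.

```r
# Hypothetical sketch: pluggable metrics and per-run variable-name maps.
# Samples from one run are assumed to be a matrix, one column per variable.

# Each metric maps a numeric vector of samples to a single number.
metrics <- list(
  mean = function(x) mean(x),
  sd   = function(x) sd(x),
  # Lag-1 autocorrelation: a crude indicator of how well a chain mixes.
  acf1 = function(x) cor(x[-1], x[-length(x)])
)

# Rename columns using a named character vector whose names are the
# run-specific variable names and whose values are the common names.
harmonizeNames <- function(samples, nameMap) {
  cols <- colnames(samples)
  hit <- cols %in% names(nameMap)
  cols[hit] <- unname(nameMap[cols[hit]])
  colnames(samples) <- cols
  samples
}

# Apply every metric to every variable and return a long data frame,
# which can be rbind-ed across runs to build new comparisons.
computeMetrics <- function(samples, metrics, run = "run") {
  rows <- list()
  for (v in colnames(samples)) {
    for (m in names(metrics)) {
      rows[[length(rows) + 1]] <- data.frame(
        run = run, variable = v, metric = m,
        value = metrics[[m]](samples[, v]),
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)
}

# Usage: two runs that name the same parameter differently.
set.seed(1)
runA <- cbind(beta0 = rnorm(500), sigma = rexp(500))
runB <- harmonizeNames(cbind(b0 = rnorm(500), sigma = rexp(500)),
                       c(b0 = "beta0"))
allMetrics <- rbind(computeMetrics(runA, metrics, "A"),
                    computeMetrics(runB, metrics, "B"))
```

Keeping metrics as a plain named list of functions means a new metric can be added without touching the code that runs or stores MCMCs, and the long format lets results from any subset of runs be combined later.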
As currently envisioned, the coding project would involve roughly the following steps:
- Studying the existing NIMBLE compareMCMCs and MCMCsuite systems to gain understanding of the problem scope and limitations of the current design.
- Designing and implementing classes for the results, metadata, and comparison metrics of a single MCMC run and a collection of MCMC runs.
- Designing and implementing interfaces to call each MCMC system based on a common representation of run details.
- Creating methods to collect multiple MCMC runs for comparisons that can be chosen in a flexible and extensible way.
- Adapting and generalizing the existing graph- and HTML-generation system to be more flexible and extensible.
- Writing tests and documentation.
- Putting it all together into an R package.
- Using shiny to make the generated comparison figures interactive.
- Adding statistical comparisons of results from different MCMC engines to determine if they are generating consistent results.
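One possible way to make the last step concrete (a sketch, not the planned design) is to compare a posterior mean estimated by two engines with a z statistic whose Monte Carlo standard errors come from batch means, which accounts for autocorrelation within each chain. `batchMeansSE` and `compareMeans` are hypothetical names. The check assumes both chains have converged and are long enough for the batches to be roughly independent; with many parameters, the p-values would also need a multiple-comparison adjustment. Effective-sample-size estimates, for example from `coda::effectiveSize`, are a common alternative source of standard errors.

```r
# Batch-means Monte Carlo standard error of the mean of one chain.
batchMeansSE <- function(x, nBatches = 20) {
  batchSize <- floor(length(x) / nBatches)
  if (batchSize < 2) stop("chain too short for the requested number of batches")
  x <- x[seq_len(batchSize * nBatches)]
  # matrix() fills column-wise, so each column is one consecutive batch.
  batchMeans <- colMeans(matrix(x, nrow = batchSize))
  sd(batchMeans) / sqrt(nBatches)
}

# Approximate z test for equal posterior means from two independent runs.
compareMeans <- function(x1, x2, nBatches = 20) {
  se1 <- batchMeansSE(x1, nBatches)
  se2 <- batchMeansSE(x2, nBatches)
  z <- (mean(x1) - mean(x2)) / sqrt(se1^2 + se2^2)
  list(z = z, p.value = 2 * pnorm(-abs(z)))
}

# Usage with two autocorrelated stand-in chains that share the same mean.
set.seed(42)
engineA <- as.numeric(arima.sim(list(ar = 0.5), n = 5000))
engineB <- as.numeric(arima.sim(list(ar = 0.8), n = 5000))
compareMeans(engineA, engineB)
```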
The new package will make it easy for researchers to compare different MCMCs for the same model. Since NIMBLE allows a high degree of customization of MCMCs, including writing new samplers, this will allow researchers to quickly determine what works best in different cases. For people comparing across packages, this will allow a greater shared understanding of when different packages perform well.
Here is a series of steps of increasing difficulty.
- Write a reference class definition (sampleInfoClass) for objects that hold a numeric vector and a list of arbitrary metrics of the vector.
- Write a reference class definition for a collection (sampleInfoCollectionClass) of sampleInfoClass objects.
- Add a method to sampleInfoCollectionClass that takes as input a function that computes a scalar metric from a numeric vector. It should apply that function to the numeric vector in each sampleInfoClass object and add the result to the list of calculated metrics for the same sampleInfoClass object.
- Add a method to sampleInfoCollectionClass that takes as input the name of a metric and a function that creates a barplot. It should create a data frame by extracting the named metric from each sampleInfoClass object and pass that data frame to the plotting function to generate a barplot.
- Add another method to sampleInfoCollectionClass that generates an html page displaying a barplot generated as above.
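The five steps above can be sketched with R reference classes (`setRefClass` from the methods package). This is one possible approach, not a reference solution and not the code in the repositories linked below; the method names `add`, `calculateMetric`, `metricDataFrame`, `plotMetric`, and `writeHtml`, and the choice to save the barplot as a PNG next to the HTML page, are assumptions.

```r
library(methods)

# A single run: a numeric vector of samples plus a named list of metrics.
sampleInfoClass <- setRefClass(
  "sampleInfoClass",
  fields = list(samples = "numeric", metrics = "list"),
  methods = list(
    addMetric = function(name, value) {
      metrics[[name]] <<- value   # modify the field in place
      invisible(.self)
    }
  )
)

# A named collection of sampleInfoClass objects.
sampleInfoCollectionClass <- setRefClass(
  "sampleInfoCollectionClass",
  fields = list(items = "list"),
  methods = list(
    add = function(name, info) {
      items[[name]] <<- info
      invisible(.self)
    },
    # Step 3: apply a scalar metric function to every object's samples.
    calculateMetric = function(metricName, metricFun) {
      for (info in items) {
        # Reference semantics: this updates the stored object itself.
        info$addMetric(metricName, metricFun(info$samples))
      }
      invisible(.self)
    },
    # Collect one named metric into a data frame, one row per object.
    metricDataFrame = function(metricName) {
      data.frame(
        run = names(items),
        value = unname(vapply(items, function(info) info$metrics[[metricName]],
                              numeric(1))),
        stringsAsFactors = FALSE
      )
    },
    # Step 4: pass that data frame to a user-supplied barplot function.
    plotMetric = function(metricName, plotFun) {
      plotFun(.self$metricDataFrame(metricName))
    },
    # Step 5: draw the barplot to a PNG and write an HTML page showing it.
    writeHtml = function(metricName, plotFun, htmlFile) {
      pngFile <- paste0(tools::file_path_sans_ext(htmlFile), ".png")
      grDevices::png(pngFile)
      .self$plotMetric(metricName, plotFun)
      grDevices::dev.off()
      writeLines(c("<html><body>",
                   paste0("<h1>", metricName, "</h1>"),
                   paste0('<img src="', basename(pngFile), '">'),
                   "</body></html>"),
                 htmlFile)
      invisible(htmlFile)
    }
  )
)

# Usage: two runs, one metric, one barplot on the current graphics device.
set.seed(1)
coll <- sampleInfoCollectionClass$new()
coll$add("runA", sampleInfoClass$new(samples = rnorm(200)))
coll$add("runB", sampleInfoClass$new(samples = rnorm(200, sd = 2)))
coll$calculateMetric("sd", sd)
barFun <- function(df) barplot(df$value, names.arg = df$run, ylab = "sd")
coll$plotMetric("sd", barFun)
```

Because reference-class objects are mutable, `calculateMetric` can store results directly in each `sampleInfoClass` object instead of returning modified copies, which is the main reason to prefer them over S3 lists here.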
Matt Piekenbrock: https://github.com/peekxc/Reference-Class-Testing
Anant Gowadiya: https://github.com/AnantGowadiya/Management-and-comparison-of-MCMC-runs-from-multiple-packages
Helen HAN: https://github.com/myloveecho/Gsoc2017