Welcome to categorical_from_binary, a Python package for fast Bayesian inference on linear models for categorical data.
This is the code repository for our ICML 2022 paper:
Easy Variational Inference for Categorical Models via an Independent Binary Approximation
Michael T. Wojnowicz, Shuchin Aeron, Eric L. Miller, Michael C. Hughes
Proceedings of the 39th International Conference on Machine Learning (ICML), 2022
https://proceedings.mlr.press/v162/wojnowicz22a.html • https://arxiv.org/abs/2206.00093
Our methods have two main advantages:
- Simplicity: We use an independent binary approximation that makes inference "easy" via well-known Bayesian methods for binary outcomes.
- Scalability: Our code easily handles hundreds or thousands of categories.
You can use this repo to:
- reproduce the accuracy-over-time experiments in the paper
- run our categorical-from-binary models on your own data
Jump to: Installation • Demo • Usage • References
Installation requires python3 and the tox package.
This package is not yet published to PyPI, so the first step is to clone this repo.
Using make, we can create a virtual environment and install categorical_from_binary via the command:

```
make env
```
Run unit tests:

```
make test
```
To work interactively (e.g., in IPython), be sure to load the virtual environment (source env/bin/activate) before proceeding.
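Once the environment is active, a quick import check confirms the install (a minimal sketch; it assumes nothing beyond the package being importable):

```python
# Run inside IPython after `source env/bin/activate`.
import categorical_from_binary

# The reported path should live under env/, confirming which install is loaded.
print(categorical_from_binary.__file__)
```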
The following code provides a quick demo of how categorical-from-binary models can be used to obtain fast Bayesian inference on categorical data. By default, it uses the configs at configs/performance_over_time/demo_sims.yaml.
```python
from categorical_from_binary.performance_over_time.main import run_performance_over_time

path_to_configs = "configs/performance_over_time/demo_sims.yaml"
run_performance_over_time(path_to_configs)
```
For a quick illustration, we simulate a small categorical regression dataset from a softmax model.
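For intuition, data generation along these lines can be sketched in a few lines of NumPy (our own minimal sketch, not the package's simulation code; the sizes below are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_covariates, n_categories = 500, 3, 10  # illustrative sizes

X = rng.normal(size=(n_samples, n_covariates))      # covariates
B = rng.normal(size=(n_covariates, n_categories))   # true regression weights
logits = X @ B

# Softmax category probabilities, computed stably.
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)

# Draw one categorical label per row.
labels = np.array([rng.choice(n_categories, p=p) for p in probs])
```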
Then we do approximate Bayesian inference.
- We apply our proposed approximate Bayesian inference method, IB-CAVI (Independent Binary Coordinate Ascent Variational Inference). This method uses categorical-from-binary (CB) likelihoods, which have natural independent binary approximations; in particular, we consider the CB-Logit and CB-Probit likelihoods (see the sketch after this list).
- For baseline methods, we fit the softmax likelihood using three different inference strategies:
- Automatic differentiation variational inference (ADVI)
- NUTS (No U-Turn Sampler)
- Gibbs sampling (via Polya-Gamma augmentation)
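To give intuition for the "categorical-from-binary" idea, here is a sketch of one flavor of the construction as we read it, with a probit link: give each category an independent binary indicator and condition on exactly one being active. This is our illustrative reading only; see the paper for the exact likelihoods and parameterizations.

```python
import numpy as np
from scipy.stats import norm


def cb_probit_probs(eta):
    # Independent probit "binaries": psi[k] = P(category k's indicator is on).
    psi = norm.cdf(eta)
    # Unnormalized probability that *only* category k is on:
    # psi[k] * prod_{j != k} (1 - psi[j]).
    unnorm = psi * np.prod(1.0 - psi) / (1.0 - psi)
    # Conditioning on exactly one indicator being on yields a categorical distribution.
    return unnorm / unnorm.sum()


print(cb_probit_probs(np.array([1.0, 0.0, -1.0])))
```

The point of such constructions is that each category's parameters touch the data only through a binary likelihood, so standard Bayesian machinery for binary outcomes can be reused category by category.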
The code will automatically write holdout-performance-over-time plots to data/results/demo_sims/.
More extensive experiments can take substantially longer to run than this quick demo. Results from those experiments are summarized in the paper.
Would you like to apply IB-CAVI to your own data? Usage is demonstrated in this Python script.
```python
from categorical_from_binary.ib_cavi.multi.inference import (
    IB_Model,
    compute_ib_cavi_with_normal_prior,
)

# Fit IB-CAVI with a CB-Probit likelihood and a normal prior on the regression weights.
# `labels_train`, `covariates_train`, `labels_test`, and `covariates_test` are your own data.
results = compute_ib_cavi_with_normal_prior(
    IB_Model.PROBIT,
    labels_train,
    covariates_train,
    labels_test=labels_test,
    covariates_test=covariates_test,
    variational_params_init=None,  # start from the default initialization
    convergence_criterion_drop_in_mean_elbo=0.01,  # stop when mean ELBO gains fall below 0.01
)
```
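The expected shapes of the inputs aren't spelled out above. As a hedged guess only (one-hot label matrices and dense covariate matrices are our assumption; check the linked script for the true expected format), the inputs might be prepared like this:

```python
import numpy as np

# Hypothetical shapes -- verify against the package's demo script before relying on them.
n_train, n_covariates, n_categories = 400, 3, 10
covariates_train = np.random.normal(size=(n_train, n_covariates))
labels_train = np.eye(n_categories)[
    np.random.randint(n_categories, size=n_train)
]  # one-hot encoded labels, shape (n_train, n_categories)
```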
If you have any questions, feel free to email Michael Wojnowicz.
If you use this package in your research, please cite our paper:
Easy Variational Inference for Categorical Models via an Independent Binary Approximation
Michael T. Wojnowicz, Shuchin Aeron, Eric L. Miller, Michael C. Hughes
Proceedings of the 39th International Conference on Machine Learning (ICML), 2022
https://proceedings.mlr.press/v162/wojnowicz22a.html