Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann, Arto Klami; Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching. Journal of Machine Learning Research. 24(67):1−51, 2023.
@article{JMLR:v24:21-0623,
author = {Eliezer de Souza da Silva and Tomasz Kuśmierczyk and Marcelo Hartmann and Arto Klami},
title = {Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching},
journal = {Journal of Machine Learning Research},
year = {2023},
volume = {24},
number = {67},
pages = {1--51},
url = {http://jmlr.org/papers/v24/21-0623.html}
}
The code was tested using Python 3.7.4 from Anaconda 2019.10, with TensorFlow 2.1 and TensorFlow Probability 0.9.0. It also uses numpy, pandas, seaborn, and matplotlib.
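As an optional sanity check before running the notebooks, the installed versions can be compared against the ones listed above (this snippet is only a convenience, not part of the repository):

```python
# Optional environment check against the versions the code was tested with.
import sys
import tensorflow as tf
import tensorflow_probability as tfp

print(sys.version)       # tested with Python 3.7.4
print(tf.__version__)    # tested with TensorFlow 2.1
print(tfp.__version__)   # tested with TensorFlow Probability 0.9.0
```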
The hetrec-lastfm dataset, along with the train-test split, can be found in the `data` directory.
The code illustrating gradient-based optimization can be found in the `gradient_optimization` directory.
- `pmf_sgd_optimization.ipynb` - Jupyter Notebook illustrating how priors matching requested values of the prior predictive expectation and/or variance can be found for the Poisson Matrix Factorization (PMF) model using SGD (see the sketch after this list).
- `hpf_sgd_optimization.ipynb` - Jupyter Notebook illustrating how priors matching requested values of the prior predictive expectation and/or variance can be found for the Hierarchical Poisson Matrix Factorization (HPF) model using SGD.
- `pmf_estimators_analysis.ipynb` - Jupyter Notebook illustrating the bias and variance of the estimators used in `pmf_sgd_optimization.ipynb` for the PMF model.
- `pmf_surface_visualizations.ipynb` - Jupyter Notebook illustrating 1D and 2D projections of the optimization space for the problem of matching the Poisson Matrix Factorization (PMF) prior predictive variance (minimization of the discrepancy = (Variance - 100)^2). Two parametrizations are considered: (a, b, c, d) vs. (mu, sigma).
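The sketch below is not the notebook code; it only illustrates the idea behind `pmf_sgd_optimization.ipynb`, assuming Gamma(a, b) priors on user factors and Gamma(c, d) priors on item factors (shape/rate parameterization) with K latent dimensions, for which the PMF prior predictive mean and variance have closed forms. The value K = 25 and the target variance are illustrative choices.

```python
# Minimal sketch (not the repository code): match a requested prior predictive
# variance for PMF by gradient descent on the discrepancy (Variance - target)^2.
import tensorflow as tf

K = 25               # number of latent factors (illustrative)
target_var = 100.0   # requested prior predictive variance

# Unconstrained variables mapped through softplus to keep a, b, c, d positive.
raw = tf.Variable(tf.zeros(4))

def pmf_prior_predictive_moments(a, b, c, d):
    # Closed forms under Gamma(a, b) x Gamma(c, d) priors and a Poisson likelihood.
    mean = K * (a / b) * (c / d)
    var = mean + K * a * c * (a + c + 1.0) / (b ** 2 * d ** 2)
    return mean, var

opt = tf.keras.optimizers.Adam(learning_rate=0.05)
for step in range(2000):
    with tf.GradientTape() as tape:
        a, b, c, d = tf.unstack(tf.nn.softplus(raw))
        _, var = pmf_prior_predictive_moments(a, b, c, d)
        loss = (var - target_var) ** 2   # discrepancy to minimize
    grads = tape.gradient(loss, [raw])
    opt.apply_gradients(zip(grads, [raw]))

print("matched hyperparameters (a, b, c, d):", tf.nn.softplus(raw).numpy())
```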
- `pmf_model.py` - Methods calculating E[Y] and E[Y^2] (and therefore also Var[Y]) over the prior predictive distribution for Poisson Matrix Factorization.
- `hpf_model.py` - Methods calculating E[Y] and E[Y^2] (and therefore also Var[Y]) over the prior predictive distribution for Hierarchical Poisson Matrix Factorization.
- `aux.py`, `aux_plt.py`, `boplotting/*` - Auxiliary functions for tensor processing and plotting.
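As an informal cross-check of the moment formulas such methods implement, the closed-form values can be compared against plain Monte Carlo draws from the PMF prior predictive; the parameter values below are arbitrary and the snippet is not taken from `pmf_model.py`:

```python
# Monte Carlo check of the closed-form PMF prior predictive moments (illustrative).
import numpy as np

rng = np.random.default_rng(0)
K, a, b, c, d = 25, 0.5, 1.0, 0.5, 1.0
S = 200_000                                              # prior predictive draws

theta = rng.gamma(shape=a, scale=1.0 / b, size=(S, K))   # user factors
beta = rng.gamma(shape=c, scale=1.0 / d, size=(S, K))    # item factors
y = rng.poisson(lam=(theta * beta).sum(axis=1))          # prior predictive samples

mean = K * (a / b) * (c / d)
var = mean + K * a * c * (a + c + 1.0) / (b ** 2 * d ** 2)
print("analytic:   ", mean, var)
print("Monte Carlo:", y.mean(), y.var())
```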
The code can be found in the `gradient_optimization_experiments` directory. It contains two subfolders, `PMF_Convergence` and `HPF_Convergence`.
The code computing PSIS-LOO on the test subset for the fitted PMF model can be found in `posterior_visualization`. The scripts `pmf_precompute_objectives_posterior.py` and `pmf_precompute_objectives_posterior2.py` precompute the set of configurations specified inside those files and write the results to `pmf_precompute_objectives_posterior.py.csv` and `pmf_precompute_objectives_posterior2.py.csv`, respectively. The outputs can then be previewed with `VISUALIZATION.ipynb` and `VISUALIZATION2.ipynb`.
`VISUALIZATION_K.ipynb` plots PSIS-LOO on the test subset for various K, with a, b, c, d set to the prior-optimal values.
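For orientation, one common way to obtain PSIS-LOO from pointwise log-likelihoods is via the ArviZ library; the repository scripts may organize this differently, so the array names and shapes below are assumptions:

```python
# Hedged sketch: PSIS-LOO from pointwise log-likelihoods with ArviZ.
import numpy as np
import arviz as az

# Placeholder pointwise log-likelihoods with shape (chains, draws, observations).
log_lik = np.random.normal(loc=-1.0, scale=0.1, size=(2, 500, 300))
idata = az.from_dict(log_likelihood={"y": log_lik})
print(az.loo(idata, pointwise=True))
```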
The code can be found in `bo_optimization`. To run the experiment, use `RUN_EXPERIMENT_BO.sh`. It requires RoBO, a Robust Bayesian Optimization framework (https://github.com/automl/RoBO), to be preinstalled. Results can be displayed using the Jupyter Notebook `VISUALIZATION_BO.ipynb`.
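A rough sketch of the gradient-free alternative, using RoBO's `bayesian_optimization` entry point to minimize the same variance discrepancy directly over (a, b, c, d); the bounds, K, and target below are illustrative assumptions and not the settings used by `RUN_EXPERIMENT_BO.sh`:

```python
# Hedged sketch: Bayesian optimization of the prior predictive variance discrepancy.
import numpy as np
from robo.fmin import bayesian_optimization

K, target_var = 25, 100.0

def discrepancy(x):
    a, b, c, d = x
    mean = K * (a / b) * (c / d)
    var = mean + K * a * c * (a + c + 1.0) / (b ** 2 * d ** 2)
    return (var - target_var) ** 2

lower = np.array([0.01, 0.01, 0.01, 0.01])   # illustrative bounds on (a, b, c, d)
upper = np.array([10.0, 10.0, 10.0, 10.0])
result = bayesian_optimization(discrepancy, lower, upper, num_iterations=50)
print(result["x_opt"])
```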
The code can be found in the sensitivity analysis folder. To visualize the experiment results, open the Jupyter notebook `sensitivity_analysis.ipynb`.
To re-run the experiments, run the Python scripts:
- `python poisson_prior_exp_negbin_async.py`: experiment sampling from a Negative Binomial.
- `python poisson_prior_exp_binomial_async.py`: experiment sampling from a PMF, but with a probability of randomly zeroing each entry of the matrix (see the sketch at the end of this section).
Both experiments generate CSV files with the results. The files matching `final_*.csv` can be analyzed in the `sensitivity_analysis.ipynb` notebook simply by adding new cells that keep the same code as the previous cells and only adjust the file name that is loaded.
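The sketch below illustrates the two data-generating processes probed here, namely Negative Binomial counts and PMF counts with entries independently zeroed; all parameter values are illustrative assumptions, and the snippet is not taken from the experiment scripts:

```python
# Illustrative samplers for the sensitivity-analysis settings (assumed parameters).
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 50, 40, 25
a, b, c, d, p_zero = 0.5, 1.0, 0.5, 1.0, 0.2

# Negative Binomial data.
y_negbin = rng.negative_binomial(n=2.0, p=0.3, size=(N, M))

# PMF data with each entry zeroed with probability p_zero.
theta = rng.gamma(a, 1.0 / b, size=(N, K))
beta = rng.gamma(c, 1.0 / d, size=(M, K))
y_pmf = rng.poisson(theta @ beta.T)
mask = rng.random((N, M)) < p_zero
y_zeroed = np.where(mask, 0, y_pmf)
```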