The goal of stanova is to provide a more relevant and interpretable summary for Bayesian models with categorical covariates, and possibly interactions and continuous covariates, estimated in Stan. The core functions are stanova(), which requires specifying which rstanarm function should be called through the argument model_fun (e.g., model_fun = "glmer" calls stan_glmer() and allows fitting Bayesian mixed models), and stanova_brm(), which estimates models using brms::brm().
The issue stanova tries to address is that a categorical variable with k levels needs to be transformed into k − 1 numerical model coefficients. This poses a problem if a model includes a factor with more than two levels that interacts with another variable. In this case, the most reasonable parameterization of the model is one in which the intercept corresponds to the (unweighted) grand mean, so that the coefficients for the main effects represent average effects (as opposed to simple effects). In this parameterization, factors with k levels, where k > 2 (i.e., more than two levels), cannot be mapped in a 1-to-1 fashion to the k − 1 coefficients, as no such mapping exists. Thus, the estimates of the model coefficients do not represent the effect of a single factor level but always pertain to more than one level, and therefore cannot be interpreted directly. In other words, these coefficients should not be looked at. Instead, stanova transforms these parameters back such that one gets information on the factor levels (or combinations of factor levels for interactions). The default output shows, for each factor level, the difference from the intercept, which, as discussed above, corresponds to the (unweighted) grand mean.
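A minimal base-R sketch illustrates the coding problem (the data are made up for illustration). With sum-to-zero coding and a balanced design, the intercept of a linear model equals the unweighted grand mean, but three levels share only two coefficients, so the last level has no coefficient of its own:

```r
# Toy data (hypothetical scores, balanced design).
d <- data.frame(
  machine = factor(rep(c("A", "B", "C"), each = 4)),
  score   = c(51, 53, 52, 54,  60, 61, 59, 62,  66, 67, 65, 68)
)
contrasts(d$machine) <- contr.sum(3)  # sum-to-zero coding
fit <- lm(score ~ machine, data = d)

cell_means <- tapply(d$score, d$machine, mean)
coef(fit)[["(Intercept)"]]  # unweighted grand mean, i.e., mean(cell_means)
coef(fit)[["machine1"]]     # deviation of level A from the grand mean
coef(fit)[["machine2"]]     # deviation of level B from the grand mean
# Level C has no coefficient of its own; its deviation is only
# recoverable as the negative sum of the other coefficients:
-(coef(fit)[["machine1"]] + coef(fit)[["machine2"]])
```

With interactions in the model, this indirect mapping gets worse, which is why stanova reports per-level differences from the intercept instead of raw coefficients.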
Another problem addressed by stanova is that for Bayesian models the mapping of factor levels to model coefficients needs to be done such that the marginal prior for each factor level is the same. If one uses one of the contrast coding schemes commonly used in frequentist models in which the intercept corresponds to the (unweighted) grand mean, the marginal prior differs across factor levels. For example, when using contr.sum, all but the last factor level are mapped to exactly one model coefficient with a positive sign, whereas the last factor level is mapped with a negative sign onto all model coefficients. Thus, the marginal prior for the last factor level is more diffuse than for the other factor levels (if k > 2). stanova by default uses the contrast coding scheme suggested by Rouder, Morey, Speckman, and Province (2012), which ensures that the marginal prior is the same for all factor levels. This contrast also imposes the sum-to-zero constraint that is necessary for the intercept to represent the (unweighted) grand mean. It is implemented in the contr.bayes() function.
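The idea behind such a coding can be sketched in a few lines of base R. This is an illustration of the construction described by Rouder et al. (2012), not the package's contr.bayes() source, and the name C_eq is made up here: use an orthonormal basis of the sum-to-zero subspace as the coding matrix. Every row (level) then has the same norm, so i.i.d. priors on the coefficients imply the same marginal prior for every level; with contr.sum the last row has a larger norm, hence a more diffuse marginal prior:

```r
k <- 3
# Projection matrix onto the sum-to-zero subspace of R^k.
centering <- diag(k) - matrix(1 / k, nrow = k, ncol = k)
# Orthonormal basis of that subspace: the eigenvectors with eigenvalue 1.
C_eq <- eigen(centering)$vectors[, seq_len(k - 1)]

# Every level (row) has the same squared norm, (k - 1) / k ...
rowSums(C_eq^2)
# ... whereas with contr.sum the last level's row norm is larger:
rowSums(contr.sum(k)^2)  # 1, 1, k - 1
```

Because all rows of such a coding matrix have equal norm, the same prior on the k − 1 coefficients translates into identical marginal priors on all k level effects.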
For the moment, you can only install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("bayesstuff/stanova")
At least version 1.5.0 of emmeans is needed, which can be installed from CRAN:
install.packages("emmeans")
For brms support, in addition to a C++ compiler, at least brms version 2.15.0 is needed. This can be installed from CRAN for now:
install.packages("brms")
For the rstanarm functionality, stanova attaches the new contrasts to the model instead of to the data if at least rstanarm version 2.21.2 is installed. As this version is not yet on CRAN, it needs to be installed from source:
Sys.setenv("MAKEFLAGS" = "-j4") ## uses 4 cores during installation
remotes::install_github("stan-dev/rstanarm", build_vignettes = FALSE)
The most basic example only uses a single factor with three levels to demonstrate the output.
library(stanova)
data("Machines", package = "MEMSS")
m_machines <- stanova(score ~ Machine + (Machine | Worker),
                      model_fun = "glmer",
                      data = Machines, chains = 2,
                      warmup = 250, iter = 750)
The summary method of a stanova object first provides some general information about the fitted model (similar to rstanarm). It then provides statistics about the intercept. The next block provides estimates for each level of the Machine factor. These estimates represent the difference of the factor level from the intercept. For example, for Machine A, the estimate is around -7, suggesting that the mean of this factor level is 7 points below the intercept (i.e., the grand mean).
summary(m_machines)
#>
#> Model Info:
#> function: stanova_glmer
#> family: gaussian [identity]
#> formula: score ~ Machine + (Machine | Worker)
#> algorithm: sampling
#> chains: 2
#> sample: 1000 (posterior sample size)
#> priors: see help('prior_summary', package = 'rstanarm')
#> observations: 54
#> groups: Worker (6)
#>
#> Estimate Intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 (Intercept) 59.722 1.807 56.366 59.670 63.034 1.021 226.845 414.380
#>
#>
#> Estimates 'Machine' - difference from intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 Machine A -7.398 1.190 -9.570 -7.390 -5.265 1.002 293.763 440.176
#> 2 Machine B 0.789 1.215 -1.391 0.734 3.157 0.999 306.112 445.466
#> 3 Machine C 6.609 1.105 4.784 6.585 8.637 1.002 488.434 628.321
If one is not interested in the differences from the intercept, it is also possible to obtain the estimated marginal means. For this, one just needs to set diff_intercept = FALSE.
summary(m_machines, diff_intercept = FALSE)
#>
#> Model Info:
#> function: stanova_glmer
#> family: gaussian [identity]
#> formula: score ~ Machine + (Machine | Worker)
#> algorithm: sampling
#> chains: 2
#> sample: 1000 (posterior sample size)
#> priors: see help('prior_summary', package = 'rstanarm')
#> observations: 54
#> groups: Worker (6)
#>
#> Estimate Intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 (Intercept) 59.722 1.807 56.366 59.670 63.034 1.021 226.845 414.380
#>
#>
#> Estimates 'Machine' - marginal means:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 Machine A 52.324 1.739 49.218 52.312 55.831 1.025 186.573 466.659
#> 2 Machine B 60.511 2.626 55.713 60.522 65.490 1.010 250.168 397.672
#> 3 Machine C 66.331 1.946 63.081 66.319 69.727 1.013 282.753 376.227
The key to the output is the stanova_samples() function, which takes a fitted model object and returns the posterior samples transformed to represent the difference from the intercept for each factor level (or the marginal means if diff_intercept = FALSE). The default output is an array, but this can be changed to a matrix or data.frame via the return argument.
out_array <- stanova_samples(m_machines)
str(out_array)
#> List of 2
#> $ (Intercept): num [1:500, 1, 1:2] 63.6 63.2 61.1 57.8 59.6 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ Iteration: chr [1:500] "1" "2" "3" "4" ...
#> .. ..$ Parameter: chr "(Intercept)"
#> .. ..$ Chain : chr [1:2] "1" "2"
#> $ Machine : num [1:500, 1:3, 1:2] -7.16 -6.93 -8.36 -5.85 -7.14 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ Iteration: chr [1:500] "1" "2" "3" "4" ...
#> .. ..$ Parameter: chr [1:3] "Machine A" "Machine B" "Machine C"
#> .. ..$ Chain : chr [1:2] "1" "2"
#> ..- attr(*, "estimate")= chr "difference from intercept"
One can also change which dimension of the array
represents the chain
via the dimension_chain
argument.
out_array2 <- stanova_samples(m_machines, dimension_chain = 2)
str(out_array2)
#> List of 2
#> $ (Intercept): num [1:500, 1:2, 1] 63.6 63.2 61.1 57.8 59.6 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ Iteration: chr [1:500] "1" "2" "3" "4" ...
#> .. ..$ Chain : chr [1:2] "1" "2"
#> .. ..$ Parameter: chr "(Intercept)"
#> $ Machine : num [1:500, 1:2, 1:3] -7.16 -6.93 -8.36 -5.85 -7.14 ...
#> ..- attr(*, "dimnames")=List of 3
#> .. ..$ Iteration: chr [1:500] "1" "2" "3" "4" ...
#> .. ..$ Chain : chr [1:2] "1" "2"
#> .. ..$ Parameter: chr [1:3] "Machine A" "Machine B" "Machine C"
#> ..- attr(*, "estimate")= chr "difference from intercept"
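The effect of dimension_chain = 2 is a permutation of the array dimensions, which can be mimicked in base R with aperm(); the array below is synthetic, standing in for the samples shown above:

```r
# Synthetic samples array shaped like the default stanova_samples() output:
# Iteration x Parameter x Chain.
a <- array(
  rnorm(500 * 3 * 2), dim = c(500, 3, 2),
  dimnames = list(
    Iteration = as.character(1:500),
    Parameter = c("Machine A", "Machine B", "Machine C"),
    Chain     = c("1", "2")
  )
)
# Move the chain dimension to position 2: Iteration x Chain x Parameter.
a2 <- aperm(a, c(1, 3, 2))
dim(a2)  # 500 2 3
```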
This makes it easy to produce plots via the bayesplot
package on the
level of the differences from the intercept:
bayesplot::mcmc_trace(out_array2$Machine)
We can use the same example for brms
.
library(stanova)
data("Machines", package = "MEMSS")
m_machines_brm <- stanova_brm(score ~ Machine + (Machine | Worker),
                              data = Machines, chains = 2,
                              warmup = 250, iter = 750)
The summary method works the same. The default shows the differences from the intercept.
summary(m_machines_brm)
#> Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta
#> above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-
#> transitions-after-warmup
#>
#> Model Info:
#> function: brms::brm
#> family: gaussian(identity)
#> formula: score ~ Machine + (Machine | Worker)
#> algorithm: sampling
#> chains: 2
#> sample: 1000 (posterior sample size)
#> priors: Use brms::prior_summary(object) for prior information
#> observations: 54
#> groups: Worker (6)
#>
#> Estimate Intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 (Intercept) 59.987 2.356 55.798 59.980 64.233 1.005 391.109 378.052
#>
#>
#> Estimates 'Machine' - difference from intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 Machine A -7.275 1.376 -9.807 -7.339 -4.471 1.006 424.105 437.737
#> 2 Machine B 0.704 1.673 -2.304 0.725 3.698 1.003 494.681 544.226
#> 3 Machine C 6.570 1.481 3.991 6.559 9.155 1.000 541.001 568.342
And we can get the estimated marginal means using
diff_intercept = FALSE
.
summary(m_machines_brm, diff_intercept = FALSE)
#> Warning: There were 1 divergent transitions after warmup. Increasing adapt_delta
#> above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-
#> transitions-after-warmup
#>
#> Model Info:
#> function: brms::brm
#> family: gaussian(identity)
#> formula: score ~ Machine + (Machine | Worker)
#> algorithm: sampling
#> chains: 2
#> sample: 1000 (posterior sample size)
#> priors: Use brms::prior_summary(object) for prior information
#> observations: 54
#> groups: Worker (6)
#>
#> Estimate Intercept:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 (Intercept) 59.987 2.356 55.798 59.980 64.233 1.005 391.109 378.052
#>
#>
#> Estimates 'Machine' - marginal means:
#> Variable Mean MAD_SD 5% 50% 95% rhat ess_bulk ess_tail
#> 1 Machine A 52.713 2.384 48.547 52.630 57.334 1.004 481.561 434.842
#> 2 Machine B 60.692 3.382 54.488 60.605 66.948 1.001 387.032 347.542
#> 3 Machine C 66.557 2.640 62.258 66.454 71.246 1.001 354.172 419.227
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356-374. https://doi.org/10.1016/j.jmp.2012.08.001