-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implicitly removing the intercept in bain
results in seemingly unexpected behavior
#39
Comments
Dear @thomvolker , I agree with your assessment and I had raised similar questions/concerns during early development of bain. @herberthoijtink we should discuss Tom's concerns, shall I plan a call with him? |
With respect to the first issue. In case users omit the -1 we could add a message to the output stating that we have added it. I like the way it is now, because it protects the user against specifying hypotheses with respect to contrasts, which we can only handle using the 1-group approach, which in case of multiple groups is inferior to the multiple group approach. |
Thank you for your responses! @herberthoijtink I don't entirely understand your second remark. Specifically, I don't entirely understand the difference between "treating groups as groups" and "treating groups as predictors". Is this to say, it depends on whether a researcher is interested in differences between the groups (then groups are treated as groups) or merely includes the groups as a covariate (and then this is including groups as a predictor)? Additionally, I noticed some other strange behavior of In short, # Load required packages
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(magrittr)
#> Warning: package 'magrittr' was built under R version 4.1.2
library(bain)
dat1 <- sesamesim %>%
mutate(sex = sex - 1,
site = as.factor(site))
dat2 <- sesamesim %>%
mutate(sex = factor(sex, labels = c("boy", "girl")),
site = as.factor(site))
fit1 <- glm(sex ~ site + prenumb,
family = binomial,
data = dat1)
summary(fit1)
#>
#> Call:
#> glm(formula = sex ~ site + prenumb, family = binomial, data = dat1)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -1.358 -1.206 1.029 1.130 1.341
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.15210 0.41475 0.367 0.714
#> site2 0.34588 0.37716 0.917 0.359
#> site3 0.04317 0.38179 0.113 0.910
#> site4 0.25095 0.40818 0.615 0.539
#> site5 -0.18144 0.54183 -0.335 0.738
#> prenumb -0.00917 0.01360 -0.674 0.500
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 332.29 on 239 degrees of freedom
#> Residual deviance: 330.43 on 234 degrees of freedom
#> AIC: 342.43
#>
#> Number of Fisher Scoring iterations: 3
bf1 <- bain(fit1, "site2 > 0 & site5 < 0")
bf1
#> Bayesian informative hypothesis testing for an object of class glm:
#>
#> Fit Com BF.u BF.c PMPa PMPb
#> H1 0.000 0.249 0.000 0.000 1.000 0.000
#> Hu 1.000
#>
#> Hypotheses:
#> H1: site2>0&site5<0
#>
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
summary(bf1)
#> Parameter n Estimate lb ub
#> 1 site1 60 0.490438410 0.361223888 0.619652932
#> 2 site2 55 0.576339379 0.438023028 0.714655730
#> 3 site3 64 0.501324850 0.370906355 0.631743344
#> 4 site4 43 0.552872746 0.401450005 0.704295487
#> 5 site5 18 0.445372841 0.212542146 0.678203536
#> 6 prenumb 240 -0.002273624 -0.008952232 0.004404983
## ALthough in line with the hypothesis, the fit is zero,
## because bain seems to fit a linear probability model,
## and probabilities cannot be below 0 (this also explains
## why the estimated coefficient of prenumb is much smaller
## here).
fit2 <- glm(sex ~ site + prenumb,
family = binomial,
data = dat2)
bf2 <- bain(fit2, "site2 > 0 & site5 < 0")
#> Warning in model.response(mf, "numeric"): using type = "numeric" with a factor
#> response will be ignored
#> Warning in Ops.factor(y, z$residuals): '-' not meaningful for factors
#> Warning in Ops.factor(r, 2): '^' not meaningful for factors
#> Error in eigen(Sigma): infinite or missing values in 'x'
## Throws an error, because `lm` is not applicable
## to factors.
fit3 <- glm(sex ~ prenumb,
family = binomial,
data = dat2)
## But the problem dissolves when only a continuous predictor is included.
bf3 <- bain(fit3, "prenumb < 0")
bf3
#> Bayesian informative hypothesis testing for an object of class glm:
#>
#> Fit Com BF.u BF.c PMPa PMPb
#> H1 0.712 0.500 1.423 2.468 1.000 0.587
#> Hu 0.413
#>
#> Hypotheses:
#> H1: prenumb<0
#>
#> Note: BF.u denotes the Bayes factor of the hypothesis at hand versus the unconstrained hypothesis Hu. BF.c denotes the Bayes factor of the hypothesis at hand versus its complement.
summary(bf3)
#> Parameter n Estimate lb ub
#> 1 prenumb 240 -0.006814645 -0.03074668 0.01711739 Created on 2022-03-09 by the reprex package (v2.0.0) |
Your first question is one to deal with face to face. With respect to glm, I did not know that bain could process glm objects. These are not discussed in the vignette, these are not debugged, so don't use them as input for bain, use named vectors (see the vignette) instead. In the next version of bain I will arrange that glm objects can not process. I'm rather certain that the results obtained will not be correct. |
Dear @herberthoijtink , I'm pretty sure that BFpack uses |
@CasparVanLissa. Perfect. We will do that in the next version (probably winter 2022). Please note that if BFpack feeds in glm object into bain, it will not give correct results. The proper processing of glm object is quite different from that of a lm object and can currently only be achieved using the named vector input. If you know that BFpack feeds glm into bain, maybe good to warn Joris Mulder!? |
@cjvanlissa @herberthoijtink If the next update it scheduled for winter 2022, you might want to consider adding a bain.glm function. I figure that to a large extent, it should be similar relatively similar to the bain.lm function (but perhaps I am missing some important details). I assume that the most important difference is that it is not at all straightforward to accommodate the multiple group approach, which is dependent on which member of the glm family is fitted. On a side note, you might also want to consider adding robust standard errors to bain as input argument. With the I don't have time to dive into this right now, but I might be able to spend some time on this from May onwards. |
@thomvolker. Let me know if you want to work on these things. Look e.g. at the logistic regression examples in the bain vignette, with named vector input all of this can be done, however, to construct a user friendly wrapper is another thing. I may have a few pointers that might speed things along. |
When estimating the effect of a categorical variable, such as in, for example, an ANOVA,
bain
by default drops the intercept. This seems like a logical decision, but results in seemingly unexpected behavior, because there is no warning message shown. Accordingly, the estimate as given in anlm
model reflects the difference between two categories, whilebain
uses the same name to refer to the group specific mean. This issue becomes apparent in the following reprex.Created on 2022-03-09 by the reprex package (v2.0.0)
On a related note, researchers may also manually dummy code their categories. If this is the case,
bain
does not remove the intercept, but estimates the difference. However, this ignores the fact that there are multiple groups involved, which results in an inconsistent Bayes factor (Hoijtink, Gu & Mulder, 2019). Is it indeed problematic to create numeric dummy variables, and estimate the effect of these usingbain
, because the resulting Bayes factors are highly similar (see reprex below). If this is actually a problematic way to estimate the effects of categorical variables, it may be worthwhile to letbain
recognize numeric dummy variables, and display a warning if these are observed.Created on 2022-03-09 by the reprex package (v2.0.0)
The text was updated successfully, but these errors were encountered: