Contrast coding systems for categorical variables #757
Comments
I don't know if anyone is planning to work on that soon, but it seems clearly worth having eventually. @dmbates might have more to say. I'm still planning on rewriting our underlying representation of categorical variables, but that should be orthogonal.
Another thing:

    df <- as.data.frame(c("A", "A", "B", "C", "C", "B"))
    names(df) <- "var"
    df <- model.matrix(~ 0 + df$var)

will return

      df$varA df$varB df$varC
    1       1       0       0
    2       1       0       0
    3       0       1       0
    4       0       0       1
    5       0       0       1
    6       0       1       0

In Julia, if I omit the intercept

    df = DataFrame(var = ["A", "A", "B", "C", "C", "B"], y = randn(6))
    pool!(df, [:var])
    mm = ModelMatrix(ModelFrame( y ~ var + 0, df))

    ModelMatrix{Float64}(6x2 Array{Float64,2}:
     0.0  0.0
     0.0  0.0
     1.0  0.0
     0.0  1.0
     0.0  1.0
     1.0  0.0,[1,1])

No dummy variable is created for the reference level ("A"). Is this intended?
That seems like a bug to me.
Should I open a separate issue?
The contrast generation code was pretty primitive last time I looked; the most obvious problem is that it doesn't check whether an intercept is present. Things get tricky when you have more than one categorical variable (in R, only the first categorical variable gets the full complement of predictors), or interactions between categorical predictors and other predictors (you need to include more levels when an interaction is included but the lower-order terms are not). I think someone had opened an issue or pull request about other contrast types a long time ago, but obviously it didn't make it in.
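To make the intercept check concrete, here is a minimal sketch in plain Julia. It is not the DataFrames code: `indicator_columns` and its `has_intercept` argument are hypothetical names made up for illustration, and it only handles a single categorical variable, so it does not address the multi-variable or interaction cases mentioned above.

```julia
# Hypothetical helper (not part of DataFrames): build the indicator columns
# for one categorical variable, depending on whether the model has an intercept.
function indicator_columns(values::AbstractVector, has_intercept::Bool)
    lvls = sort(unique(values))
    # With an intercept, drop the first (reference) level so the columns are
    # not collinear with it; without one, keep a column for every level.
    kept = has_intercept ? lvls[2:end] : lvls
    mat = zeros(Float64, length(values), length(kept))
    for (j, lev) in enumerate(kept), (i, v) in enumerate(values)
        mat[i, j] = v == lev ? 1.0 : 0.0
    end
    return kept, mat
end

vals = ["A", "A", "B", "C", "C", "B"]
kept, mat = indicator_columns(vals, false)   # no intercept: columns for A, B and C
```

With `has_intercept = false` this gives the same three columns as the R output earlier in the thread; with `true` it gives the two columns that `ModelMatrix` currently returns in both cases.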
Fixed by #870.
Are there any plans to add more contrast coding systems for categorical variables? At the moment, contr_treatment seems to be the only one.
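For reference, a rough sketch of what some other coding systems look like as contrast matrices for a factor with `k` levels, written in plain Julia. The function names below are hypothetical and are not an existing or proposed DataFrames API; the matrices follow the conventions of R's `contr.treatment`, `contr.sum` and `contr.helmert`.

```julia
# Treatment (dummy) coding: level 1 is the reference and is coded as all zeros.
function treatment_contrasts(k::Int)
    C = zeros(Int, k, k - 1)
    for j in 1:k-1
        C[j + 1, j] = 1
    end
    return C
end

# Sum (deviation) coding: the last level is coded -1 in every column.
function sum_contrasts(k::Int)
    C = zeros(Int, k, k - 1)
    for j in 1:k-1
        C[j, j] = 1
        C[k, j] = -1
    end
    return C
end

# Helmert coding: column j compares level j+1 with the levels before it.
function helmert_contrasts(k::Int)
    C = zeros(Int, k, k - 1)
    for j in 1:k-1
        C[1:j, j] .= -1
        C[j + 1, j] = j
    end
    return C
end

helmert_contrasts(3)   # rows: [-1 -1], [1 -1], [0 2]
```

Each function returns a `k` by `k - 1` matrix whose row `i` is the coding for level `i`, so a model matrix column block is obtained by indexing rows with the integer level codes.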