default intercept behavior #31
Comments
Just to repeat here some concern that I was expressing in ararslan/Survival.jl#2, for Cox regression we need a formula such that, if we only have a variable
Would there be a syntax for that?
Your first bullet point should be the default if there's no intercept term. That would just be
I'd also tend to think that
I think the Cox model issue is mostly orthogonal to this issue: whether Regarding the question of the implicit intercept, I agree that Last time we discussed this (see also the older discussion on the Google Group), everybody agreed that one should be explicit by using An alternative solution which would be equally safe would be to allow
This makes a lot of sense. Maybe the suggestion from ararslan/Survival.jl#2 is still valid in this scenario: adding a
in StatsBase that is run every time the user doesn't specify the intercept explicitly (i.e. by typing
I guess Linear Models could instead default to The same mechanism could be used for the other question with a
With #71 we can have our cake and eat it too: by dispatching on the model type during Also, #71 introduces a trait (
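To make the idea in the comments above concrete, here is a minimal, language-neutral sketch in Python of letting the model type decide what happens when the formula does not mention the intercept. All names (`Model`, `LinearModel`, `CoxModel`, `implicit_intercept`, `resolve_intercept`) are hypothetical and are not taken from StatsModels, StatsBase, or #71. Note that this variant treats an explicit `0 + x` as a valid way to drop the intercept, which is only one of the options discussed in this thread.

```python
from typing import Optional, Type


class Model:
    # Hypothetical per-model default, used only when the formula is silent
    # about the intercept.
    implicit_intercept = True


class LinearModel(Model):
    # Assumption: regression models keep the intercept by default.
    pass


class CoxModel(Model):
    # Assumption: a Cox model never wants an implicit intercept column.
    implicit_intercept = False


def resolve_intercept(model_type: Type[Model], explicit: Optional[bool]) -> bool:
    """Decide whether to add an intercept column.

    explicit is True for `1 + x`, False for `0 + x`, and None when the
    formula does not mention the intercept at all.
    """
    if explicit is not None:
        # The user was explicit, so the model default is not consulted.
        return explicit
    return model_type.implicit_intercept


print(resolve_intercept(LinearModel, None))   # model default applies
print(resolve_intercept(CoxModel, None))      # model default applies
print(resolve_intercept(LinearModel, False))  # explicit choice wins
```

Whether a model should be allowed to reject an explicit `1 + x` (as a Cox model might) is left open here, as it is in the discussion.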
I'd like to change the default intercept behavior in formulas. Currently, we have this:

- `~ 1 + x`: intercept column and x
- `~ x`: intercept column and x
- `~ 0 + x` or `~ -1 + x` or `~ x-1`: no intercept column, only x.

Including the intercept by default might be sensible when all you're doing is regression. But it's not always the most appropriate thing (e.g. JuliaStats/Survival.jl#2) and making people use `~ 0+x` when they don't want an intercept feels like an unnecessary gotcha, especially if we want formulas to be useful as a general "glue" between tabular data and numerical arrays.

I think we've largely agreed that this is the desirable behavior:

- `~ 1+x`: intercept and x
- `~ x`: just x
- `~ 0+x` or `~ -1+x` or `~ x-1`: an (informative) error.

If that's the case, let's implement that before we release this as a replacement for the statsmodels code in DataFrames.
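The proposed rules can be sketched as a small function. This is a language-neutral illustration in Python, not the StatsModels implementation; `parse_rhs` is a hypothetical helper that only handles simple additive right-hand sides such as `1 + x` or `x - 1`.

```python
from typing import List, Tuple


def parse_rhs(rhs: str) -> Tuple[bool, List[str]]:
    """Return (has_intercept, terms) for a formula right-hand side.

    Follows the proposed behavior: `1 + x` adds an intercept, `x` alone
    does not, and `0 + x`, `-1 + x` or `x - 1` raise an error.
    """
    # Rewrite "x-1" as "x+-1" so that splitting on "+" keeps the sign.
    normalised = rhs.replace(" ", "").replace("-", "+-")
    tokens = [t for t in normalised.split("+") if t]

    if any(t in ("0", "-1") for t in tokens):
        raise ValueError(
            "the intercept is no longer implicit: write `~ x` instead of "
            "`~ 0 + x`, `~ -1 + x` or `~ x - 1`"
        )

    has_intercept = "1" in tokens
    terms = [t for t in tokens if t != "1"]
    return has_intercept, terms


print(parse_rhs("1 + x"))  # intercept requested explicitly
print(parse_rhs("x"))      # no intercept under the proposed default

for bad in ("0 + x", "-1 + x", "x-1"):
    try:
        parse_rhs(bad)
    except ValueError as err:
        print(f"{bad!r}: {err}")
```

Raising an error rather than silently accepting `0 + x` means code written for the old default fails loudly instead of quietly changing meaning.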