Terms 2.0: son of Terms #71
I intentionally didn't build in missing-skipping in Function composition should work normally:
julia> f = @formula(y ~ log(abs2(x)))
FormulaTerm
Response:
y(unknown)
Predictors:
(x)->log(abs2(x))
julia> f.rhs
(x)->log(abs2(x))
julia> f.rhs.fanon
#7 (generic function with 1 method)
julia> f.rhs.fanon(10)
4.605170185988092
julia> log(abs2(10))
4.605170185988092
As for using try/catch to support auto-de-vectorizing of column-wise functions, I'm hesitant, or at least I'll need more convincing that it's a good idea and doesn't hurt performance or usability all that much. So that's better handled as a PR after this is merged :) Or as a "special term" as I suggested above, where you can define behavior directly for an entire column of data by dispatching on the data type in |
Shouldn't |
No, it should yield a Edit: and a function term should return a single value when called with the arguments given in its names parameter pulled from a named tuple |
And actually you could implement both |
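To make the elementwise behavior discussed above concrete, here is a minimal sketch in plain Julia. It assumes nothing about the PR's actual code: `ToyFunctionTerm`, `apply_row`, and `apply_columns` are hypothetical names invented for illustration. The idea is only that a function term holds an anonymous function plus the names of the columns it reads, returns a single value for one row, and is mapped over rows when given whole columns.

```julia
# Hypothetical stand-in for a function term (NOT the type defined in this PR):
# an anonymous function plus the names of the columns it takes as arguments.
struct ToyFunctionTerm{F}
    fanon::F
    names::Tuple{Vararg{Symbol}}
end

# One row (a NamedTuple of scalars) in, one value out: pull the named
# arguments from the row and call the wrapped function on them.
apply_row(t::ToyFunctionTerm, row::NamedTuple) =
    t.fanon((row[n] for n in t.names)...)

# Whole columns (a NamedTuple of vectors) in, one value per row out:
# the wrapped function is mapped over the named columns elementwise.
apply_columns(t::ToyFunctionTerm, cols::NamedTuple) =
    map(t.fanon, (cols[n] for n in t.names)...)

t = ToyFunctionTerm(x -> log(abs2(x)), (:x,))
apply_row(t, (x = 10, y = 1.0))
apply_columns(t, (x = [1.0, 10.0], y = [0.5, 0.7]))
```

Under this design, a column-wise operation such as a lag would not fit `apply_columns`, which is the limitation raised in the thread; it would need its own term type with its own column-level method.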
Let's merge then!
Can you talk a bit more about why you decided to apply functions elementwise? It sounds like a pretty big departure from the rest of Julia syntax. Moreover, using a lagged variable (or converting a continuous variable to categorical) on the fly sounds like a common use case, but it cannot be done in your framework (AFAIK).
See also #75 (comment). Probably better to move the discussion there, or file a dedicated issue, as this PR is coming dangerously close to 200 comments.
This is a pretty major re-thinking of how to represent terms in a formula. It builds on #4, #54, and #57. The basic idea is that the @formula macro lowers a formula expression to an expression where symbols are "wrapped" in a Term struct, and overloads operators like +, &, and ~ with methods of Terms that generate higher-order terms like interactions. Additionally, this PR includes a mechanism by which calls to functions that don't have special meanings in the formula DSL are lowered to a call to capture_call, which gets the original function called, the original expression, and an anonymous function that "wraps" that call. The default result of that function is that it passes these on to a FunctionTerm constructor, but in principle package authors could intercept things at this point. Whether or not a call is considered 'special' is also customizable, dispatching on is_special(Val(::Symbol)).

Edit: For posterity's sake, the extension mechanism is now to provide methods for apply_schema(::FunctionTerm{typeof(myfunc)}, schema, Modeltype) which return your custom term type.

The other major new component is a schema representation that I "borrowed" from JuliaDB.ML. Schemas are computed from a namedtuple of vectors (e.g., what DataStreams calls a Data.Table, or what Tables.jl calls a ColumnTable), and when applied to a formula will replace leaf Terms with Categorical/ContinuousTerms.

The major conceptual difference is that any subtype of AbstractTerm can generate model matrix columns, and the way that columns are combined to make higher order model matrices is handled by dispatch. This provides, I think, much more flexibility in how package authors can "plug into" the formula pipeline, because they are no longer restricted to using fully-formed ModelFrames.

That being said, I've tried to keep the ModelFrame/ModelMatrix structure for now to make it easier to see how things have changed. I'd also like to consider how we actually use these structures to generate and fit models (e.g. #32). But that's orthogonal enough that this is worth considering as is.
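As a rough illustration of the "wrap symbols in a struct and overload the operators" idea described above, the following plain-Julia sketch builds a tiny term algebra. All of the `Toy*` names are hypothetical and invented here; they are not the types in this PR, and the sketch skips the macro entirely and only shows what the lowered expression might evaluate to.

```julia
# Toy types for illustration only (hypothetical names, not this PR's code).
abstract type ToyAbstractTerm end

struct ToyTerm <: ToyAbstractTerm
    sym::Symbol
end

struct ToyInteraction <: ToyAbstractTerm
    terms::Vector{ToyTerm}
end

struct ToyFormula
    lhs::ToyTerm
    rhs::Vector{ToyAbstractTerm}
end

# `a & b` builds an interaction; chaining with `&` extends it.
Base.:&(a::ToyTerm, b::ToyTerm) = ToyInteraction([a, b])
Base.:&(a::ToyInteraction, b::ToyTerm) = ToyInteraction(push!(copy(a.terms), b))

# `a + b` collects terms into a flat list of predictors.
Base.:+(a::ToyAbstractTerm, b::ToyAbstractTerm) = ToyAbstractTerm[a, b]
Base.:+(a::Vector{ToyAbstractTerm}, b::ToyAbstractTerm) = push!(copy(a), b)

# `lhs ~ rhs` ties a response to its predictors.
Base.:~(lhs::ToyTerm, rhs::Vector{ToyAbstractTerm}) = ToyFormula(lhs, rhs)
Base.:~(lhs::ToyTerm, rhs::ToyAbstractTerm) = ToyFormula(lhs, ToyAbstractTerm[rhs])

y, a, b, c = ToyTerm.((:y, :a, :b, :c))

# `&` binds tighter than `+`, so this is y ~ (a + (b & c)).
f = y ~ a + b & c
```

Because the operators are ordinary methods, a package could add its own term type and its own `+` or `&` methods without touching the macro, which is the flexibility the description is after.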
This is work in progress and I haven't even tried to get the tests passing yet because I wanted to talk about this at juliacon. But I think it's close enough to the kind of structure I've had in mind for a long time that it's worth considering.
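For readers who want a feel for the schema idea in the description, here is a small self-contained sketch. It is an assumption-laden toy rather than the schema code in this PR: the names `ToyContinuous`, `ToyCategorical`, `toy_schema`, and `toy_modelcols` are made up, the rule "real-valued columns are continuous, everything else is categorical" is a simplification, and the indicator coding with the first level as reference is just one possible choice.

```julia
# Hypothetical typed terms (not the PR's Categorical/ContinuousTerm types).
abstract type ToyTypedTerm end

struct ToyContinuous <: ToyTypedTerm
    sym::Symbol
end

struct ToyCategorical{T} <: ToyTypedTerm
    sym::Symbol
    levels::Vector{T}
end

# Decide the term type by dispatching on the column's element type.
toy_schema_entry(sym::Symbol, col::AbstractVector{<:Real}) = ToyContinuous(sym)
toy_schema_entry(sym::Symbol, col::AbstractVector) =
    ToyCategorical(sym, sort(unique(col)))

# A schema is computed from a NamedTuple of vectors (a column table).
toy_schema(cols::NamedTuple) =
    Dict(sym => toy_schema_entry(sym, col) for (sym, col) in pairs(cols))

# Each term type generates its own model columns; dispatch picks the method.
toy_modelcols(t::ToyContinuous, cols::NamedTuple) = float.(cols[t.sym])

# Assumed coding: one 0/1 indicator column per level after the first.
toy_modelcols(t::ToyCategorical, cols::NamedTuple) =
    [Float64(v == lev) for v in cols[t.sym], lev in t.levels[2:end]]

cols = (x = [1, 2, 3], g = ["a", "b", "a"])
sch = toy_schema(cols)
toy_modelcols(sch[:x], cols)
toy_modelcols(sch[:g], cols)
```

Here the continuous term yields a vector and the categorical term yields a matrix with one row per observation; combining such pieces into a full model matrix would be another layer of methods on top.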