applying transformations on formula arguments #25

Closed
mkborregaard opened this issue Jun 9, 2017 · 11 comments

Comments

@mkborregaard
Contributor

mkborregaard commented Jun 9, 2017

I'm not sure whether this belongs here or in GLM.
A common use case is to specify exponents (and sometimes log transforms) directly in the model formula, e.g.:

using RDatasets, GLM
LifeCycleSavings = dataset("datasets", "LifeCycleSavings") # load the data frame used below
lm(@formula(SR ~ Pop75 + Pop75^2), LifeCycleSavings) # the key here is the exponent

However, adding the exponent does not seem to be supported. The current workaround appears to be to create a new column by hand, e.g. LifeCycleSavings[:Pop75_2] = LifeCycleSavings[:Pop75] .^ 2.

This workaround is not always ideal, because the model does not know that the new column is just the same input variable, squared. In particular, for predict you have to apply the same transformation to the new DataFrame (newX) yourself. This means that you cannot simply call plot(newX, predict(mymodel, newX)) (for a single input variable), which is particularly useful functionality IMHO.
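
To make the predict problem concrete, here is a minimal sketch in plain Julia, with no DataFrames or GLM involved. The helper name `design` and the toy data are made up for illustration and are not part of any package. The point is that the squaring has to live in one place that both the fitting step and the prediction step go through; this is roughly what a formula with built-in transformations would do for you.

```julia
# Hypothetical helper (not from any package): builds the design matrix
# [1  x  x^2] from a raw input vector, so the transformation is defined once.
design(x) = hcat(ones(length(x)), x, x .^ 2)

# Toy, noiseless data generated from y = 1 + 2x - 0.3x^2 (made up for illustration).
x = collect(0.0:0.5:5.0)
y = 1.0 .+ 2.0 .* x .- 0.3 .* x .^ 2

# Least-squares fit of the coefficients on the transformed inputs.
beta = design(x) \ y

# Prediction on new raw inputs: the caller passes untransformed values,
# and the same `design` function applies the squaring consistently.
newx = [1.0, 2.0, 3.0]
pred = design(newx) * beta
```

With the manual-column workaround, the equivalent of `design` has to be repeated by hand on every new data frame before calling predict, which is exactly the step that is easy to forget.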

@nalimilan
Member

Yes, this is the right place. Some previous discussion happened in JuliaData/DataFrames.jl#19 (one of the oldest issues) and JuliaData/DataFrames.jl#867.

@kleinschmidt
Member

I'm working on a pretty major overhaul of the model matrix setup that can handle this (among other things); see https://github.com/kleinschmidt/StreamModels.jl for a prototype.

@mkborregaard
Contributor Author

That package looks awesome!

@kleinschmidt
Member

Thanks! It's pretty rough at this point, so I'd appreciate any bug reports or optimization suggestions if you do try it out.

@Nosferican
Contributor

Could some of the transformations be extended to take an index or indices? For example, lag / lead / first-difference.

@xiaodaigh

Just to add on: currently @formula(target~a+b) works well, but @formula can't yet handle arbitrary transformations the way R can. E.g. @formula(fn(target)~fn1(a) + fn2(b)) doesn't work.

Support for R-style literal computation, I(a*b), would be nice too, though it may need a different syntax.

@kleinschmidt
Member

This is handled now by #71

@mkborregaard
Contributor Author

Awesome! Still too early to close this?

@kleinschmidt
Member

Yes, I think so; let's wait until that (or something like it) is merged.

@kleinschmidt
Member

Closed by #71 (but feel free to re-open if that doesn't suitably address your needs!)

@mkborregaard
Contributor Author

Wuuuut! 🎉 :1000:


5 participants