-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
allow transformation destination to be a function #2897
Conversation
@nalimilan - CI passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I just wonder whether we should also pass the function, as it could be useful to generate a column name e.g. in a loop over multiple functions and/or inputs. Though that would make the API less convenient. What kind of use case do we expect?
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Could you please give an example or explain it a bit more as I am not sure what you mean here.
You mean use cases of the change I have implemented? The most common will be |
I mean something like: julia> combine(df, [:x, :y, :z] .=> [mean median] =.> string)
4×6 DataFrame
Row │ meanx meany meanz medianx mediany medianz
│ Float64 Float64 Float64 Float64 Float64 Float64
─────┼───────────────────────────────────────────────────────────
1 │ 0.483742 0.4106 0.287058 0.47259 0.32765 0.247748 Given that it's almost the same as the default names maybe that's not useful. It could be useful when there are many input variable names but you would like to use only one of them in the output rather than the "etc" default. That's not super likely but i thought I'd mention it just to be sure. |
Yes.
So that's |
Ah - now I understand. I think it would complicate the API too much.
Yes - Recently @jtrakk asked for this in #2893 so maybe we can get some more comment when it would be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but let's wait a bit as it's always interesting to check concrete use cases before merging.
My case was like this: transform(df, [:X,:Y,:Z] .=> (v->v.-mean(v)) => (s->"$(s)demeaned")) The case of "for each variable, for each function" combine(df, [:x, :y, :z] .=> [mean median] .=> string) is an interesting one and does suggest taking the function as well, but I agree it would complicate the API. Perhaps there could be a wrapper combine(df, [:x, :y, :z] .=> [mean median] .=> Rename((s,f)->"$s$f")) and passing only a function like |
@jtrakk - thank you very much for the valuable feedback. |
Thank you! Essentially as @jtrakk shows the use case is when the function name does not describe well the operation you perform and you prefer to give an explicit name for the transformation. This is especially useful when transformations are anonymous functions. |
Fixes #2876
The crucial API decision wis what function should take. I assume it is passed a string or a vector of strings.
The vector is passed if multicolumn selector is used (and
AsTable
is always considered to be multicolumn selector in particular)