Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow transformation destination to be a function #2897

Merged
merged 2 commits into from
Oct 10, 2021

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Oct 7, 2021

Fixes #2876

The crucial API decision wis what function should take. I assume it is passed a string or a vector of strings.
The vector is passed if multicolumn selector is used (and AsTable is always considered to be multicolumn selector in particular)

@bkamins bkamins added the feature label Oct 7, 2021
@bkamins bkamins added this to the 1.3 milestone Oct 7, 2021
@bkamins
Copy link
Member Author

bkamins commented Oct 7, 2021

@nalimilan - CI passes

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I just wonder whether we should also pass the function, as it could be useful to generate a column name e.g. in a loop over multiple functions and/or inputs. Though that would make the API less convenient. What kind of use case do we expect?

docs/src/man/split_apply_combine.md Outdated Show resolved Hide resolved
docs/src/man/split_apply_combine.md Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
src/abstractdataframe/selection.jl Outdated Show resolved Hide resolved
test/select.jl Show resolved Hide resolved
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Oct 8, 2021

I just wonder whether we should also pass the function, as it could be useful to generate a column name e.g. in a loop over multiple functions and/or inputs.

Could you please give an example or explain it a bit more as I am not sure what you mean here.

Though that would make the API less convenient. What kind of use case do we expect?

You mean use cases of the change I have implemented?

The most common will be cols .=> fun .=> identity to avoid changing column names I think.

@nalimilan
Copy link
Member

Could you please give an example or explain it a bit more as I am not sure what you mean here.

I mean something like:

julia> combine(df, [:x, :y, :z] .=> [mean median] =.> string)
4×6 DataFrame
 Row │ meanx     meany    meanz     medianx   mediany   medianz 
     │ Float64   Float64  Float64   Float64   Float64   Float64  
─────┼───────────────────────────────────────────────────────────
   10.483742   0.4106  0.287058   0.47259   0.32765  0.247748

Given that it's almost the same as the default names maybe that's not useful. It could be useful when there are many input variable names but you would like to use only one of them in the output rather than the "etc" default. That's not super likely but i thought I'd mention it just to be sure.

@nalimilan
Copy link
Member

You mean use cases of the change I have implemented?

Yes.

The most common will be cols .=> fun .=> identity to avoid changing column names I think.

So that's renamecols=false, right? I hope we have other use cases or it doesn't seem super appealing. :-)

@bkamins
Copy link
Member Author

bkamins commented Oct 9, 2021

That's not super likely but i thought I'd mention it just to be sure.

Ah - now I understand. I think it would complicate the API too much.

So that's renamecols=false, right? I hope we have other use cases or it doesn't seem super appealing. :-)

Yes - identity or string is just renamecols=false but it is shorter to type. You asked me about a most common use case.

Recently @jtrakk asked for this in #2893 so maybe we can get some more comment when it would be useful.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but let's wait a bit as it's always interesting to check concrete use cases before merging.

@jtrakk
Copy link

jtrakk commented Oct 9, 2021

My case was like this:

transform(df, [:X,:Y,:Z] .=> (v->v.-mean(v)) => (s->"$(s)demeaned"))

The case of "for each variable, for each function"

combine(df, [:x, :y, :z] .=> [mean median] .=> string)

is an interesting one and does suggest taking the function as well, but I agree it would complicate the API. Perhaps there could be a wrapper

combine(df, [:x, :y, :z] .=> [mean median] .=> Rename((s,f)->"$s$f"))

and passing only a function like lowercase could be a shortcut for Rename((s,f)->lowercase(s)), but that seems worth postponing until someone needs it.

@bkamins
Copy link
Member Author

bkamins commented Oct 9, 2021

@jtrakk - thank you very much for the valuable feedback.

@nalimilan nalimilan merged commit ba36297 into main Oct 10, 2021
@nalimilan nalimilan deleted the bk/destination_function branch October 10, 2021 15:07
@bkamins
Copy link
Member Author

bkamins commented Oct 10, 2021

Thank you!

Essentially as @jtrakk shows the use case is when the function name does not describe well the operation you perform and you prefer to give an explicit name for the transformation. This is especially useful when transformations are anonymous functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

In src => fun => dst allow transformation function in dst
3 participants