-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implementation proposal of column renaming #2313
Conversation
What is the advantage of accepting a string rather a helper function for If we are accepting strings in this way shall we also accept |
Are |
@oxinabox - these are exactly the things I wanted to discuss 😄.
It is just shorter to write
We could, but do you think anyone would want to specify the suffix as a
They can be easily added, but they are not fully general (we need to handle joining more than two data frames anyway). I felt that having one kwarg was cleanest to start with, but this is exactly the point I was not sure what would be best. |
I think so, because people have a lot of habit of working with Symbols when thinking about DataFrames. |
Sounds good. I agree it's nice to allow passing a string to suffix for simplicity (like other implementations to). Then we can discuss providing renaming functions like I would also allow passing a pair when there are only two inputs (like |
I have fully updated the PR. For now I decided not to support the cases of passing more than 2 data frames (we can do it later), as for now for 2 data frames we can use This PR is relatively important because it turned out that I need to rewrite the A review would be welcome (if you find my English unpolished - which is likely - just please suggest corrections directly). Thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documenting the row order is nice, but you noted somewhere that performance improvements could require changing it. Do you think that's likely? Should we mention that it might change in the future?
This note was after this PR was last updated. I would say that it is almost sure we will change the order (especially that e.g. what |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
For now I defensively say that the row order is undefined everywhere, but it is likely that for some of the joins we will be able to guarantee something in the future. In particular note that the key things that might influence row order in the future are:
|
src/abstractdataframe/join.jl
Outdated
joined, left_indicator, right_indicator = compose_joined_table(joiner, kind, | ||
update_row_maps!(joiner.dfl_on, joiner.dfr_on, group_rows(joiner.dfr_on), | ||
true, false, true, false)..., | ||
makeunique, left_rename, right_rename, nothing) | ||
inner_row_maps..., makeunique, left_rename, right_rename, nothing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then how about doing this?
joined, left_indicator, right_indicator =
compose_joined_table(joiner, kind, inner_row_maps...,
makeunique, left_rename, right_rename, nothing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is better indeed. fixed
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
resolved conflicts with "drop deprecations" PR. |
Thank you! |
Fixes #1333.
I have implemented the code of the logic. It would be good to validate it. Thank you for anyone agreeing to do this.
Now the key question is about the API. What keyword argument names we should provide and what forms of passed values should they accept (just note that we need to handle renaming of more than 2 data frames - that is why internally it will be a vector, but maybe other special syntaxes should be allowed in the simpler forms).
As usual CC: @nalimilan + @oxinabox + @pdeffebach 😄.