-
Notifications
You must be signed in to change notification settings - Fork 370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance of transform! on SubDataFrame #3070
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These issues are tricky as usual...
Is there a performance impact for transform
when the number of columns is large? In theory, couldn't the check be restricted to columns that have been used as inputs of transformations?
src/subdataframe/subdataframe.jl
Outdated
function _replace_columns!(sdf::SubDataFrame, newdf::DataFrame) | ||
colsmatch = _names(sdf) == _names(newdf) | ||
|
||
function _replace_columns!(sdf::SubDataFrame, newdf::DataFrame, wastransform::Bool) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a name like keep_present
would be easier to follow (i.e. say what it does rather than when it's set)? Could also make it a keyword argument, as it's hard to guess what it means when seeing just a Bool value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - I will change it.
src/dataframe/dataframe.jl
Outdated
@@ -1812,6 +1812,9 @@ end | |||
|
|||
# This is not exactly copy! as in general we allow axes to be different | |||
function _replace_columns!(df::DataFrame, newdf::DataFrame) | |||
# here we do not support wastransform argument like for SubDataFrame | |||
# because by default in DataFrame case columns are not copied | |||
# so we need to pass `:` to select! to handle this case correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get what select!
call this refers to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_replace_columns!
is used by transform!
and select!
. For DataFrame
objects transform!
falls back to select!
by passing extra :
(as opposed to what we now do in GroupedDataFrame
case).
I might drop this comment if you feel it is not useful. This is exactly related to the fact that select
for DataFrame
uses a less robust de-aliasing approach (checking input-output mapping) than the approach I proposed for GroupedDataFrame
with IdDict
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have simplified the comment
It is negligible, for 10,000 columns it is around 500μs:
It could, and this is what we do for |
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
Thank you! |
Fixes #3069.
What this PR addresses.
and before the PR:
select[!]
andtransform[!]
onGroupedDataFrame
After the PR:
and before the PR: