-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: performance regression in replace() corner cases #38086
Comments
Ah, the huge one on ASV is for Now, about the actual example, also for the case where actually a value can be replaced (correct dtype, so much less of a corner case), there seems a slowdown:
It might have been a special case for values that were not found. ASV indicates this commit range: 0755915...dbee8fa, from which #37704 seems the obvious related one cc @jbrockmendel |
Taking a quick look at the profiles: before, it seems to be implemented with a "putmask" approach, while on master a lot of time is spent in a "compare_or_regex_search" function |
totally plausible, as that PR was before you pointed out that |
restoring the shortcut for that cuts it down from 1.2s to about 400ms, but virtually all of whats left is in block.copy still looking into the other case |
and using missing.mask_missing for non-object dtype brings the other one down to 460ms, slightly under 1.1.4 |
ASV shows a gigantic regression (14629.25x) in a certain
replace
benchmark: https://pandas.pydata.org/speed/pandas/#replace.ReplaceList.time_replace_list?python=3.8&Cython=0.29.21&p-inplace=True&commits=07559156-dbee8faeThe simplified case is:
Compared to 1.1, I don't see such a huge difference, but still a decent slowdown (x10).
Now, in this case, we have integer columns, but trying to replace infinity, which of course can never be present. So maybe before we had some shortcut for that.
This also seems quite a cornercase, though. So not sure how critical the regression is.
The text was updated successfully, but these errors were encountered: