PERF: performance regression in replace() corner cases #38086

jorisvandenbossche · 2020-11-26T13:30:44Z

ASV shows a gigantic regression (14629.25x) in a certain replace benchmark: https://pandas.pydata.org/speed/pandas/#replace.ReplaceList.time_replace_list?python=3.8&Cython=0.29.21&p-inplace=True&commits=07559156-dbee8fae

The simplified case is:

In [5]: df = pd.DataFrame({"A": 0, "B": 0}, index=range(4 * 10 ** 7))

In [6]: %timeit df.replace([np.inf, -np.inf], np.nan)
100 ms ± 6.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)   # 1.1
1.2 s ± 31.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # master
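A self-contained version of the reproducer above, scaled down to 10 ** 6 rows so it finishes quickly. The row count and the best-of-5 timing loop are my own choices, not part of the report. Timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller than the 4 * 10 ** 7 rows used above so it runs quickly;
# scale N up to approach the timings reported in the issue.
N = 10 ** 6
df = pd.DataFrame({"A": 0, "B": 0}, index=range(N))

# Integer columns can never contain +/-inf, so this replace changes nothing.
result = df.replace([np.inf, -np.inf], np.nan)

# Best of 5 single runs, in seconds.
elapsed = min(
    timeit.repeat(lambda: df.replace([np.inf, -np.inf], np.nan), number=1, repeat=5)
)
print(f"replace([inf, -inf], nan) on {N} rows: {elapsed * 1000:.1f} ms")
```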

Compared to 1.1, I don't see such a huge difference, but still a decent slowdown (x10).

Now, in this case, we have integer columns but are trying to replace infinity, which of course can never be present. So maybe before we had some shortcut for that.
This also seems like quite a corner case, though, so I'm not sure how critical the regression is.
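A minimal sketch of what such a shortcut could look like, assuming a per-array check. `could_be_present` and `replace_list_with_shortcut` are hypothetical names for illustration, not pandas internals. The idea is to drop every to_replace value the dtype cannot hold (for an integer array: ±inf and non-integral floats) before doing any element-wise comparison.

```python
import numpy as np


def could_be_present(arr: np.ndarray, value) -> bool:
    """Hypothetical check: can `value` possibly equal an element of `arr`?"""
    if arr.dtype.kind in "iu":
        # Integer arrays only ever hold finite whole numbers.
        if isinstance(value, (int, np.integer)):
            return True
        if isinstance(value, (float, np.floating)):
            return bool(np.isfinite(value)) and float(value).is_integer()
        return False
    # For other dtypes, be conservative and assume a match is possible.
    return True


def replace_list_with_shortcut(arr: np.ndarray, to_replace, value) -> np.ndarray:
    """Hypothetical replace for one array, skipping values that can never match."""
    candidates = [v for v in to_replace if could_be_present(arr, v)]
    if not candidates:
        # Shortcut: nothing can match, so skip all element-wise comparisons.
        return arr.copy()
    mask = np.isin(arr, candidates)
    # np.where upcasts as needed (e.g. int64 -> float64 when value is NaN).
    return np.where(mask, value, arr)
```

Note that the shortcut still returns a copy, which matches the later observation in this thread that, once the shortcut is restored, most of the remaining time is spent copying.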


jorisvandenbossche · 2020-11-26T13:36:08Z

> Compared to 1.1, I don't see such a huge difference, but still a decent slowdown (x10).

Ah, the huge one on ASV is for inplace=True; for the default inplace=False, which I used above, the online benchmarks show a comparable difference.

Now, about the actual example: also in the case where a value can actually be replaced (correct dtype, so much less of a corner case), there seems to be a slowdown:

In [10]: %timeit df.replace([1, 2], np.nan)
224 ms ± 6.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # 1.1
1.12 s ± 64.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # master

It might have been a special case for values that were not found.

ASV indicates this commit range: 0755915...dbee8fa, in which #37704 seems the obviously related one. cc @jbrockmendel

jorisvandenbossche · 2020-11-26T13:39:46Z

Taking a quick look at the profiles: before, it seems to have been implemented with a "putmask" approach, while on master a lot of time is spent in a "compare_or_regex_search" function.
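For reference, a rough sketch of the "putmask" style on a single 1-D NumPy array: build one boolean mask covering all to_replace values, return early if nothing matched, otherwise upcast if needed and write the replacement with np.putmask. `replace_list_putmask` is a hypothetical helper, not the code pandas 1.1 actually ran.

```python
import numpy as np


def replace_list_putmask(values: np.ndarray, to_replace, value) -> np.ndarray:
    """Hypothetical mask-then-putmask replace for a single 1-D array."""
    # One boolean mask covering every value in `to_replace`.
    mask = np.zeros(values.shape, dtype=bool)
    for v in to_replace:
        mask |= values == v
    if not mask.any():
        # Nothing matched: no upcast, just return an unchanged copy.
        return values.copy()
    # Upcast if needed so `value` fits (e.g. int64 -> float64 for NaN).
    out = values.astype(np.result_type(values, value), copy=True)
    np.putmask(out, mask, value)
    return out
```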

jbrockmendel · 2020-11-26T16:06:47Z

totally plausible, as that PR was before you pointed out that np.putmask is faster than ndarray.__setitem__
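A quick way to compare the two assignment styles on your own machine. Both produce identical results; only the timings differ, and which one wins may depend on NumPy version, array size and mask density. The array size and mask density here are arbitrary choices for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
base = rng.integers(0, 10, size=1_000_000).astype(np.float64)
mask = base < 2  # roughly 20% of positions selected


def with_setitem() -> np.ndarray:
    arr = base.copy()
    arr[mask] = np.nan  # ndarray.__setitem__ with a boolean mask
    return arr


def with_putmask() -> np.ndarray:
    arr = base.copy()
    np.putmask(arr, mask, np.nan)  # in-place masked assignment
    return arr


# Both approaches produce the same array...
assert np.array_equal(with_setitem(), with_putmask(), equal_nan=True)

# ...so the only question is speed (10 calls each, best of 3).
t_setitem = min(timeit.repeat(with_setitem, number=10, repeat=3))
t_putmask = min(timeit.repeat(with_putmask, number=10, repeat=3))
print(f"__setitem__: {t_setitem:.3f}s  putmask: {t_putmask:.3f}s")
```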

jbrockmendel · 2020-11-26T16:26:11Z

> So maybe before we had some shortcut for that.

restoring the shortcut for that cuts it down from 1.2s to about 400ms, but virtually all of what's left is in block.copy

still looking into the other case

jbrockmendel · 2020-11-26T16:34:46Z

and using missing.mask_missing for non-object dtype brings the other one down to 460ms, slightly under 1.1.4
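A sketch of the idea behind a mask_missing-style helper: NaN never compares equal to itself, so a plain `arr == value` cannot find NaN, and missing values need `isna` instead. `mask_for_values` is a hypothetical name; pandas' own `missing.mask_missing` is internal and its exact behavior may differ.

```python
import numpy as np
import pandas as pd


def mask_for_values(arr: np.ndarray, values_to_mask) -> np.ndarray:
    """Hypothetical helper: True where `arr` equals any of the scalar values."""
    mask = np.zeros(arr.shape, dtype=bool)
    for v in values_to_mask:
        if pd.isna(v):
            # NaN != NaN, so `arr == v` would never match; use isna instead.
            mask |= pd.isna(arr)
        else:
            mask |= arr == v
    return mask
```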

jorisvandenbossche added the Performance and Regression labels Nov 26, 2020

jorisvandenbossche added this to the 1.2 milestone Nov 26, 2020

jbrockmendel mentioned this issue Nov 26, 2020 in PERF: replace_list #38097 (merged)

jreback closed this as completed in #38097 Nov 27, 2020
