Mystery: bi-modal benchmarks #40066

Closed
jbrockmendel opened this issue Feb 25, 2021 · 1 comment
Labels
Benchmark Performance (ASV) benchmarks

Comments

@jbrockmendel (Member)

Running some ASV benchmarks, I'm seeing changes in benchmarks that should be unaffected, yet those changes are consistent across runs.

Hunting this down, I'm replicating the weirdest benchmark (asv reports 1.27±0ms vs 789±3µs) in IPython:

import numpy as np
import pandas as pd

arr = np.ones(10**6, dtype="int8")
ser = pd.Series(arr)

func = ser.sum

%timeit func()

The first few runs matched the asv results really closely:

In [4]: %timeit func()                                                                                                                           
781 µs ± 552 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
1.27 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

But running it again just for kicks, the results flipped:

In [5]: %timeit func()                                                                                                                           
1.27 ms ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
780 µs ± 4.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

In [5]: %timeit func()
1.26 ms ± 691 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)   # <-- branch
1.26 ms ± 578 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

In [10]: %timeit func()        
1.26 ms ± 818 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
773 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

Eventually the ~780µs mode stopped showing up and got replaced by a mode between 895-907µs, but the 1.26-1.27ms mode hasn't moved a bit.

No obvious pattern as to which mode will show up in any given run (though they are positively auto-correlated). But in the asv results there must be something causing the modes to be split between the two branches.

I get the same behavior even if I use the same branch, just different terminal windows.
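Side note: to see the two modes and the autocorrelation directly, rather than through %timeit's mean ± std, one can collect raw samples with the standard-library timeit and split them by hand. A minimal sketch (the sample counts and the midpoint mode-splitting heuristic are my assumptions, not anything from this issue):

import timeit

import numpy as np
import pandas as pd

ser = pd.Series(np.ones(10**6, dtype="int8"))
func = ser.sum

# 100 samples of 100 calls each; convert to per-call time in microseconds
samples = np.array(timeit.repeat(func, number=100, repeat=100)) / 100 * 1e6

# Crude two-mode split at the midpoint of the observed range
mid = (samples.min() + samples.max()) / 2
for name, mode in (("fast", samples[samples < mid]), ("slow", samples[samples >= mid])):
    if len(mode):
        print(f"{name} mode: n={len(mode)}, mean={mode.mean():.0f} µs")

# Lag-1 autocorrelation: a value well above 0 means consecutive runs tend to
# stay in the same mode, matching the "positively auto-correlated" observation
x = samples - samples.mean()
print(f"lag-1 autocorrelation: {(x[:-1] * x[1:]).sum() / (x * x).sum():.2f}")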

Looking at this with %prun -s cumtime for i in range(1000): func() points at {method 'reduce' of 'numpy.ufunc' objects} in both modes (fast mode first, slow mode second):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.796    0.796 {built-in method builtins.exec}
        1    0.001    0.001    0.796    0.796 <string>:1(<module>)
     1000    0.001    0.000    0.795    0.001 generic.py:10747(sum)
     1000    0.001    0.000    0.794    0.001 generic.py:10489(sum)
     1000    0.002    0.000    0.793    0.001 generic.py:10454(_min_count_stat_function)
     1000    0.004    0.000    0.791    0.001 series.py:4133(_reduce)
     1000    0.004    0.000    0.778    0.001 nanops.py:82(_f)
     1000    0.001    0.000    0.763    0.001 nanops.py:392(new_func)
     1000    0.003    0.000    0.761    0.001 nanops.py:528(nansum)
     1000    0.001    0.000    0.738    0.001 {method 'sum' of 'numpy.ndarray' objects}
     1000    0.000    0.000    0.738    0.001 _methods.py:45(_sum)
     1000    0.737    0.001    0.737    0.001 {method 'reduce' of 'numpy.ufunc' objects}    # <--
     1000    0.003    0.000    0.017    0.000 nanops.py:256(_get_values)
     4000    0.005    0.000    0.010    0.000 _ufunc_config.py:32(seterr)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.292    1.292 {built-in method builtins.exec}
        1    0.001    0.001    1.292    1.292 <string>:1(<module>)
     1000    0.001    0.000    1.292    0.001 generic.py:10747(sum)
     1000    0.001    0.000    1.291    0.001 generic.py:10489(sum)
     1000    0.002    0.000    1.290    0.001 generic.py:10454(_min_count_stat_function)
     1000    0.004    0.000    1.288    0.001 series.py:4133(_reduce)
     1000    0.004    0.000    1.275    0.001 nanops.py:82(_f)
     1000    0.001    0.000    1.260    0.001 nanops.py:392(new_func)
     1000    0.002    0.000    1.258    0.001 nanops.py:528(nansum)
     1000    0.001    0.000    1.235    0.001 {method 'sum' of 'numpy.ndarray' objects}
     1000    0.000    0.000    1.234    0.001 _methods.py:45(_sum)
     1000    1.234    0.001    1.234    0.001 {method 'reduce' of 'numpy.ufunc' objects}   # <--
     1000    0.003    0.000    0.018    0.000 nanops.py:256(_get_values)
     4000    0.005    0.000    0.010    0.000 _ufunc_config.py:32(seterr)
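(For reference, %prun is IPython's wrapper around the standard-library profiler; the same table can be collected in a plain script with cProfile. A minimal equivalent sketch:)

import cProfile
import pstats

prof = cProfile.Profile()
prof.enable()
for i in range(1000):
    func()
prof.disable()

# Same ordering as %prun -s cumtime
pstats.Stats(prof).sort_stats("cumtime").print_stats(15)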

The total difference is 1.292 - 0.796 = 0.496 and the difference in the reduce ufunc is 1.234 - 0.737 = 0.497, which is pretty suggestive.
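Since the profile pins essentially the whole difference on the ufunc reduction, a quick sanity check (my suggestion, not something run in this issue) would be to time the bare ndarray sum and see whether the bimodality reproduces with NumPy alone, pandas out of the picture:

arr = np.ones(10**6, dtype="int8")

# ndarray.sum() bottoms out in the same {method 'reduce' of 'numpy.ufunc'}
# call that both profiles point at
%timeit arr.sum()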

@jbrockmendel added the Bug and Needs Triage labels on Feb 25, 2021
@jorisvandenbossche (Member)

https://vstinner.readthedocs.io/benchmark.html might have some useful info / links
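One concrete takeaway from that guide (a hedged sketch on my part, not something tried in this issue): pyperf spawns multiple worker processes, so a per-process effect like this one should average out across workers rather than pinning each branch to a single mode. Something like:

# benchmark.py -- run with: python benchmark.py  (requires `pip install pyperf`)
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "Series[int8].sum",
    stmt="ser.sum()",
    setup="import numpy as np; import pandas as pd; "
          "ser = pd.Series(np.ones(10**6, dtype='int8'))",
)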

@jbrockmendel added the Benchmark label and removed the Bug and Needs Triage labels on Jun 6, 2021