Mystery: bi-modal benchmarks #40066

Closed
jbrockmendel opened this issue Feb 25, 2021 · 1 comment
Labels
Benchmark Performance (ASV) benchmarks

Comments

@jbrockmendel (Member)

Running some ASV benchmarks, I'm seeing changes in benchmarks that should be unaffected, yet those changes are consistent across runs.

Hunting this down, I'm replicating the weirdest benchmark (asv reports 1.27±0ms vs 789±3µs) in IPython:

import numpy as np
import pandas as pd

arr = np.ones(10**6, dtype="int8")
ser = pd.Series(arr)

func = ser.sum

%timeit func()

The first few runs matched the asv results really closely:

In [4]: %timeit func()                                                                                                                           
781 µs ± 552 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
1.27 ms ± 1.82 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

But running it again just for kicks, the results flipped:

In [5]: %timeit func()                                                                                                                           
1.27 ms ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
780 µs ± 4.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

In [5]: %timeit func()
1.26 ms ± 691 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)   # <-- branch
1.26 ms ± 578 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

In [10]: %timeit func()        
1.26 ms ± 818 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- branch
773 µs ± 200 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)  # <-- master

Eventually the ~780µs mode stopped showing up and got replaced by a mode between 895-907µs, but the 1.26-1.27ms mode hasn't moved a bit.

No obvious pattern as to which mode will show up in any given run (though they are positively auto-correlated). But in the asv results there must be something causing the modes to be split between the two branches.

I get the same behavior even if I use the same branch, just different terminal windows.
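Side note: to see the two modes and the autocorrelation directly, rather than through %timeit's mean ± std, one can collect raw samples with the standard-library timeit and split them by hand. A minimal sketch (the sample counts and the midpoint mode-splitting heuristic are my assumptions, not anything from this issue):

import timeit

import numpy as np
import pandas as pd

ser = pd.Series(np.ones(10**6, dtype="int8"))
func = ser.sum

# 100 samples of 100 calls each; convert to per-call time in microseconds
samples = np.array(timeit.repeat(func, number=100, repeat=100)) / 100 * 1e6

# Crude two-mode split at the midpoint of the observed range
mid = (samples.min() + samples.max()) / 2
for name, mode in (("fast", samples[samples < mid]), ("slow", samples[samples >= mid])):
    if len(mode):
        print(f"{name} mode: n={len(mode)}, mean={mode.mean():.0f} µs")

# Lag-1 autocorrelation: a value well above 0 means consecutive runs tend to
# stay in the same mode, matching the "positively auto-correlated" observation
x = samples - samples.mean()
print(f"lag-1 autocorrelation: {(x[:-1] * x[1:]).sum() / (x * x).sum():.2f}")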

Looking at this with %prun -s cumtime for i in range(1000): func() points at {method 'reduce' of 'numpy.ufunc' objects} in both modes (fast mode first, slow mode second):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.796    0.796 {built-in method builtins.exec}
        1    0.001    0.001    0.796    0.796 <string>:1(<module>)
     1000    0.001    0.000    0.795    0.001 generic.py:10747(sum)
     1000    0.001    0.000    0.794    0.001 generic.py:10489(sum)
     1000    0.002    0.000    0.793    0.001 generic.py:10454(_min_count_stat_function)
     1000    0.004    0.000    0.791    0.001 series.py:4133(_reduce)
     1000    0.004    0.000    0.778    0.001 nanops.py:82(_f)
     1000    0.001    0.000    0.763    0.001 nanops.py:392(new_func)
     1000    0.003    0.000    0.761    0.001 nanops.py:528(nansum)
     1000    0.001    0.000    0.738    0.001 {method 'sum' of 'numpy.ndarray' objects}
     1000    0.000    0.000    0.738    0.001 _methods.py:45(_sum)
     1000    0.737    0.001    0.737    0.001 {method 'reduce' of 'numpy.ufunc' objects}    # <--
     1000    0.003    0.000    0.017    0.000 nanops.py:256(_get_values)
     4000    0.005    0.000    0.010    0.000 _ufunc_config.py:32(seterr)

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.292    1.292 {built-in method builtins.exec}
        1    0.001    0.001    1.292    1.292 <string>:1(<module>)
     1000    0.001    0.000    1.292    0.001 generic.py:10747(sum)
     1000    0.001    0.000    1.291    0.001 generic.py:10489(sum)
     1000    0.002    0.000    1.290    0.001 generic.py:10454(_min_count_stat_function)
     1000    0.004    0.000    1.288    0.001 series.py:4133(_reduce)
     1000    0.004    0.000    1.275    0.001 nanops.py:82(_f)
     1000    0.001    0.000    1.260    0.001 nanops.py:392(new_func)
     1000    0.002    0.000    1.258    0.001 nanops.py:528(nansum)
     1000    0.001    0.000    1.235    0.001 {method 'sum' of 'numpy.ndarray' objects}
     1000    0.000    0.000    1.234    0.001 _methods.py:45(_sum)
     1000    1.234    0.001    1.234    0.001 {method 'reduce' of 'numpy.ufunc' objects}   # <--
     1000    0.003    0.000    0.018    0.000 nanops.py:256(_get_values)
     4000    0.005    0.000    0.010    0.000 _ufunc_config.py:32(seterr)
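(For reference, %prun is IPython's wrapper around the standard-library profiler; the same table can be collected in a plain script with cProfile. A minimal equivalent sketch:)

import cProfile
import pstats

prof = cProfile.Profile()
prof.enable()
for i in range(1000):
    func()
prof.disable()

# Same ordering as %prun -s cumtime
pstats.Stats(prof).sort_stats("cumtime").print_stats(15)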

The total difference is 1.292 - 0.796 = 0.496 and the difference in the reduce ufunc is 1.234 - 0.737 = 0.497, which is pretty suggestive.
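Since the profile pins essentially the whole difference on the ufunc reduction, a quick sanity check (my suggestion, not something run in this issue) would be to time the bare ndarray sum and see whether the bimodality reproduces with NumPy alone, pandas out of the picture:

arr = np.ones(10**6, dtype="int8")

# ndarray.sum() bottoms out in the same {method 'reduce' of 'numpy.ufunc'}
# call that both profiles point at
%timeit arr.sum()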

@jbrockmendel added the Bug and Needs Triage labels on Feb 25, 2021
@jorisvandenbossche (Member)

https://vstinner.readthedocs.io/benchmark.html might have some useful info / links
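One concrete takeaway from that guide (a hedged sketch on my part, not something tried in this issue): pyperf spawns multiple worker processes, so a per-process effect like this one should average out across workers rather than pinning each branch to a single mode. Something like:

# benchmark.py -- run with: python benchmark.py  (requires `pip install pyperf`)
import pyperf

runner = pyperf.Runner()
runner.timeit(
    "Series[int8].sum",
    stmt="ser.sum()",
    setup="import numpy as np; import pandas as pd; "
          "ser = pd.Series(np.ones(10**6, dtype='int8'))",
)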

@jbrockmendel added the Benchmark label and removed the Bug and Needs Triage labels on Jun 6, 2021