Benchmarks grant #45049
Thanks for leading this effort @datapythonista! Cc'ing a few more folks from NumPy/SciPy/scikit-image who I know will be interested in this topic: @mattip, @stefanv, @mdhaber.
Maybe I do not remember correctly, but it seemed to me that asv is a wrapper around the pyperf package for benchmarking. I do like the idea of separating the benchmark runner from the GUI data presentation. Another presentation framework is codespeed, which is used by speed.python.org and speed.pypy.org.

I could not find a way to get an overview on conbench (like asv and codespeed provide). I like the way the overview easily lets one see what has changed over time across all the benchmarks.

There is a lot to discuss around benchmarking itself: microbenchmarking vs. application benchmarking, stability, what to measure, and how much can be done on CI-hosted machines vs. private runners.
This sounds great. In case you are interested in suggestions that would help maintainers of other projects, I'll include thoughts from my experience with ASV below.
Please let me know if either of these should be filed as issues at https://github.com/airspeed-velocity/asv.
- I haven't checked recently, but in the past I've been unable to use conda-installed ASV on my Windows machine (see specific error [here](https://github.com/scipy/scipy/pull/12732#issuecomment-674595399)). The only way I've been able to get ASV to work in my conda environment on Windows is to install it with pip. Not good practice, I know, but that's what works.
- Sometimes the author of a benchmark wants to time code execution _and_ track something about that code execution without executing that code twice. It sounds like there is not a simple way to do this (see [here](https://github.com/scipy/scipy/pull/10762#issuecomment-567322968)); a rough sketch of the limitation follows this list.
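As a concrete illustration of that limitation (a minimal sketch only, not a recommended pattern; `PivotSuite`, `expensive_pivot`, and the tracked cell count are hypothetical names), this is roughly what an asv suite looks like when the author wants both a timing and a tracked metric for the same operation: the work runs once in the `time_*` method and again in the `track_*` method, because the two benchmarks execute independently.

```python
import numpy as np
import pandas as pd


class PivotSuite:
    """asv-style suite; `expensive_pivot` is a hypothetical stand-in."""

    def setup(self):
        # Shared input data, rebuilt for each benchmark.
        self.df = pd.DataFrame(
            np.random.default_rng(0).integers(0, 100, size=(100_000, 3)),
            columns=["a", "b", "c"],
        )

    def expensive_pivot(self):
        return self.df.pivot_table(index="a", columns="b", values="c", aggfunc="sum")

    def time_pivot(self):
        # asv times this call; the return value is discarded.
        self.expensive_pivot()

    def track_pivot_result_cells(self):
        # To record something about the result, the same work has to run
        # again, since time_* and track_* benchmarks are executed separately.
        result = self.expensive_pivot()
        return int(result.size)
```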
If you decide to fork, please consider changing the following behaviors.
The rest of these may be due to user error. Nonetheless, they might indicate that there is something to be improved.
Have you folks looked into combining asv with pytest-benchmark? It looks like there have been proposals on both sides (ionelmc/pytest-benchmark#208, airspeed-velocity/asv#567), but nothing conclusive. Both projects overlap a lot since both have
See also asvdb, explained in this comment:
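To make the asv/pytest-benchmark overlap mentioned above concrete, here is a minimal sketch of the same kind of timing benchmark written for pytest-benchmark (the `benchmark` fixture is pytest-benchmark's API; the pivot workload mirrors the hypothetical asv sketch earlier in the thread):

```python
import numpy as np
import pandas as pd


def _make_frame():
    # Same hypothetical workload as in the asv sketch above.
    return pd.DataFrame(
        np.random.default_rng(0).integers(0, 100, size=(100_000, 3)),
        columns=["a", "b", "c"],
    )


def test_pivot_table(benchmark):
    # pytest-benchmark's `benchmark` fixture calls the function repeatedly
    # and records timing statistics, much like an asv time_* method.
    df = _make_frame()
    result = benchmark(
        df.pivot_table, index="a", columns="b", values="c", aggfunc="sum"
    )
    assert result.size > 0
```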
Hello all, it's great to hear that pandas is getting improvements regarding benchmarks. I see that you currently use asv, which is a great tool. I just want to leave a link to a blog post I read about consistent benchmarking in CI systems. The author uses Cachegrind to count executed CPU instructions and other metrics, and computes a value that correlates with the actual execution time. It's the same tool SQLite uses for its benchmarks. The blog post sadly doesn't look into concurrency. Best regards
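For anyone curious what the instruction-counting idea looks like in practice, here is a minimal sketch (an assumption of how it could be wired up, not the blog post's code): it runs a snippet under `valgrind --tool=cachegrind` and parses the "I refs" (instruction) count from the summary, which is far more stable across CI runs than wall-clock time.

```python
import re
import subprocess
import sys


def count_instructions(script: str) -> int:
    """Run a Python snippet under Cachegrind and return the instruction count.

    Assumes valgrind is installed. Cachegrind prints a summary to stderr
    (prefixed with ==PID==) and also writes a cachegrind.out.<pid> file in
    the working directory.
    """
    proc = subprocess.run(
        ["valgrind", "--tool=cachegrind", sys.executable, "-c", script],
        capture_output=True,
        text=True,
        check=True,
    )
    match = re.search(r"I\s+refs:\s+([\d,]+)", proc.stderr)
    if match is None:
        raise RuntimeError("could not find instruction count in Cachegrind output")
    return int(match.group(1).replace(",", ""))


if __name__ == "__main__":
    n = count_instructions("import pandas as pd; pd.DataFrame({'a': range(1000)}).sum()")
    print(f"executed ~{n:,} CPU instructions")
```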
Thanks @JulianWgs for sharing those. Very interesting. Not in scope for the current grant, but very useful for the future.
This issue is to provide information, and allow discussion, on the NumFOCUS small development grant to improve benchmarks for pandas (and possibly other projects of the ecosystem). The content of this issue is work in progress, and will be updated as needed.
List of some of the tasks identified to improve asv itself:

- Drop the six dependency

Work on the pandas benchmarks:

- Improve the CI (e.g. so we don't have to grep to see if the build has failed). We can also check if the logs can be improved.

Work on infrastructure:

List of some other projects using asv:
Feedback very welcome
CC: @dorothykiz1 @LucyJimenez
CC: @pandas-dev/pandas-core @rgommers