
Establish performance testing framework and benchmarks #729

Closed
duckontheweb opened this issue Jan 24, 2022 · 3 comments · Fixed by #748

Comments

@duckontheweb (Contributor)

A few issues have come up over the past few months related to slow performance within the library. We do not currently have any benchmarks for memory usage or runtimes built into our testing suite, so it is hard to catch performance regressions or evaluate possible performance improvements.

The goal of this issue is to articulate a plan for a first pass at performance benchmarking within the library. This should include deciding which parts of the code we want to benchmark and selecting a performance benchmarking library (or libraries) to use in our testing suite.

I do not have extensive personal experience in this area, so input from others who have set up performance testing frameworks in the past is highly desired. To start the discussion, here are some initial thoughts:

What to Benchmark?

What Tool to Use?

Given the interest in using async and multithreaded techniques to speed up performance (see #609 and #274), any library we choose should probably be able to handle asynchronous and multithreaded code. Here are some options from a brief survey of the landscape:

  • asv (airspeed velocity)

    Well-documented and well-supported speed benchmarking framework with built-in visualization capabilities. It is not clear what it would take to support asynchronous or multi-threaded code.

  • yappi

    Supports async, gevent, and multithreaded processing; seems well-supported and claims to be very fast. Profiles timing only, not memory usage (as far as I can tell).

  • profile/cProfile

    Already included in the standard library, but would take some work to make it support multi-threaded or async code. Only profiles timing, not memory usage. Can be combined with snakeviz for visualizing results.

  • guppy3

    Memory profiler with some related blog posts on how to track down memory issues.
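As a concrete illustration of the asv option above, a minimal benchmark file might look like the sketch below. The workload is a hypothetical stand-in; real benchmarks would exercise pystac objects. asv discovers benchmarks by naming convention: methods prefixed with `time_` are timed, and `setup()` runs before each measurement.

```python
# benchmarks/bench_example.py -- minimal asv-style benchmark sketch
# (hypothetical workload; real benchmarks would load pystac catalogs)

class IterationSuite:
    """asv runs setup() before timing each time_* method."""

    def setup(self):
        # Hypothetical stand-in for loading a catalog or item list.
        self.items = [{"id": i} for i in range(10_000)]

    def time_iterate_items(self):
        # asv reports the wall-clock time of this method.
        for item in self.items:
            item["id"]
```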

cc: @TomAugspurger @lossyrob @matthewhanson @gadomski @scottyhq

@duckontheweb added the enhancement, help wanted, and discussion labels on Jan 24, 2022
@TomAugspurger (Collaborator)

https://tomaugspurger.github.io/maintaing-performance.html has a few notes on what we do for pandas, mostly using asv (I know there's a typo in the title).

The nice thing about asv is that it tracks benchmarks over time, making it relatively easy to detect performance regressions and tie them back to specific commits. You just need somewhere to run it (we could maybe use the same server sitting in my closet running the pandas benchmarks, but it's somewhat flaky).

I suspect, but am not sure, that you can profile async code with asv by creating an event loop in the setup method and creating and running an asyncio.Task for the actual async code.
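That pattern might be sketched as follows: a synchronous `time_*` method drives a coroutine through an event loop created in `setup()`. This is a sketch under the assumption above, not a confirmed asv recipe, and the coroutine is a hypothetical placeholder.

```python
import asyncio

class AsyncSuite:
    """Sketch: create an event loop in setup() and drive the async
    work from a synchronous time_* method that asv can measure."""

    def setup(self):
        self.loop = asyncio.new_event_loop()

    def teardown(self):
        self.loop.close()

    async def _fetch_items(self):
        # Placeholder for real async I/O (e.g. reading STAC items).
        await asyncio.sleep(0)
        return list(range(100))

    def time_fetch_items(self):
        # asv times this synchronous wrapper around the async work.
        self.loop.run_until_complete(self._fetch_items())
```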

@duckontheweb (Contributor, Author)

asv also allows us to benchmark peak memory usage, object-specific memory size, and custom metrics (more detail here).
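Those three asv conventions could be sketched like this: `mem_*` methods return an object whose size asv records, `peakmem_*` methods have their peak resident memory measured, and `track_*` methods return an arbitrary number to plot. The workloads here are hypothetical placeholders, not pystac code.

```python
class MemorySuite:
    """Sketch of asv's memory and custom-metric benchmark
    conventions (hypothetical workloads)."""

    def mem_item_dicts(self):
        # mem_* methods: asv records the size of the returned object.
        return [{"id": i} for i in range(1_000)]

    def peakmem_build_items(self):
        # peakmem_* methods: asv records peak resident memory
        # while the method runs.
        items = [{"id": i} for i in range(100_000)]

    def track_item_count(self):
        # track_* methods: asv plots whatever number is returned.
        return 1_000
```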

@guidorice (Contributor)

+1 for benchmarking memory usage. I am trying to use pystac to iterate over all the items and assets in a static catalog with over a million items, and I'm seeing excessive memory usage. I will probably just process the catalog as raw JSON instead.
