Polars makes a variety of implicit or explicit guarantees about memory usage. For example:
Only the specifically requested columns will be loaded into memory when reading files. #15098 ("High memory usage reading single column with read_parquet") was a regression that resulted in a very significant memory usage increase for users of read_parquet() in some situations.
Streaming APIs will process data in chunks, preventing all of the data from being loaded into memory at once.
Unfortunately, bugs like #15098 won't be caught by normal tests. There is a need for test infrastructure specifically focused on measuring memory usage.
In particular, the goal is to measure peak memory usage, because that is the bottleneck relevant to users. (Allocating 100MB and freeing it in a loop 10 times means you've allocated a total of 1GB, but peak usage is still only 100MB. This is very different from allocating 1GB at once, which has a peak of 1GB.)
Possible requirements
Keeping the existing allocators in place is helpful: the allocator is a pretty core piece of the system, so making it significantly different between CI runs and released code is not ideal for testing purposes.
Options
Memray
pytest-memray uses the Memray memory profiler, which works on the level of operating system allocator hooks (malloc/free/calloc/mmap). Memray doesn't support Windows.
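For instance, pytest-memray exposes a marker that fails a test when it allocates more memory than allowed (limit_memory is pytest-memray's documented marker; the test body here is purely illustrative):

```python
import pytest

# limit_memory is pytest-memray's documented marker; the plugin fails the
# test if allocations tracked by Memray exceed the stated limit.
@pytest.mark.limit_memory("100 MB")
def test_stays_under_limit():
    data = bytearray(10 * 2**20)  # ~10MB allocation, well under the limit
    assert len(data) == 10 * 2**20
```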
Benefits are that it tracks everything, regardless of source.
Downsides:
Insofar as Polars uses jemalloc on platforms supported by Memray, catching smaller allocations probably won't work well, since Memray will only see the larger mmap() chunks jemalloc requests for its memory pools, not the individual allocations served from them. Options to deal with this:
Tests for memory usage would need to allocate sufficiently large amounts of memory to not fit in an existing jemalloc pool.
Disable jemalloc in test builds used to run tests, allowing smaller allocations to be tracked.
Another downside is that Memray impacts every test, not just those that care about memory usage. This adds some overhead, though perhaps not enough to matter in practice.
Finally, Memray is very much tied to Python.
jemalloc or mimalloc debugging hooks
jemalloc has profiling APIs, which can be turned on or off at runtime (if jemalloc was compiled with profiling support). https://www.magiroux.com/rust-jemalloc-profiling/ discusses this a bit. It's not clear whether these APIs actually report peak memory, which is what one cares about here, and they seem somewhat oriented towards dump-and-analyze-later workflows.
It's not clear whether mimalloc has an equivalent.
The downside is that this approach is limited to Rust allocations; you wouldn't see Python, NumPy, or PyArrow allocations.
tracemalloc
tracemalloc is Python's built-in memory API. It makes it very easy to get peak memory over a period of time, and you can choose to only enable it for specific tests.
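For example, the allocate-and-free loop from the motivation section can be measured directly, using only documented tracemalloc calls:

```python
import tracemalloc

tracemalloc.start()
for _ in range(10):
    block = bytearray(100 * 2**20)  # allocate ~100MB...
    del block                       # ...and free it before the next round
current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()
print(f"current={current}, peak={peak}")  # peak is ~100MB, not the 1GB total
```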
By default, Polars' allocations won't get tracked by tracemalloc. However, there's a C API for registering allocations (PyTraceMalloc_Track / PyTraceMalloc_Untrack) which Polars could use.
Essentially this would involve wrapping the global allocator used in the Rust extension. The wrapper could be enabled only in test builds, or it could always be there but disabled by default.
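To illustrate the C API's semantics, here is a toy sketch that reaches it via ctypes; a real implementation would call these functions from the Rust allocator wrapper instead, and the domain id and address below are arbitrary values chosen for the demonstration:

```python
import ctypes
import tracemalloc

# tracemalloc's documented C API for foreign allocations, reached via ctypes
# purely for demonstration purposes.
track = ctypes.pythonapi.PyTraceMalloc_Track
untrack = ctypes.pythonapi.PyTraceMalloc_Untrack

tracemalloc.start()
# Tell tracemalloc that 50MB lives at a made-up address; it records the
# (domain, pointer, size) triple without dereferencing the pointer.
# Domain 1234 is an arbitrary id chosen for this sketch.
track(ctypes.c_uint(1234), ctypes.c_size_t(0x1000_0000), ctypes.c_size_t(50 * 2**20))
print(tracemalloc.get_traced_memory())  # peak now reflects the extra 50MB
untrack(ctypes.c_uint(1234), ctypes.c_size_t(0x1000_0000))
tracemalloc.stop()
```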
Benefits:
Tests could use small allocations.
Works on all platforms where Python runs.
NumPy hooks into tracemalloc.
Downsides:
It's not clear if e.g. PyArrow does (I can check; I'm guessing not), so it's not clear if everything can be tracked this way. PyArrow does, however, have its own hooks.
peak_alloc
This is a global allocator crate that lets you get peak allocated memory easily. It could be conditionally enabled for easy Rust-only testing.
Hybrid options
Use e.g. peak_alloc + tracemalloc + PyArrow hooks to figure out peak memory.
Notes
It appears you can plug in new memory pools for PyArrow, so one could create a new one for a test and then use its max_memory() method.
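A sketch of that idea (pa.proxy_memory_pool, pa.allocate_buffer, and max_memory() are part of PyArrow's public API; whether such a pool catches everything a given query allocates is exactly what the tests would need to verify):

```python
import pyarrow as pa

# Wrap the default pool so a test can observe peak allocations through it.
pool = pa.proxy_memory_pool(pa.default_memory_pool())
buf = pa.allocate_buffer(10 * 2**20, memory_pool=pool)  # ~10MB
del buf
print(pool.max_memory())  # peak bytes ever allocated via this pool
```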
Based on the above, it seems like the best initial approach is probably:
Create a wrapper global allocator for Rust that wraps jemalloc/mimalloc depending on platform, and registers memory with tracemalloc.
Use this wrapper only for dev profiles (doing it in release builds would also be nice and probably low overhead, but that requires benchmarking, so it's probably out of scope for this issue).
Add some PyArrow utility.
Write tests for the above, to make sure they're actually tracking things!
Write test infrastructure that allows asserting peak memory doesn't get above a certain level.
Write a demonstration test focusing on read_parquet().
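For the last two items, one shape such an assertion helper could take (a hypothetical sketch built only on tracemalloc, so until the allocator wrapper exists it will miss untracked native allocations):

```python
import tracemalloc
from contextlib import contextmanager

@contextmanager
def assert_peak_tracked_memory_below(limit_bytes):
    """Hypothetical helper: fail if peak traced memory exceeds limit_bytes.

    Only sees allocations tracemalloc knows about, i.e. Python objects plus
    whatever native code registers via PyTraceMalloc_Track.
    """
    tracemalloc.start()
    try:
        yield
        _, peak = tracemalloc.get_traced_memory()
        assert peak < limit_bytes, f"peak {peak} bytes >= limit {limit_bytes}"
    finally:
        tracemalloc.stop()

# A demonstration test might then look like:
# with assert_peak_tracked_memory_below(100 * 2**20):
#     pl.read_parquet(path, columns=["a"])
```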