Polars makes a variety of implicit or explicit guarantees about memory usage. For example:
Only the specifically requested columns will be loaded into memory when reading files. #15098 ("High memory usage reading single column with read_parquet") was a regression that resulted in a very significant memory usage increase for users of read_parquet() in some situations.
Streaming APIs will process data in chunks, preventing all of the data from being loaded into memory at once.
Unfortunately, bugs like #15098 won't be caught by normal tests. There is a need for test infrastructure specifically focused on measuring memory usage.
In particular, the goal is to measure peak memory usage, because that is the bottleneck relevant to users. (Allocating 100MB and freeing it in a loop 10 times means you've allocated a total of 1GB, but peak usage is still only 100MB. This is very different from allocating 1GB at once, which has a peak of 1GB.)
Possible requirements
Keeping the existing allocators in place is helpful: the allocator is a pretty core piece of the system, so making it significantly different between CI runs and released code is not ideal for testing purposes.
Options
Memray
pytest-memray uses the Memray memory profiler, which works on the level of operating system allocator hooks (malloc/free/calloc/mmap). Memray doesn't support Windows.
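For instance, pytest-memray exposes a marker that fails a test when it allocates more memory than allowed (limit_memory is pytest-memray's documented marker; the test body here is purely illustrative):

```python
import pytest

# limit_memory is pytest-memray's documented marker; the plugin fails the
# test if allocations tracked by Memray exceed the stated limit.
@pytest.mark.limit_memory("100 MB")
def test_stays_under_limit():
    data = bytearray(10 * 2**20)  # ~10MB allocation, well under the limit
    assert len(data) == 10 * 2**20
```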
Benefits are that it tracks everything, regardless of source.
Downsides:
Insofar as Polars uses jemalloc on platforms supported by Memray, catching smaller allocations probably won't work well, since Memray will only see the larger mmap() chunks jemalloc requests for its memory pools, not the individual allocations served from them. Options to deal with this:
Tests for memory usage would need to allocate sufficiently large amounts of memory to not fit in an existing jemalloc pool.
Disable jemalloc in test builds used to run tests, allowing smaller allocations to be tracked.
Another downside is that Memray impacts every test, not just those that care about memory usage. This adds some overhead, though perhaps not enough to matter in practice.
Finally, Memray is very much tied to Python.
jemalloc or mimalloc debugging hooks
jemalloc has profiling APIs, which can be turned on or off at runtime (if jemalloc was compiled with profiling support). https://www.magiroux.com/rust-jemalloc-profiling/ discusses this a bit. It's not clear whether these APIs actually report peak memory, which is what one cares about here, and they seem somewhat oriented towards dump-and-analyze-later workflows.
It's not clear whether mimalloc has an equivalent.
The downside is that this approach is limited to Rust allocations; you wouldn't see Python, NumPy, or PyArrow allocations.
tracemalloc
tracemalloc is Python's built-in memory API. It makes it very easy to get peak memory over a period of time, and you can choose to only enable it for specific tests.
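For example, the allocate-and-free loop from the motivation section can be measured directly, using only documented tracemalloc calls:

```python
import tracemalloc

tracemalloc.start()
for _ in range(10):
    block = bytearray(100 * 2**20)  # allocate ~100MB...
    del block                       # ...and free it before the next round
current, peak = tracemalloc.get_traced_memory()  # both values in bytes
tracemalloc.stop()
print(f"current={current}, peak={peak}")  # peak is ~100MB, not the 1GB total
```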
By default, Polars' allocations won't get tracked by tracemalloc. However, there's a C API for registering allocations (PyTraceMalloc_Track / PyTraceMalloc_Untrack) which Polars could use.
Essentially this would involve wrapping the global allocator used in the Rust extension. The wrapper could be enabled only in test builds, or it could always be there but disabled by default.
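To illustrate the C API's semantics, here is a toy sketch that reaches it via ctypes; a real implementation would call these functions from the Rust allocator wrapper instead, and the domain id and address below are arbitrary values chosen for the demonstration:

```python
import ctypes
import tracemalloc

# tracemalloc's documented C API for foreign allocations, reached via ctypes
# purely for demonstration purposes.
track = ctypes.pythonapi.PyTraceMalloc_Track
untrack = ctypes.pythonapi.PyTraceMalloc_Untrack

tracemalloc.start()
# Tell tracemalloc that 50MB lives at a made-up address; it records the
# (domain, pointer, size) triple without dereferencing the pointer.
# Domain 1234 is an arbitrary id chosen for this sketch.
track(ctypes.c_uint(1234), ctypes.c_size_t(0x1000_0000), ctypes.c_size_t(50 * 2**20))
print(tracemalloc.get_traced_memory())  # peak now reflects the extra 50MB
untrack(ctypes.c_uint(1234), ctypes.c_size_t(0x1000_0000))
tracemalloc.stop()
```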
Benefits:
Tests could use small allocations.
Works on all platforms where Python runs.
NumPy hooks into tracemalloc.
Downsides:
It's not clear if e.g. PyArrow does (I can check; I'm guessing not), so it's not clear if everything can be tracked this way. PyArrow does, however, have its own hooks.
peak_alloc
This is a global allocator crate that lets you get peak allocated memory easily. It could be conditionally enabled for easy Rust-only testing.
Hybrid options
Use e.g. peak_alloc + tracemalloc + PyArrow hooks to figure out peak memory.
Notes
It appears you can plug in new memory pools for PyArrow, so one could create a new one for a test and then use its max_memory() method.
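A sketch of that idea (pa.proxy_memory_pool, pa.allocate_buffer, and max_memory() are part of PyArrow's public API; whether such a pool catches everything a given query allocates is exactly what the tests would need to verify):

```python
import pyarrow as pa

# Wrap the default pool so a test can observe peak allocations through it.
pool = pa.proxy_memory_pool(pa.default_memory_pool())
buf = pa.allocate_buffer(10 * 2**20, memory_pool=pool)  # ~10MB
del buf
print(pool.max_memory())  # peak bytes ever allocated via this pool
```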
Based on the above, it seems like the best initial approach is probably:
Create a wrapper global allocator for Rust that wraps jemalloc/mimalloc depending on platform, and registers memory with tracemalloc.
Use this wrapper only for dev profiles (doing it in release builds would also be nice and probably low overhead, but that requires benchmarking, so it's probably out of scope for this issue).
Add some PyArrow utility.
Write tests for the above, to make sure they're actually tracking things!
Write test infrastructure that allows asserting peak memory doesn't get above a certain level.
Write a demonstration test focusing on read_parquet().
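For the last two items, one shape such an assertion helper could take (a hypothetical sketch built only on tracemalloc, so until the allocator wrapper exists it will miss untracked native allocations):

```python
import tracemalloc
from contextlib import contextmanager

@contextmanager
def assert_peak_tracked_memory_below(limit_bytes):
    """Hypothetical helper: fail if peak traced memory exceeds limit_bytes.

    Only sees allocations tracemalloc knows about, i.e. Python objects plus
    whatever native code registers via PyTraceMalloc_Track.
    """
    tracemalloc.start()
    try:
        yield
        _, peak = tracemalloc.get_traced_memory()
        assert peak < limit_bytes, f"peak {peak} bytes >= limit {limit_bytes}"
    finally:
        tracemalloc.stop()

# A demonstration test might then look like:
# with assert_peak_tracked_memory_below(100 * 2**20):
#     pl.read_parquet(path, columns=["a"])
```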