Unexpected memory management behavior #1056
Comments
Thanks. I think it's good timing to revisit our object model now, since some safety problems have recently been pointed out around GILGuard.
I'd love to read the 2nd draft. I'll shortly be using this project cautiously in production, so I'm excited to see where it's heading.
Just some notes I learned from my first try at …
I also have been thinking about our new API … Because this gives an easy way to go from the owned reference into the GIL-scoped reference, I'm also considering whether we should add new ways to create Python types with … Focusing on …
Finally, I wanted to talk about … I think we can always make these tradeoffs better by learning from community members who have feedback!
To me, the important things to get right for this library are, in order:
Currently, 1 and 4 are pretty well nailed, but 2 and 3 are both being held back by object storage. Here's my attempt at a solution that does away with all thread locals and object storage. The first part of the solution is already solved by the current … Instead, these are always wrapped by either:
I believe this does away with all thread locals and all object storage, and brings the performance and memory characteristics in line with the equivalent C. I'd be happy to have a go at this myself, but I don't want to step on other people's toes, or spend time on something that isn't wanted, or has already been tried and found problematic.
My specific issue with the drop behaviour is that checking a thread-local is not zero cost, and if the GIL isn't held it leads to unbounded memory growth, and potentially a permanent memory leak if the GIL is never re-acquired.
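To make that failure mode concrete, here is a minimal sketch of the pattern, assuming the pre-0.21 behaviour where dropped `Py` handles are queued in thread-local storage until the GIL is next acquired (`Python::with_gil` used for brevity):

```rust
use pyo3::prelude::*;

// Sketch of the failure mode: `Py<PyAny>` handles dropped while the GIL is not
// held cannot decrement the Python refcounts immediately, so (pre-0.21) the
// pointers are queued and only released the next time the GIL is acquired.
// If it never is, the objects are never freed.
fn main() {
    let objs: Vec<Py<PyAny>> = Python::with_gil(|py| {
        (0..1_000_000).map(|i| i.to_object(py)).collect()
    });

    // The GIL is released here. Dropping the handles only queues the decrements.
    drop(objs);

    // If no thread acquires the GIL again, the queued decrements never run and
    // the million integer objects created above stay alive for the life of the
    // process.
}
```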
@etianen thanks for all the ideas. I agree that the direction I want for PyO3 is to remove as much object storage as possible.

TL;DR: yes, I like this design and have considered it myself, but I'm not sure it's the right time to do it.

I've thought many times about ideas similar to your … Regarding …

Why I'm not sure now is the moment. The biggest downside of … If … The result of this is that a series of chained PyO3 API calls gradually see the GIL lifetime which the compiler guarantees get narrower and narrower. It's not the end of the world, but it's quite an annoying papercut.

The future solution I see. There is, fortunately, a way that I can see this being resolved in the future: arbitrary self types. If we were able to use …
With arbitrary self types, we really can have our cake and eat it! So this is a direction which I would like to see for PyO3, but I'm not actively pursuing it now because we need someone to champion getting the feature added to the Rust language first!
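To illustrate the lifetime papercut described above without touching PyO3 itself, here is a toy model (the types are stand-ins, not real PyO3 API) of what happens when each method ties its result to the `&self` borrow rather than to the original GIL lifetime:

```rust
use std::marker::PhantomData;

// Toy stand-in for a GIL-scoped reference; not a real PyO3 type.
struct Any<'py>(PhantomData<&'py ()>);

impl<'py> Any<'py> {
    // Without arbitrary self types, a method can only take `&self`, so its
    // result ends up tied to the borrow of `self` rather than to `'py`.
    fn getattr<'a>(&'a self, _name: &str) -> Any<'a> {
        Any(PhantomData)
    }
}

fn chained<'py>(obj: Any<'py>) {
    let a = obj.getattr("a"); // lifetime: the borrow of `obj`
    let b = a.getattr("b");   // lifetime: the borrow of `a`, narrower again
    // `b` may not outlive `a`, even though conceptually every reference here
    // is valid for the whole `'py` scope. Each chained call narrows the
    // lifetime the compiler will accept.
    let _ = b;
}
```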
Note that a "hack" to fix GIL lifetimes if we move to …

In the case where the reference is owned, we know that the lifetime … TBH I'd be very happy to listen to the PyO3 community to see whether they would prefer to migrate to this API now, at the cost of some lifetime frustration, or to wait for the more ergonomic future.
The alternative (acquiring the GIL) is significantly slower than checking the "not zero cost" thread-local. (You can easily verify this by modifying the … )

If we were to remove the …

To migrate to the case where …
When I've looked at this in the past, I've come to the conclusion that the …

(OTOH, I 100% agree with you that the unbounded memory growth that can occur from … )
It's clear that you've considered a lot of these issues already, so thanks for explaining! On reflection, I can see that there's still a cause for a naked …
I think your …

Why I think now is the time. It's currently impossible / very difficult to write a Python extension that deals with significant numbers of objects, and that is a big chunk of valuable use cases. Using … Arbitrary self types might never land, or might not land for a long time. Switching to … It's a small hit to ergonomics vs a 10% performance boost and no unbounded memory growth. That seems a clear win, IMO.

The drop behavior of …
Clone/Drop strategy | Pro | Con |
---|---|---|
Always acquire GIL, immediately update refcount | Simple, no additional state | Slow |
Check for GIL using `PyGILState_Check` (or a TLS if faster), acquire GIL if needed, immediately update refcount | Still simple, no additional state, hopefully fast if the GIL is held | Slow if GIL not held |
Check for GIL using `PyGILState_Check` (or a TLS if faster), immediately update refcount if possible, else defer until GIL next held | Fast regardless of GIL state | Unbounded memory growth |
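For concreteness, the second row's check-then-acquire strategy would look roughly like this at the FFI level; this is a hypothetical sketch, not PyO3's actual `Drop` implementation:

```rust
use pyo3::ffi;

// Hypothetical sketch of "check for the GIL, acquire only if needed, always
// decrement immediately". Not PyO3's real Drop impl.
unsafe fn decref_now(ptr: *mut ffi::PyObject) {
    if ffi::PyGILState_Check() != 0 {
        // Fast path: this thread already holds the GIL, decrement directly.
        ffi::Py_DECREF(ptr);
    } else {
        // Slow path: acquire the GIL, decrement, release.
        let state = ffi::PyGILState_Ensure();
        ffi::Py_DECREF(ptr);
        ffi::PyGILState_Release(state);
    }
}
```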
Whichever strategy is chosen, a programmer can use `ManuallyDrop` and `into_ref()` to perform a zero-cost drop of one or more `Py<T>`, so this really comes down to which strategy is the default for correctness and good-enough lazy programming. I personally find the potential for unbounded memory growth a little worrying, but I appreciate the different tradeoffs here. I'd vote for the second strategy (assuming reasonably performant), but it's not a hill I'd die on. 😛
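As an example of that escape hatch, here is a sketch of using `into_ref` (as it existed before the Bound API) to hand a batch of `Py<PyAny>` handles to the GIL-scoped pool, so nothing is left to the deferred-drop queue:

```rust
use pyo3::prelude::*;

// Sketch: explicitly release a batch of owned handles while the GIL is held.
// `into_ref` converts each `Py<PyAny>` into a GIL-bound `&PyAny` owned by the
// pool, so the reference counts are released when this GIL scope ends instead
// of being queued for some later GIL acquisition. (Pre-0.21 GIL Refs API.)
fn release_batch(handles: Vec<Py<PyAny>>) {
    Python::with_gil(|py| {
        for handle in handles {
            let _gil_ref: &PyAny = handle.into_ref(py);
        }
    });
}
```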
I've largely said my piece here, but just to add: I'm very reassured that you see the … I can't use this library in earnest until that issue is solved, but it remains an excellent project. I'm happy to help out if it makes this happen earlier, but otherwise I'll be watching the changelog eagerly!
Thank you, but I'm thinking about a completely different one: …
@kngwyu would love to see some detailed notes on what you're thinking!
Just FYI; while I think we're getting close to considering …
Note to self - pantsbuild/pants#13526 (comment) is interesting. (Potentially …)
2864: Add a section on memory management for `extension` r=davidhewitt a=haixuanTao

Adding a special case of memory management when writing an extension. This is documentation of #1056 and #2853.

Co-authored-by: Haixuan Xavier Tao <tao.xavier@outlook.com>
Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
With the release of the Bound API in 0.21 (beta), there is finally a resolution to this problem. 🎉 Migrating to the Bound API will solve this problem immediately for code which needs this fixed, and the follow-up to this work is #3960 to remove the existing GIL Refs API.
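For reference, a small sketch of the 0.21+ shape (the `name` attribute is just for illustration): `Bound<'py, T>` carries the GIL lifetime, and reference counts are adjusted immediately rather than being pooled.

```rust
use pyo3::prelude::*;

// With the Bound API (PyO3 0.21+), owned references carry the GIL lifetime and
// reference counts are adjusted immediately: no thread-locals, no pool.
fn greet(py: Python<'_>, stored: &Py<PyAny>) -> PyResult<String> {
    let obj = stored.bind(py);                          // borrow as &Bound, no new storage
    let name: Bound<'_, PyAny> = obj.getattr("name")?;  // owned and GIL-scoped
    let greeting = format!("hello, {}", name.extract::<String>()?);
    drop(name); // refcount decremented here, immediately, while the GIL is held
    Ok(greeting)
}
```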
Incredible, I've been watching and hoping for a long while for this to be sorted! I'm really glad it was included for 1.0. ❤️
I'm very excited about this library. The effort going into documentation and ergonomics is fantastic. I've been waiting to use it on stable Rust for months, and have finally started trying to integrate it into my CPU-bound Python codebase to write extension modules.
Right now, `pyo3` seems to have a serious issue to me. The use of the object storage (even the lighter #887 implementation) means that this library is not zero-cost, and worse, can easily leak memory. The additional management layer that's built on top of Python's own reference counting, while ergonomic, is a big problem for my use case.

The current situation, as I understand it:

`GilGuard` acquire overhead (minimal, and rare):
- `Mutex` on the `PyObject` alloc/dealloc array.
- `Vec` to copy the `PyObject` alloc/dealloc array out of TLS.
- `Vec` for storing pointers to owned references.

`&PyAny` owned reference creation (mostly method returns, cheap, but very frequent):
- `Vec`s reallocating as they grow.

`PyObject` drop (cheap, but frequent):
- `Mutex` on the dealloc array.
- Freeing deferred until the next `GilGuard` acquire.

The regular access to TLS and global mutexes is regrettable, but the memory-freeing behavior is really worrying to me. I'm not writing Python extensions because I want pretty-damn fast and unbounded memory growth. I'm going to the effort of writing a Python extension because I need as fast as possible and predictable memory usage.
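As an illustration of where the middle bullet bites in practice (a sketch against the pre-0.21 GIL Refs API; the `name` attribute is illustrative), every owned `&PyAny` produced inside a GIL scope is parked in the pool until that scope closes:

```rust
use pyo3::prelude::*;

// Pre-0.21 GIL Refs API: every owned `&PyAny` created here (each `item`, each
// `getattr` result) is registered in the GILPool and is only released when the
// surrounding GIL scope (e.g. the `Python::with_gil` call that produced
// `items`) finally closes, not when it goes out of scope below.
fn count_x(items: &PyAny) -> PyResult<u64> {
    let mut hits = 0;
    for item in items.iter()? {
        let name = item?.getattr("name")?;             // pool-backed GIL ref
        if name.extract::<&str>()?.starts_with('x') {
            hits += 1;
        }
    }
    Ok(hits)
}
```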
The solution outlined in #885 seems perfect. A `PyAny<'a>` has no need for object storage, and `Clone` and `Drop` work as one would expect. I'd even go one stage further and have `PyObject` and `Py<T>` always acquire the GIL for `Clone` and `Drop`. Sure, it's slower than using the current TLS and deferred drop, but it's simpler, more predictable, and people can always use `ManuallyDrop` or turn them back into `PyAny<'a>` to batch clone/drop if performance is suffering.

You could even remove the mirrored object API on `PyObject` and require that they're converted into `PyAny<'a>` to `call` or `getattr`, making `PyObject`'s only purpose to be storing a long-term reference to a Python object between GIL acquisitions.

At that point, there's no additional global state beyond what's already provided by the Python interpreter. Things are deallocated predictably. There's a smaller API surface. The library is truly zero-cost and adds no additional complexity.
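To make the "long-term handle only" idea concrete, here is a sketch of that usage pattern with the current (pre-0.21) API; the `callback` field and the called attribute are just for illustration:

```rust
use pyo3::prelude::*;

// Sketch of "PyObject is only a long-term handle": nothing is done with the
// stored `Py<PyAny>` except converting it to a GIL-bound reference whenever
// the GIL is actually held. (Pre-0.21 `as_ref(py)` shown here.)
struct Handler {
    callback: Py<PyAny>, // kept alive between GIL acquisitions
}

impl Handler {
    fn invoke(&self, arg: &str) -> PyResult<String> {
        Python::with_gil(|py| {
            let cb: &PyAny = self.callback.as_ref(py); // GIL-bound view
            cb.call1((arg,))?.extract()                // call and convert result
        })
    }
}
```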
I would urge you to be bold with making breaking changes while the library is young and the userbase relatively small. It would be a tremendous shame if the most popular Rust/Python bindings were significantly slower than the equivalent C, and introduced an unexpected additional memory model.
In any case, thanks for all the work you've put in so far. :)