-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
"lazy" state of PyErr
... has challenges
#4584
Comments
Hmm, can you say more about (3)?
…On Fri, Sep 27, 2024 at 11:50 AM David Hewitt ***@***.***> wrote:
PyErr contains some arcane trickery internally to avoid creating a Python
exception object, by making some "lazy" state which boxes a PyErrArguments
object.
While the original design predates me, I believe that it was done for
performance to avoid creating a Python exception object inside of pure-Rust
code.
I think there might be a few reasons to revisit this design.
1. I think the performance motivation is not clear-cut; it makes
PyO3's error handling relatively expensive because every error pathway has
to box up the state into a dyn Trait before then creating a Python
exception.
2. I think we've learned that a better design is to allow APIs to be
generic on the error type where relevant, i.e. in IntoPyObject (and I
guess we could do the same in FromPyObject). The proc macros have for
a long time allowed any error type which is convertible to PyErr, too.
3. I realize as I write this issue that PyErrState::normalize is not
sound under the freethreaded build: it uses the GIL for synchronization.
4. The boxed PyErrArguments state is not traversible by the Python GC.
In theory this means that the state can contain arbitrary Python objects,
could therefore participate cycles and cause memory leaks. We would need to
make breaking changes to the trait to fix this.
I propose that instead we remove the ability for PyErr to be lazy and
make it a thin wrapper around Py<PyBaseException>. This is essentially
the direction that Python went in 3.12 for the global SetRaisedException
ffi calls.
It would have the effect of creating a PyErr to change performance
profile; instead of creating a boxed lazy exception state it would
immediately create the Python object. This might be faster, it might be
slower. But I think it would simplify things which might overall improve
things.
Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to
land this ASAP on 0.23. cc @ngoldbaum <https://github.com/ngoldbaum>
I could at least make a branch and see what the benchmarks say.
—
Reply to this email directly, view it on GitHub
<#4584>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAAAGBHKOTZPQ2TEABBWC3LZYV5FFAVCNFSM6AAAAABO7N27JKVHI2DSMVQWIX3LMV43ASLTON2WKOZSGU2TGMRVGMYDINI>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
--
All that is necessary for evil to succeed is for good people to do nothing.
|
The |
Ah, yes, sorry the unsoundness is really in |
This whole thing is really just a GILOnceCell, right? |
Not exactly, it consumes it's original state and rebuilds it in place when normalizing. You're right it's write once, though. I think the in place rebuild implies a lock if we wanted to keep the current design. |
I realize that one option is to leave the design as is and just replace the That might be good enough for 0.23, however I still feel removing this lazy behaviour might be the correct long term option. |
I'm considering whether we should just have two error types, one being the cheap one and the other holding the python exception object (Maybe we could represent this with a generic parameter). We can probably keep the breaking changes to a minimum by having the default parameter be the uninstantiated variant and having most of the conversions happen inside a proc macro. |
I'm open to this, can you maybe elaborate a bit further? I think you're right that ripping the lazy state out is maybe more breaking than I originally considered, so we should think more carefully. I think based on @alex's observation that this is basically a |
#4764 makes me more convinced that we need to take the "lazy" state out - while in the particular polars case there the exceptions are coming from Python and don't need to use the lazy mechanism, in principle Rust-originating exceptions which use the lazy mechanism would trigger a similar deadlock. There's just too much going on behind the scenes in these types, and there's enough places in PyO3 now that we allow errors-convertible-to-PyErr that I'm very much in favour of removing lazy state for 0.24. Another argument in favour as well beyond removing all the hidden complexity: this might improve performance and stack memory consumption. Because of the heavy weight of
... if |
PyErr
contains some arcane trickery internally to avoid creating a Python exception object, by making some "lazy" state which boxes aPyErrArguments
object.While the original design predates me, I believe that it was done for performance to avoid creating a Python exception object inside of pure-Rust code.
I think there might be a few reasons to revisit this design.
dyn Trait
before then creating a Python exception.IntoPyObject
(and I guess we could do the same inFromPyObject
). The proc macros have for a long time allowed any error type which is convertible to PyErr, too.PyErrState::normalize
is not sound under the freethreaded build: it uses the GIL for synchronization.PyErrArguments
state is not traversible by the Python GC. In theory this means that the state can contain arbitrary Python objects, could therefore participate cycles and cause memory leaks. We would need to make breaking changes to the trait to fix this.I propose that instead we remove the ability for
PyErr
to be lazy and make it a thin wrapper aroundPy<PyBaseException>
. This is essentially the direction that Python went in 3.12 for the globalSetRaisedException
ffi calls.It would have the effect of creating a
PyErr
to change performance profile; instead of creating a boxed lazy exception state it would immediately create the Python object. This might be faster, it might be slower. But I think it would simplify things which might overall improve things.Due to point 3, i.e. that it's not threadsafe, I'm tempted to just try to land this ASAP on 0.23. cc @ngoldbaum
I could at least make a branch and see what the benchmarks say.
The text was updated successfully, but these errors were encountered: