Possible deadlock in impl Display for PyErr
#4764
Comments
Note that the warnings are converted to errors and so will lead to a |
I can reproduce. Now I understand why it only happens during pytest. Only there it triggers the error branch. |
Yes forgot to mention that. Sorry for the confusion. |
For full context. We can acquire the gil. We start multithreaded in a |
I wouldn't be surprised if bisecting pointed to #4671 as the culprit. It would be nice to get rid of the lazy error state entirely as alluded to there. |
I would support that guess. @bschoenmaeckers can you give me more specific instructions to repro? I tried to do so just now by cloning your branch of the |
Try |
Thanks, so I have reproduced and got a thread dump of the reproduction. One thread is holding the GIL and waiting to lock stderr:
Another thread is holding the stderr lock and waiting to acquire the GIL:
#4671 is the cause of the GIL switch here which leads to the deadlock. However I now wonder if more generally there is potential for GIL switches inside formatting traits under io write locks to create deadlocks. These would be significantly rarer, but still possible given we call 🤔 |
Can you explain to me what happens? Can GIL switches happen involuntarily while holding a different lock (e.g. an io write lock)? Then I assume another thread tries to acquire that lock and we're deadlocked? |
Essentially, yes. You have to assume that doing arbitrary Python operations can lead to GIL switches. I think in practice as long as you do very little Python work then the chance of a GIL switch while holding a lock can be realistically zero, because IIRC since 3.11 the interpreter only switches pure-Python code after sufficiently many turns of the eval loop (maybe only at function call boundaries?). But there always exists the possibility of native code releasing the GIL to allow Python threads to make progress. This is essentially the cause of https://pyo3.rs/v0.23.3/faq.html#im-experiencing-deadlocks-using-pyo3-with-stdsynconcelock-stdsynclazylock-lazy_static-and-once_cell - in particular In this case I think #4671 has introduced a switch which wasn't previously in your case, and #4766 would probably resolve the deadlock. In general the IO locks seem like a rare and sneaky source of deadlocks; I've never seen someone hit this before. I think we could probably mitigate by various options:
|
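A minimal sketch of the lock-order problem discussed above, using only the Rust standard library. It is a model, not PyO3 code: the GIL is represented by a plain `Mutex` (`FakeGil`), the stderr lock by a second `Mutex` around a `Vec<String>`, and `LazyErr` is a hypothetical error type whose `Display` impl has to take the fake GIL. All of these names are assumptions made for illustration. The deadlock in this thread corresponds to one thread holding the fake GIL while waiting for the sink, and another thread holding the sink while its formatting call waits for the fake GIL. The sketch shows one way to avoid that: render the error to a `String` first, then take the sink lock.

```rust
use std::fmt;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for the GIL: a plain mutex, not PyO3's real mechanism.
type FakeGil = Arc<Mutex<()>>;

/// Hypothetical error whose Display impl must take the fake GIL,
/// mimicking an error that is only normalized when it is formatted.
struct LazyErr {
    gil: FakeGil,
    msg: &'static str,
}

impl fmt::Display for LazyErr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Formatting blocks here until the fake GIL is available.
        let _gil = self.gil.lock().unwrap();
        write!(f, "{}", self.msg)
    }
}

/// Render first, lock the sink second: the sink lock is never held
/// while waiting for the fake GIL, so the inversion cannot occur.
fn log_rendered_first(sink: &Mutex<Vec<String>>, err: &LazyErr) {
    let rendered = err.to_string(); // takes and releases the fake GIL
    sink.lock().unwrap().push(rendered);
}

/// Runs two threads that contend on both locks and returns the collected log.
fn run_demo() -> Vec<String> {
    let gil: FakeGil = Arc::new(Mutex::new(()));
    let sink: Arc<Mutex<Vec<String>>> = Arc::new(Mutex::new(Vec::new()));

    // Thread A: holds the fake GIL, then writes to the sink (GIL -> sink).
    let a = {
        let (gil, sink) = (Arc::clone(&gil), Arc::clone(&sink));
        thread::spawn(move || {
            for _ in 0..100 {
                let _gil = gil.lock().unwrap();
                sink.lock().unwrap().push("from A".to_string());
            }
        })
    };

    // Thread B: logs an error whose formatting needs the fake GIL.
    // Formatting it while holding the sink lock (sink -> GIL) could deadlock
    // against thread A; rendering before locking the sink avoids that.
    let b = {
        let (gil, sink) = (Arc::clone(&gil), Arc::clone(&sink));
        thread::spawn(move || {
            let err = LazyErr { gil, msg: "from B" };
            for _ in 0..100 {
                log_rendered_first(&sink, &err);
            }
        })
    };

    a.join().unwrap();
    b.join().unwrap();
    let lines = sink.lock().unwrap().clone();
    lines
}
```

Calling `run_demo()` should always finish, because no thread ever waits for the fake GIL while holding the sink lock. Whether the same approach is practical for real `PyErr` formatting is a separate question that this sketch does not answer.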
Ouch, that's more tricky than I assumed. |
#4766 fixes the deadlock in polars 👍 |
Bug Description
When updating polars to pyo3 0.23 I encountered a deadlock when multiple threads try to print a `PyErr`. When I forced the `PyErrState` to be evaluated with the currently held `py` token, the deadlock went away.
Steps to Reproduce
When I remove the `e.value(py);` call, pyo3 will deadlock.
https://github.com/pola-rs/polars/blob/db2410f47fb6abd5770c9707a3e969711a9ae4a2/crates/polars-python/src/on_startup.rs#L58-L72
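The workaround described above (forcing the lazy error state while the token is still held, so that later formatting no longer needs the GIL) can be modelled without PyO3 roughly as follows. This is a sketch under stated assumptions, not the code at the link: `FakeGil`, `LazyErr`, `force`, and `needs_gil` are hypothetical names, and the GIL is represented by a plain `Mutex`.

```rust
use std::fmt;
use std::sync::{Arc, Mutex, MutexGuard, OnceLock};

/// Hypothetical stand-in for the GIL: a plain mutex, not PyO3's real mechanism.
type FakeGil = Arc<Mutex<()>>;

/// Hypothetical error with a lazily computed, cached message.
struct LazyErr {
    gil: FakeGil,
    raw: &'static str,
    normalized: OnceLock<String>,
}

impl LazyErr {
    fn new(gil: FakeGil, raw: &'static str) -> Self {
        LazyErr { gil, raw, normalized: OnceLock::new() }
    }

    /// Computes and caches the message. Taking a guard as an argument
    /// models "the caller already holds the token".
    fn force(&self, _held: &MutexGuard<'_, ()>) -> &str {
        self.normalized.get_or_init(|| format!("Error: {}", self.raw))
    }

    /// True while formatting would still have to take the fake GIL.
    fn needs_gil(&self) -> bool {
        self.normalized.get().is_none()
    }
}

impl fmt::Display for LazyErr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Fast path: the state was already forced, so no lock is taken.
        if let Some(cached) = self.normalized.get() {
            return f.write_str(cached);
        }
        // Slow path: formatting has to wait for the fake GIL.
        let guard = self.gil.lock().unwrap();
        let text = self.force(&guard);
        f.write_str(text)
    }
}

/// Forces the error while the fake GIL is held, then formats it
/// without touching the fake GIL again.
fn run_demo() -> (bool, bool, String) {
    let gil: FakeGil = Arc::new(Mutex::new(()));
    let err = LazyErr::new(Arc::clone(&gil), "boom");

    let before = err.needs_gil();
    let guard = gil.lock().unwrap();
    err.force(&guard);
    let after = err.needs_gil();

    // The fake GIL is still held here; formatting uses the cached
    // message and therefore does not try to lock it a second time.
    let rendered = err.to_string();
    drop(guard);

    (before, after, rendered)
}
```

In this model, skipping the `force` call would send `Display` down the slow path, which is the point where a second lock (such as an io write lock held by the caller) can combine with the GIL wait into a deadlock.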
The imported Python function _polars_warn looks like this
Backtrace
No response
Your operating system and version
Windows 11
Your Python version (`python --version`)
Python 3.9.20
Your Rust version (`rustc --version`)
rustc 1.85.0-nightly (6b6a867ae 2024-11-27)
Your PyO3 version
0.23.2
How did you install python? Did you use a virtualenv?
python.org installer
Additional Info
ref pola-rs/polars#20111