Crash During Subinterpreter Finalization #105699
Comments
Probably the same thing: (AMD64 Arch Linux TraceRefs 3.12)
maybe related: #105690
This fixes a race during import. The existing _PyRuntimeState.imports.pkgcontext is shared between interpreters, and occasionally this would cause a crash when multiple interpreters were importing extension modules at the same time. To solve this we add a thread-local variable for the value. We also leave the existing state (and infrequent race) in place for platforms that do not support thread-local variables.
…-105740) (cherry picked from commit b87d288) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…) (gh-105765) (cherry picked from commit b87d288) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
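The thread-local approach described in the pkgcontext commit message can be sketched roughly as follows. This is a minimal illustration, not the actual CPython patch: `pkgcontext_tls`, `pkgcontext_shared`, `swap_pkgcontext()`, and the `HAVE_THREAD_LOCAL` macro as defined here are hypothetical names, and C11 `_Thread_local` is assumed as the thread-local mechanism.

```c
#include <stddef.h>

/* Assumption: C11 thread-local storage is available when the compiler
   reports C11 or later. The macro name is illustrative only. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#  define HAVE_THREAD_LOCAL 1
/* One slot per thread, so concurrent imports cannot clobber each other. */
static _Thread_local const char *pkgcontext_tls = NULL;
#else
/* Fallback: a single process-wide slot, which keeps the infrequent race. */
static const char *pkgcontext_shared = NULL;
#endif

/* Install a new package context and return the previous one, so the
   caller can restore it once the extension module has been loaded. */
static const char *
swap_pkgcontext(const char *new_context)
{
#ifdef HAVE_THREAD_LOCAL
    const char *old = pkgcontext_tls;
    pkgcontext_tls = new_context;
#else
    const char *old = pkgcontext_shared;
    pkgcontext_shared = new_context;
#endif
    return old;
}
```

A caller would save the value returned by `swap_pkgcontext()` before loading the module and pass it back afterwards to restore the previous context.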
After gh-106899, I'm
(stack trace for that last one)
With the 3 PRs I have up I don't see any more crashes (other than in _xxinterpchannels).
A static (process-global) str object must only have its "interned" state cleared when it is no longer interned in any interpreter. Static strings are the only ones that can be shared by interpreters, so we don't have to worry about any other str objects. We trigger clearing the state with the main interpreter, since no other interpreters may exist at that point and _PyUnicode_ClearInterned() is only called during interpreter finalization. We do not address here the fact that a string will only be interned in the first interpreter that interns it. In any subsequent interpreters str.state.interned is already set, so _PyUnicode_InternInPlace() will skip it. That needs to be addressed separately from fixing the crasher.
(cherry picked from commit 87e7cb0) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
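The rule for static strings (leave the shared "interned" flag alone until the main interpreter finalizes) can be modeled with a toy example. Everything here is a stand-in: `toy_interp`, `toy_str`, and `toy_clear_interned()` are hypothetical and do not mirror CPython's real structures or the body of _PyUnicode_ClearInterned().

```c
#include <stdbool.h>

/* Toy stand-ins; these are not CPython's real structures. */
typedef struct {
    bool is_main;      /* true only for the main interpreter */
} toy_interp;

typedef struct {
    bool is_static;    /* process-global, shared by all interpreters */
    bool interned;     /* the shared "interned" state flag */
} toy_str;

/* Called while an interpreter finalizes. A static string may still be
   interned in another interpreter, so a subinterpreter must not touch
   its flag; only the main interpreter clears it. */
static void
toy_clear_interned(const toy_interp *interp, toy_str *s)
{
    if (s->is_static && !interp->is_main) {
        return;        /* leave the shared state alone */
    }
    s->interned = false;
}
```

Per-interpreter strings are cleared by whichever interpreter owns them; only the shared, static ones wait for the main interpreter.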
…106923) There was a slight race in _Py_ClearFileSystemEncoding() (when called from _Py_SetFileSystemEncoding()), between freeing the value and setting the variable to NULL, which occasionally caused crashes when multiple isolated interpreters were used. (Notably, I saw at least 10 different, seemingly unrelated, spooky-action-at-a-distance ways this crashed. Yay, free threading!) We avoid the problem by only setting the global variables with the main interpreter (i.e. runtime init).
pythongh-106923) (cherry picked from commit 0ba07b2) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…le (gh-106923) (#106964) gh-105699: Fix a Crasher Related to a Deprecated Global Variable (gh-106923) (cherry picked from commit 0ba07b2) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
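The free-then-NULL window and the "main interpreter only" guard from the filesystem-encoding fix can be sketched as below. This is an assumption-laden illustration rather than the real code: `fs_encoding`, `clear_fs_encoding()`, and `set_fs_encoding()` are hypothetical names, and the `is_main_interpreter` flag stands in for however the runtime identifies the main interpreter.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical process-wide variable standing in for the deprecated global. */
static char *fs_encoding = NULL;

/* Racy if run concurrently: between free() and the NULL store, another
   thread can still read (or free again) the dangling pointer. */
static void
clear_fs_encoding(void)
{
    free(fs_encoding);
    fs_encoding = NULL;
}

/* Guarded setter: only the main interpreter (i.e. runtime init) touches
   the global. Returns 1 if set, 0 if skipped, -1 on allocation failure. */
static int
set_fs_encoding(bool is_main_interpreter, const char *encoding)
{
    if (!is_main_interpreter) {
        return 0;      /* subinterpreters never modify the global */
    }
    size_t n = strlen(encoding) + 1;
    char *copy = malloc(n);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, encoding, n);
    clear_fs_encoding();
    fs_encoding = copy;
    return 1;
}
```

Because subinterpreters return early, the clear-and-set sequence only ever runs on the single thread doing runtime initialization, which removes the window rather than locking around it.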
gh-105699: Fix an Interned Strings Crasher (gh-106930) (cherry picked from commit 87e7cb0) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…ythonGH-106966) (cherry picked from commit adda43d) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
This can be closed once gh-106974 lands.
The stress tests I added are still crashing occasionally. I suspect most of the problem lies with the _xxsubinterpreters module, but I'm going to check. In the meantime I'm going to disable the two tests to cut down on noise as we approach 3.12rc1. CC @Yhg1s
Notably, I can reproduce a crash on the 3.12 branch but not on main. ( |
The two tests are crashing periodically in CI and on buildbots. I suspect the problem is in the _xxsubinterpreters module. Regardless, I'm disabling the tests temporarily, to reduce the noise as we approach 3.12rc1. I'll be investigating the crashes separately.
(cherry picked from commit 4f67921) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
…h-107357) gh-105699: Disable the Interpreters Stress Tests (gh-107354) (cherry picked from commit 4f67921) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
This fixes a crasher due to a race condition, triggered infrequently when two isolated (own GIL) subinterpreters simultaneously initialize their sys or builtins modules. The crash happened due to the combination of the "detached" thread state we were using and the "last holder" logic we use for the GIL. It turns out it's tricky to use the same thread state for different threads. Who could have guessed? We solve the problem by eliminating the one object we were still sharing between interpreters. We replace it with a low-level hashtable, using the "raw" allocator to avoid tying it to the main interpreter. We also remove the accommodations for "detached" thread states, which were a dubious idea to start with.
…hongh-106974) (cherry picked from commit 8ba4df9)
…-106974) (gh-107412) gh-105699: Use a _Py_hashtable_t for the PyModuleDef Cache (gh-106974) (cherry picked from commit 8ba4df9)
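To illustrate what "a low-level hashtable, using the raw allocator" means in practice, here is a toy string-keyed table whose memory comes straight from malloc() instead of from any interpreter's object allocator. It is only a sketch: `toy_table`, `toy_get()`, and `toy_set()` are hypothetical and are not the `_Py_hashtable_t` API, plain malloc() stands in for the "raw" allocator, and the locking a shared table would need is left out.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_NBUCKETS 16

typedef struct toy_entry {
    char *key;                 /* owned copy, allocated with malloc() */
    void *value;               /* borrowed pointer, e.g. to a static module def */
    struct toy_entry *next;    /* next entry in the same bucket */
} toy_entry;

typedef struct {
    toy_entry *buckets[TOY_NBUCKETS];
} toy_table;

/* djb2 string hash, reduced to a bucket index. */
static size_t
toy_hash(const char *key)
{
    size_t h = 5381;
    while (*key) {
        h = h * 33 + (unsigned char)*key++;
    }
    return h % TOY_NBUCKETS;
}

/* Return the stored value, or NULL if the key is absent. */
static void *
toy_get(const toy_table *t, const char *key)
{
    for (toy_entry *e = t->buckets[toy_hash(key)]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            return e->value;
        }
    }
    return NULL;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
static int
toy_set(toy_table *t, const char *key, void *value)
{
    size_t i = toy_hash(key);
    for (toy_entry *e = t->buckets[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;
            return 0;
        }
    }
    toy_entry *e = malloc(sizeof(*e));
    if (e == NULL) {
        return -1;
    }
    size_t n = strlen(key) + 1;
    e->key = malloc(n);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    memcpy(e->key, key, n);
    e->value = value;
    e->next = t->buckets[i];
    t->buckets[i] = e;
    return 0;
}
```

Since nothing in the table is a Python object, no interpreter's refcounting or object allocator is involved, which is the property the commit message relies on.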
I've disabled the new stress tests, as they were regularly causing crashes that I'm not convinced indicate bugs (outside _xxsubinterpreters). Once that's sorted out and I've re-enabled the tests, we can close this issue.
We had disabled them due to crashes they exposed, which have since been fixed.
…thongh-107572) (cherry picked from commit f9e3ff1) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
I'm calling this good.
…tress Tests (pythongh-107572) (python#107783)" This reverts commit a4aac7d.
…sts (pythongh-107572) (python#107783) (cherry picked from commit f9e3ff1) Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com> Co-authored-by: Łukasz Langa <lukasz@langa.pl> Co-authored-by: T. Wouters <thomas@python.org>
There's an isolation leak somewhere. It may be just in the _xxsubinterpreters module, but I suspect it's not.
See #99114 (comment).
Reproducers:
FYI, I see crashes on this fairly infrequently.