Test suite crashes when run single process --with-pydebug enabled
#121832
Comments
I have confirmed the crash is occurring in the SubInterpreterTest added as part of #121602. If that test is commented out, the test suite passes without error.
FYI, I'm pretty sure gh-121636 will fix this. The gist is: the change in gh-121602 runtime initialization makes a copy of the (mostly) unmodified
Is there a way you could verify gh-121636 fixes the iOS build?
All that said, I'm definitely curious why this would be manifesting only on iOS?
The iOS buildbot should be sufficient to verify any fix; from the look of it, the current state of #121636 doesn't fix it. The test failure isn't very helpful - it's a hard crash of the emulator, which the test report doesn't show very nicely - but with the test_types SubinterpreterTest commented out, the suite completes and passes on iOS. I've verified the same tests locally, and I get the same failure on #121636.
You and me both :-) My best guess right now is that it's not iOS specific, but a function of how the test suite is executed on iOS. iOS is forced to run the entire test suite sequentially in a single process. This isn't (AFAIK) the default mode of operation on any other runner, so it's possible the specific sequence of test execution is causing an odd state to emerge.
One of the debugging tasks I'm going to look into today is to try and reproduce this on macOS (the other is to try and find a subset of tests that causes the failure, rather than needing to get to test_types alphabetically... after 20 minutes)
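The single-process versus multi-process distinction described above can be sketched as a small helper that builds the test runner command line. This is only an illustration: the helper name `regrtest_cmd` and its parameters are hypothetical, and the interpreter name and job count are placeholders. It assumes that `python -m test` runs tests sequentially in one process unless `-j` is passed.

```python
import sys


def regrtest_cmd(test_names, jobs=None, python=sys.executable):
    """Build a `python -m test` command line (hypothetical helper)."""
    cmd = [python, "-m", "test"]
    if jobs is not None:
        # -j N asks the runner to spread tests over N worker processes.
        cmd += ["-j", str(jobs)]
    # With no -j flag, the listed tests run one after another in one process.
    return cmd + list(test_names)


single = regrtest_cmd(["test_types"], python="python")
multi = regrtest_cmd(["test_types"], jobs=4, python="python")
print(" ".join(single))
print(" ".join(multi))
```

The resulting list can be handed to `subprocess.run` from a debug build; state left behind by one test is only visible to a later test in the single-process form.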
Looks like gh-121636 isn't sufficient. I'm going to try a different approach.
See gh-121882.
Dang it... looks like gh-121882 doesn't fix the problem either... 😢
Confirming: the test_types test doesn't fail on macOS when the test suite is executed in single process mode, so the problem isn't purely about execution order. Now trying to bisect the test suite to find a smaller subset of tests that fail.
Scratch that - it does crash on macOS - you just have to remember to enable
@ericsnowcurrently To reproduce:
(Although I also had to patch
Key details: Clean macOS build, with pydebug enabled; running as a single process (not multiprocess as
Test failure:
Now trying to narrow down a subset of these tests that will fail.
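Narrowing an ordered test list down to a smaller failing subset can be sketched as a greedy reduction. This is a minimal sketch, not the procedure actually used in this thread: `shrink_failing_set` and the `fails` callback are hypothetical names, and `fails` stands in for "run these tests in order in a single process and report whether the crash occurs". The test names in the example are placeholders apart from the two modules named in this issue.

```python
def shrink_failing_set(tests, fails):
    """Drop chunks of tests while the failure still reproduces.

    `fails(subset)` must return True when running `subset` in order
    still triggers the problem. Relative order is preserved.
    """
    current = list(tests)
    chunk = len(current) // 2
    while chunk >= 1:
        i = 0
        while i < len(current):
            candidate = current[:i] + current[i + chunk:]
            if candidate and fails(candidate):
                current = candidate  # chunk was not needed; retry at same index
            else:
                i += chunk  # chunk is needed; move past it
        chunk //= 2
    return current


def fake_fails(subset):
    # Stand-in predicate: fails only if one test runs before the other.
    if "test_type_cache" not in subset or "test_types" not in subset:
        return False
    return subset.index("test_type_cache") < subset.index("test_types")


suite = ["test_a", "test_b", "test_type_cache", "test_c", "test_types", "test_d"]
reduced = shrink_failing_set(suite, fake_fails)
print(reduced)
```

Each call to `fails` corresponds to a full run of the candidate subset, so halving the chunk size keeps the number of runs small when most tests are irrelevant.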
@ericsnowcurrently I've got it narrowed down to just 2 tests:
This is running on macOS M1 hardware, using a checkout of main at f036a46. I've also been able to reproduce it on macOS x86_64 hardware, and under Ubuntu 22.04 on x86_64. The Ubuntu machine is a VM on my macOS x86_64 machine, if that matters; but it doesn't appear to be CPU or OS dependent.
Based on a comment from @Fidget-Spinner on Discord: the problem might be that
I've also been able to narrow the problem even further:
Oh! Could it be this obvious?
cpython/Lib/test/test_type_cache.py, lines 175 to 181 in 5d98a4d
If I'm reading this right, the test is deliberately setting the
If I comment out L179 and L181, the
I'm not sure how to go about resetting
The thing is, setting the type version tag to 0 should be fine on its own. If the rest of the tests fail, that means the type cache itself is buggy when type versions overflow. So there's probably an underlying bug somewhere else.
Interesting! This failure happens on the default build, but not on the free-threaded build.
I wonder if the assert is even right? I don't quite understand the
FYI, @markshannon added that assert last month in 00257c7:
commit 00257c746c447a2e026b5a2a618f0e033fb90111
Author: Mark Shannon <mark@hotpy.org>
Date: Wed Jun 19 17:38:45 2024 +0100
GH-119462: Enforce invariants of type versioning (GH-120731)
* Remove uses of Py_TPFLAGS_VALID_VERSION_TAG
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 0dcf1d399d9..1cc6ca79298 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -8516,12 +8481,11 @@ init_static_type(PyInterpreterState *interp, PyTypeObject *self,
assert(NEXT_GLOBAL_VERSION_TAG <= _Py_MAX_GLOBAL_TYPE_VERSION_TAG);
_PyType_SetVersion(self, NEXT_GLOBAL_VERSION_TAG++);
- self->tp_flags |= Py_TPFLAGS_VALID_VERSION_TAG;
}
else {
assert(!initial);
assert(self->tp_flags & _Py_TPFLAGS_STATIC_BUILTIN);
- assert(self->tp_flags & Py_TPFLAGS_VALID_VERSION_TAG);
+ assert(self->tp_version_tag != 0);
}
managed_static_type_state_init(interp, self, isbuiltin, initial);
That implies the assert condition is meant to be guaranteed.
Do note that the crash isn't unique to test_types. It can be triggered by any test that creates a subinterpreter. For example, test_import:
@markshannon, @Fidget-Spinner, at this point we need to decide if that assert in
@markshannon Gentle nudge on this one; this bug is currently breaking the iOS buildbots.
The assertion is correct. Static types must be immutable, so their version number is fixed. The problem is that we aren't enforcing that invariant in
I think we can assume that well-behaved C extensions don't modify static builtin types, so adding a new variant of
I think the best thing is to add an assertion in
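The invariant described in this comment can be illustrated with a toy model. This is a simplified sketch and not CPython's implementation: `ToyType`, `ToyRuntime`, and the `protect_static` flag are hypothetical names. The model only assumes what this thread states: a static builtin type gets a nonzero version tag on first initialization, later interpreter initializations assert that the tag is nonzero, and resetting the tag to 0 in between breaks that assertion.

```python
from dataclasses import dataclass


@dataclass
class ToyType:
    name: str
    static_builtin: bool
    version_tag: int = 0


class ToyRuntime:
    def __init__(self):
        self.next_tag = 1

    def init_static_type(self, tp, initial):
        if initial:
            # First interpreter: hand out a fresh, nonzero tag.
            tp.version_tag = self.next_tag
            self.next_tag += 1
        elif tp.version_tag == 0:
            # Later interpreters: mirrors the debug-build assert in the diff.
            raise AssertionError(f"{tp.name}: version tag is 0")

    def type_modified(self, tp, protect_static=False):
        # protect_static is a hypothetical switch modelling "static types
        # keep their version tag when a modification is signalled".
        if protect_static and tp.static_builtin:
            return
        tp.version_tag = 0


def subinterpreter_survives(protect_static):
    rt = ToyRuntime()
    tp = ToyType("toy_static", static_builtin=True)
    rt.init_static_type(tp, initial=True)
    rt.type_modified(tp, protect_static=protect_static)
    try:
        rt.init_static_type(tp, initial=False)  # stands in for a subinterpreter
    except AssertionError:
        return False
    return True


print(subinterpreter_survives(protect_static=False))
print(subinterpreter_survives(protect_static=True))
```

In the model, the failure only appears when the reset happens before a second initialization, which matches the observation that the crash depends on which tests ran earlier in the same process.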
Unfortunately, the datetime module modifies the static builtin types when reloading, which doesn't appear to be safe. I think the fix for datetime would be to make the builtin classes fully immutable, or to use heap types for the classes.
As a temporary workaround, I've added #122150; this disables the test that is causing the crash on iOS so that iOS CI can continue. Any fix for this bug should be able to remove the test skip added by that PR.
…ore test suite. (pythonGH-122150) (cherry picked from commit 1bcc9eb) Co-authored-by: Russell Keith-Magee <russell@keith-magee.com>
@ericsnowcurrently can you make sure to revert #122150 when this is fixed?
…not changed by PyType_Modified. (GH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.
…es is not changed by PyType_Modified. (pythonGH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.
…es is not changed by PyType_Modified. (pythonGH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified. (cherry picked from commit e55b05f) Co-authored-by: Mark Shannon <mark@hotpy.org>
… (pythonGH-122340) Revert test skip introduced by pythonGH-122150. (cherry picked from commit 863a92f) Co-authored-by: Russell Keith-Magee <russell@keith-magee.com>
Crash report
What happened?
The make testios testing target crashes when compiled --with-pydebug enabled (as is done in CI).
The error is a SIGABRT, raised during test_types; however, running test_types by itself isn't enough to reproduce the problem.
The error is raised on L8476 of typeobject.c:
The same failure can be manufactured on macOS with M1 hardware, as long as the test suite is executed as a single process (i.e. python -m test, not the multi-process option enabled by make test).
Full error trace on iOS:
stacktrace.txt
The same problem doesn't appear to occur when debug is not enabled.
CPython versions tested on:
CPython main branch
Operating systems tested on:
macOS, iOS
Output from running 'python -VV' on the command line:
3.14.0a0; verified on 2bac2b86, but possibly present back to dc03ce7
Linked PRs