Test suite crashes when run single process `--with-pydebug` enabled #121832

freakboy3742 · 2024-07-16T04:10:08Z

Crash report

What happened?

The make testios testing target crashes when compiled --with-pydebug enabled (as is done in CI).

The error is a SIGABRT, raised during test_types; however, running test_types by itself isn't enough to reproduce the problem.

The error is raised on L8476 of typeobject.c:

    assert(self->tp_version_tag != 0);

The same failure can be manufactured on macOS with M1 hardware, as long as the test suite is executed as a single process (i.e. python -m test, not the multi-process option enabled by make test).

Full error trace on iOS:
stacktrace.txt

The same problem doesn't appear to occur when debug is not enabled.

CPython versions tested on:

CPython main branch

Operating systems tested on:

macOS, iOS

Output from running 'python -VV' on the command line:

3.14.0a0; verified on 2bac2b86, but possibly present back to dc03ce7

Linked PRs

The text was updated successfully, but these errors were encountered:

freakboy3742 · 2024-07-16T08:41:54Z

I've narrowed the cause of the crash to #121602 (which was backported to 3.13 as #121632, so it explains the failures on both branches).

freakboy3742 · 2024-07-16T11:44:29Z

I have confirmed the crash is occurring in the SubInterpreterTest added as part of #121602. If that test is commented out, the test suite passes without error.

ericsnowcurrently · 2024-07-16T15:54:57Z

FYI, I'm pretty sure gh-121636 will fix this.

The gist is: the change in gh-121602 runtime initialization makes a copy of the (mostly) unmodified PyTypeObject struct for each builtin static type. During finalization we restore that copy, including the tp_version_tag. Clearly a part of that is causing things to get in an inconsistent state on iOS. The fix in gh-121636 restricts the copy/restore to only the "tp" slots (function pointers), restoring the pre-121602 status quo for the rest, including tp_version_tag.

Is there a way you could verify gh-121636 fixes the iOS build?

All that said, I'm definitely curious why this would be manifesting only on iOS?

freakboy3742 · 2024-07-16T22:14:55Z

FYI, I'm pretty sure gh-121636 will fix this.
...
Is there a way you could verify gh-121636 fixes the iOS build?

The iOS buildbot should be sufficient to verify any fix; from the look of it, the current state of #121636 doesn't fix it. The test failure isn't very helpful - it's a hard crash of the emulator, which the test report doesn't show very nicely - but with the test_types SubinterpreterTest commented out, the suite completes and passes on iOS. I've verified the same tests locally, and I get the same failure on #121636.

All that said, I'm definitely curious why this would be manifesting only on iOS?

You and me both :-)

My best guess right now it's not iOS specific, but a function of how the test suite is executed on iOS. iOS is forced to run the entire test suite sequentially in a single process. This isn't (AFAIK) the default mode of operation on any other runner; so it's possible the specific sequence of test execution is causing an odd state to emerge. One of the debugging tasks I'm going to look into today is to try and reproduce this on macOS (the other is to try and find a subset of tests that causes the failure, rather than needing to get to test_types alphabetically... after 20 minutes)

ericsnowcurrently · 2024-07-16T22:54:19Z

Looks like gh-121636 isn't sufficient. I'm going to try a different approach.

ericsnowcurrently · 2024-07-16T23:00:49Z

See gh-121882.

freakboy3742 · 2024-07-16T23:23:41Z

Dang it... looks like gh-121882 doesn't fix the problem either... 😢

freakboy3742 · 2024-07-17T03:37:09Z

Confirming: the test_types test doesn't fail on macOS when the test suite is executed in single process mode, so the problem isn't purely about execution order. Now trying to bisect the test suite to find a smaller subset of tests that fail.

freakboy3742 · 2024-07-17T04:08:51Z

Scratch that - it does crash on macOS - just you just have to remember to enable --with-pydebug.

@ericsnowcurrently To reproduce:

./configure --enable-universalsdk="`xcrun --show-sdk-path`" --with-universal-archs=universal2 --with-pydebug
make -j8
./python.exe -m test

(Although I also had to patch test_buildin.py::PtyTests.test_input_tty to avoid a lockup due to #116402).

Key details: Clean macOS build, with pydebug enabled; running as a single process (not multiprocess as make test will do), on an M1 MacBook Pro.

Test failure:

...
0:17:13 load avg: 14.30 [424/478/1] test_types
Assertion failed: (self->tp_version_tag != 0), function init_static_type, file typeobject.c, line 8476.
Fatal Python error: Aborted

Current thread 0x00000001fb35fac0 (most recent call first):
  <no Python frame>

Extension modules: _testinternalcapi, _testcapi, _testlimitedcapi, _testbuffer, _testclinic, _testclinic_limited, _ctypes_test, xxsubtype, _testexternalinspection, _testsinglephase_basic_wrapper, _testsinglephase_basic_copy, _testsinglephase_with_reinit, _testsinglephase_with_state, _test_non_isolated, _test_shared_gil_only (total: 15)
zsh: abort      ./python.exe -m test

Now trying to narrow down a subset of these tests that will fail.

freakboy3742 · 2024-07-17T06:54:44Z

@ericsnowcurrently I've got it narrowed down to just 2 tests:

rkm@eunectes build % ./python.exe -m test test_type_cache test_types 
Using random seed: 226415265
0:00:00 load avg: 9.54 Run 2 tests sequentially in a single process
0:00:00 load avg: 9.54 [1/2] test_type_cache
0:00:00 load avg: 9.54 [2/2] test_types
Assertion failed: (self->tp_version_tag != 0), function init_static_type, file typeobject.c, line 8476.
Fatal Python error: Aborted

Current thread 0x00000001fb35fac0 (most recent call first):
  <no Python frame>

Extension modules: _testinternalcapi, _testcapi (total: 2)
zsh: abort      ./python.exe -m test test_type_cache test_types

This is running on macOS M1 hardware, using a checkout of main at f036a46.

I've also been able to reproduce it on macOS x86_64 hardware, and under Ubuntu 22.04 on x86_64. The Ubuntu machine is a VM on my macOS x86_64 machine, if that matters; but it doesn't appear to be CPU or OS dependent.

freakboy3742 · 2024-07-17T07:52:42Z

Based on a comment from @Fidget-Spinner on Discord: the problem might be that test_type_cache is leaving something in a bad state. Apparently some of those tests exercise situations that should never happen, so maybe there's some problematic leftover state from those tests that has only been exposed as a result of #121602 landing.

I've also been able to narrow the problem even further:
test_type_cache.TypeCacheWithSpecializationTests.test_class_load_attr_specialization_static_type is the only test case that needs to run before test_types to cause the crash.

freakboy3742 · 2024-07-17T08:01:26Z

Oh! Could it be this obvious?

cpython/Lib/test/test_type_cache.py

Lines 175 to 181 in 5d98a4d

    
           # Permanently overflow the static type version counter, and force str and bytes 
        
           # to have tp_version_tag == 0 
        
           for _ in range(2**16): 
        
               type_modified(str) 
        
               type_assign_version(str) 
        
               type_modified(bytes) 
        
               type_assign_version(bytes)

If I'm reading this right, the test is deliberately setting the tp_version_tag to 0... and then not resetting the version tag at the end of the test, which causes the assertion in test_types to break?

If I comment out L179 and L181, the test_type_cache test fails, but test_types doesn't crash (in fact, it passes!)

I'm not sure how to go about resetting tp_version_tag... but that definitely looks like what is needed here.

Fidget-Spinner · 2024-07-17T14:06:48Z

The thing is. Setting the type version tag to 0 should be fine on its own. If the rest of the tests fail, that means the type cache itself is buggy when type versions overflow. So there's probably an underlying bug somewhere else.

Fidget-Spinner · 2024-07-17T14:08:09Z

Interesting! This failure happens on the default build, but not on the free-threaded build.

Fidget-Spinner · 2024-07-17T14:10:11Z

I wonder if the assert is even right? I don't quite understand the init_static_type code, but it's not guaranteed that the type version tag is not zero just because it's init. That is only guaranteed if there's a valid entry in the type cache.

ericsnowcurrently · 2024-07-17T16:54:18Z

FYI, @markshannon added that assert last month in 00257c7

commit 00257c746c447a2e026b5a2a618f0e033fb90111
Author: Mark Shannon <mark@hotpy.org>
Date:   Wed Jun 19 17:38:45 2024 +0100

    GH-119462: Enforce invariants of type versioning (GH-120731)

    * Remove uses of Py_TPFLAGS_VALID_VERSION_TAG

diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 0dcf1d399d9..1cc6ca79298 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -8516,12 +8481,11 @@ init_static_type(PyInterpreterState *interp, PyTypeObject *self,

         assert(NEXT_GLOBAL_VERSION_TAG <= _Py_MAX_GLOBAL_TYPE_VERSION_TAG);
         _PyType_SetVersion(self, NEXT_GLOBAL_VERSION_TAG++);
-        self->tp_flags |= Py_TPFLAGS_VALID_VERSION_TAG;
     }
     else {
         assert(!initial);
         assert(self->tp_flags & _Py_TPFLAGS_STATIC_BUILTIN);
-        assert(self->tp_flags & Py_TPFLAGS_VALID_VERSION_TAG);
+        assert(self->tp_version_tag != 0);
     }

     managed_static_type_state_init(interp, self, isbuiltin, initial);

That implies the assert condition is meant to be guaranteed.

init_static_type() runs once for each builtin static type when the runtime is initialized and again when each subinterpreter is initialized. The branch where the problematic assert happens is taken only when the static type has not been readied yet, which is basically with each subinterpreter.

ericsnowcurrently · 2024-07-17T16:59:11Z

Do note that the crash isn't unique to test_types. It can be triggered by any test that creates a subinterpreter. For example, test_import:

$ ./python -m test test_type_cache test_import -v -m test_basic_multiple_interpreters_main_no_reset -m test_class_load_attr_specialization_static_type
== CPython 3.14.0a0 (tags/main--refreshed-20240711-7-g5250a031332:5250a031332, Jul 17 2024, 10:52:02) [GCC 7.5.0]
== Linux-5.4.0-150-generic-x86_64-with-glibc2.27 little-endian
== Python build: debug
== cwd: /home/esnow/projects/work/cpython-perf/cpython/build/test_python_worker_11771æ
== CPU count: 8
== encodings: locale=UTF-8 FS=utf-8
== resources: all test resources are disabled, use -u option to unskip tests

Using random seed: 2891891991
0:00:00 load avg: 0.08 Run 2 tests sequentially in a single process
0:00:00 load avg: 0.08 [1/2] test_type_cache
test_class_load_attr_specialization_static_type (test.test_type_cache.TypeCacheWithSpecializationTests.test_class_load_attr_specialization_static_type) ... ok

----------------------------------------------------------------------
Ran 1 test in 0.097s

OK
0:00:00 load avg: 0.08 [2/2] test_import
test_basic_multiple_interpreters_main_no_reset (test.test_import.SinglephaseInitTests.test_basic_multiple_interpreters_main_no_reset) ... python: Objects/typeobject.c:8476: init_static_type: Assertion `self->tp_version_tag != 0' failed.
Fatal Python error: Aborted

Current thread 0x00007fd132d09100 (most recent call first):
  <no Python frame>

Extension modules: _testinternalcapi, _testcapi, _testmultiphase, _testsinglephase (total: 4)
Aborted (core dumped)

ericsnowcurrently · 2024-07-17T17:02:34Z

That crash using test_import (or others) pre-dates 5250a03, so gh-121602 did not introduce the problem. It only revealed it a bit more clearly.

ericsnowcurrently · 2024-07-17T17:26:06Z

@markshannon, @Fidget-Spinner, at this point we need to decide if that assert in init_static_type() is actually correct.

freakboy3742 · 2024-07-22T02:52:23Z

@markshannon Gentle nudge on this one; this bug is currently breaking the iOS buildbots.

markshannon · 2024-07-22T08:46:03Z

The assertion is correct. Static types must be immutable, so their version number is fixed.

The problem is that we aren't enforcing that invariant in PyType_Modified.
Unfortunately, PyType_Modified returns void and has done forever, so cannot signal failure.

I think we can assume that well behaved C extension don't modify static builtin types, so adding a new variant of PyType_Modified that checks isn't worth the disruption.

I think the best thing is to add an assertion in PyType_Modified so that the error shows up when the invariant is broken, and fix the tests to not break the invariant.
I'll do that now.

markshannon · 2024-07-22T10:00:13Z

Unfortunately, the datetime module modifies the static builtin types when reloading, which doesn't appear to be safe.
If two subinterpreters are using the datetime classes, and one of them unloads the module, the other will see changes.

I think the fix for datetime would be to make the builtin classes fully immutable, or to use heap types for the classes.

freakboy3742 · 2024-07-23T06:49:29Z

As a temporary workaround, I've added #122150; this disables the test that is causing the crash on iOS so that iOS CI can continue. Any fix for this bug should be able to remove the test skip added by that PR.

…st suite. (GH-122150)

…ore test suite. (pythonGH-122150) (cherry picked from commit 1bcc9eb) Co-authored-by: Russell Keith-Magee <russell@keith-magee.com>

markshannon · 2024-07-23T10:18:03Z

@ericsnowcurrently can you make sure to revert #122150 when this is fixed?

…tore test suite. (GH-122150) (#122159) gh-121832: Skip subinterpreter static type check on iOS to restore test suite. (GH-122150) (cherry picked from commit 1bcc9eb) Co-authored-by: Russell Keith-Magee <russell@keith-magee.com>

…not changed by PyType_Modified. (GH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.

…ore test suite. (pythonGH-122150)

…es is not changed by PyType_Modified. (pythonGH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.

…ore test suite. (pythonGH-122150)

…es is not changed by PyType_Modified. (pythonGH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.

…es is not changed by PyType_Modified. (pythonGH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified. (cherry picked from commit e55b05f) Co-authored-by: Mark Shannon <mark@hotpy.org>

…pes is not changed by PyType_Modified (gh-122290) Update datetime module and test_type_cache.py to not call PyType_Modified. (cherry picked from commit e55b05f, AKA gh--122182) Co-authored-by: Mark Shannon <mark@hotpy.org>

Revert test skip introduced by #122150.

… (pythonGH-122340) Revert test skip introduced by pythonGH-122150. (cherry picked from commit 863a92f) Co-authored-by: Russell Keith-Magee <russell@keith-magee.com>

… (#122342) Revert test skip introduced by GH-122150. (cherry picked from commit 863a92f)

freakboy3742 added type-crash A hard crash of the interpreter, possibly with a core dump 3.13 bugs and security fixes OS-ios 3.14 new features, bugs and security fixes labels Jul 16, 2024

This was referenced Jul 16, 2024

gh-120522: Apply App Store compliance patch to installed products, not source #121830

Closed

gh-120831: Increase the default minimum supported iOS version to 13.0 #121250

Merged

freakboy3742 mentioned this issue Jul 16, 2024

gh-117482: Fix Builtin Types Slot Wrappers #121602

Merged

freakboy3742 changed the title ~~iOS testbed crashes when --with-pydebug enabled~~ Test suite crashes when run single process --with-pydebug enabled Jul 17, 2024

freakboy3742 removed the OS-ios label Jul 17, 2024

freakboy3742 mentioned this issue Jul 18, 2024

gh-120522: Apply App Store compliance patch during installation #121947

Merged

markshannon self-assigned this Jul 22, 2024

freakboy3742 mentioned this issue Jul 23, 2024

gh-121832: Skip subinterpreter static type check on iOS to restore test suite. #122150

Merged

markshannon pushed a commit that referenced this issue Jul 23, 2024

gh-121832: Skip subinterpreter static type check on iOS to restore te…

1bcc9eb

…st suite. (GH-122150)

bedevere-app bot mentioned this issue Jul 23, 2024

[3.13] gh-121832: Skip subinterpreter static type check on iOS to restore test suite. (GH-122150) #122159

Merged

markshannon assigned ericsnowcurrently Jul 23, 2024

bedevere-app bot mentioned this issue Jul 23, 2024

GH-121832: Assert that the version number of static builtin types is not changed by PyType_Modified. #122182

Merged

markshannon added a commit that referenced this issue Jul 24, 2024

GH-121832: Assert that the version number of static builtin types is …

e55b05f

…not changed by PyType_Modified. (GH-122182) Update datetime module and test_type_cache.py to not call PyType_Modified.

nohlson pushed a commit to nohlson/cpython that referenced this issue Jul 24, 2024

pythongh-121832: Skip subinterpreter static type check on iOS to rest…

4a6def1

…ore test suite. (pythonGH-122150)

nohlson pushed a commit to nohlson/cpython that referenced this issue Jul 24, 2024

pythongh-121832: Skip subinterpreter static type check on iOS to rest…

263866c

…ore test suite. (pythonGH-122150)

bedevere-app bot mentioned this issue Jul 25, 2024

[3.13] GH-121832: Assert that the version number of static builtin types is not changed by PyType_Modified. (GH-122182) #122290

Merged

bedevere-app bot mentioned this issue Jul 27, 2024

gh-121832: Revert test skip introduced by #122150. #122340

Merged

freakboy3742 added a commit that referenced this issue Jul 27, 2024

gh-121832: Revert test skip introduced by #122150. (#122340)

863a92f

Revert test skip introduced by #122150.

freakboy3742 closed this as completed in #122340 Jul 27, 2024

bedevere-app bot mentioned this issue Jul 27, 2024

[3.13] gh-121832: Revert test skip introduced by GH-122150. (GH-122340) #122342

Merged

freakboy3742 added a commit that referenced this issue Jul 27, 2024

[3.13] gh-121832: Revert test skip introduced by GH-122150. (GH-122340)…

7b35c50

… (#122342) Revert test skip introduced by GH-122150. (cherry picked from commit 863a92f)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Test suite crashes when run single process `--with-pydebug` enabled #121832

Test suite crashes when run single process `--with-pydebug` enabled #121832

freakboy3742 commented Jul 16, 2024 •

edited by bedevere-app bot

Loading

freakboy3742 commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024 •

edited

Loading

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

freakboy3742 commented Jul 22, 2024

markshannon commented Jul 22, 2024

markshannon commented Jul 22, 2024 •

edited

Loading

freakboy3742 commented Jul 23, 2024

markshannon commented Jul 23, 2024

Test suite crashes when run single process --with-pydebug enabled #121832

Test suite crashes when run single process --with-pydebug enabled #121832

Comments

freakboy3742 commented Jul 16, 2024 • edited by bedevere-app bot Loading

Crash report

What happened?

CPython versions tested on:

Operating systems tested on:

Output from running 'python -VV' on the command line:

Linked PRs

freakboy3742 commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

ericsnowcurrently commented Jul 16, 2024

freakboy3742 commented Jul 16, 2024

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024 • edited Loading

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024

freakboy3742 commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

Fidget-Spinner commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

ericsnowcurrently commented Jul 17, 2024

freakboy3742 commented Jul 22, 2024

markshannon commented Jul 22, 2024

markshannon commented Jul 22, 2024 • edited Loading

freakboy3742 commented Jul 23, 2024

markshannon commented Jul 23, 2024

Test suite crashes when run single process `--with-pydebug` enabled #121832

Test suite crashes when run single process `--with-pydebug` enabled #121832

freakboy3742 commented Jul 16, 2024 •

edited by bedevere-app bot

Loading

freakboy3742 commented Jul 17, 2024 •

edited

Loading

markshannon commented Jul 22, 2024 •

edited

Loading