Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #392: Port to Python 3.13 #396

Merged
merged 9 commits into from
Sep 4, 2024
Merged

Conversation

vstinner
Copy link
Contributor

  • Replace C_RECURSION_LIMIT with Py_C_RECURSION_LIMIT.
  • Add Py_C_RECURSION_LIMIT for Python 3.12 and older.
  • Disable GREENLET_USE_CFRAME on Python 3.13.
  • Define Py_BUILD_CORE to include pycore_frame.h.

* Replace C_RECURSION_LIMIT with Py_C_RECURSION_LIMIT.
* Add Py_C_RECURSION_LIMIT for Python 3.12 and older.
* Disable GREENLET_USE_CFRAME on Python 3.13.
* Define Py_BUILD_CORE to include pycore_frame.h.
@vstinner
Copy link
Contributor Author

The second commit adds a Python 3.13 CI to GitHub Actions. It's to prove that my change works as expected. I can easily remove that change if you want to drop the 3.13 test for now.

Well, IMO it's good to test as soon as possible ;-)

@vstinner
Copy link
Contributor Author

Ah. While "tox -e py313" pass locally, it fails on the GitHub Action:

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ... Fatal Python error: Segmentation fault
Current thread 0x00007fd2a2a61b80 (most recent call first):
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 212 in _run_test
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 295 in __call__
  File "/home/runner/work/greenlet/greenlet/src/greenlet/tests/leakcheck.py", line 317 in wrapper
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 589 in _callTestMethod
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 636 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/case.py", line 692 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 122 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/suite.py", line 84 in __call__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/runner.py", line 240 in run
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 270 in runTests
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/main.py", line 104 in __init__
  File "/opt/hostedtoolcache/Python/3.13.0-alpha.3/x64/lib/python3.13/unittest/__main__.py", line 18 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main
Extension modules: greenlet._greenlet, psutil._psutil_linux, psutil._psutil_posix, greenlet.tests._test_extension_cpp, greenlet.tests._test_extension (total: 5)
/home/runner/work/_temp/7f62c942-1552-419a-bf75-ca42301e321a.sh: line 3:  1982 Segmentation fault      (core dumped) python -m unittest discover -v greenlet.tests
Error: Process completed with exit code 139.

I'm not sure about this change:

+  #if GREENLET_USE_CFRAME
     tstate->cframe->current_frame = this->current_frame;
+  #endif

We should save/restore the current_frame, but I'm not sure how.

@jamadden
Copy link
Contributor

Thanks for looking into this @vstinner ! I haven't had any time to keep up with Python 3.13 changes, but I can at least offer some insight into that crash. This is because that same test crashes immediately for me on my M2 Mac. Assuming the crash is the same, here's the backtrace:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)
Exception Codes:       0x0000000000000001, 0xdddddddddddddde5

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   Python [25867]

VM Region Info: 0xffffdddddddddde5 is not in any region.  
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      UNUSED SPACE AT END

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	       0x1877c60dc __pthread_kill + 8
1   libsystem_pthread.dylib       	       0x1877fdcc0 pthread_kill + 288
2   libsystem_c.dylib             	       0x1876d2540 raise + 32
3   Python                        	       0x103a15410 faulthandler_fatal_error + 412
4   libsystem_platform.dylib      	       0x18782da24 _sigtramp + 56
5   _greenlet.cpython-313-darwin.so	       0x1037cf910 greenlet::PythonState::operator>>(_ts*) + 56 (TPythonState.cpp:205) [inlined]
6   _greenlet.cpython-313-darwin.so	       0x1037cf910 greenlet::Greenlet::g_switchstack_success() + 88 (TGreenlet.cpp:133)
7   _greenlet.cpython-313-darwin.so	       0x1037cb0c4 greenlet::Greenlet::g_switchstack() + 288 (TGreenlet.cpp:210)
8   _greenlet.cpython-313-darwin.so	       0x1037cfaa8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
9   _greenlet.cpython-313-darwin.so	       0x1037d0cc4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)
10  Python                        	       0x1038e27b8 cfunction_call + 72
11  Python                        	       0x103890124 _PyObject_MakeTpCall + 128
12  Python                        	       0x1039b5f80 context_run + 104
13  Python                        	       0x1038e1ee8 cfunction_vectorcall_FASTCALL_KEYWORDS.llvm.10847317658053313408 + 92
14  _greenlet.cpython-313-darwin.so	       0x1037cce54 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 264 (TUserGreenlet.cpp:464)
15  _greenlet.cpython-313-darwin.so	       0x1037cc6c8 greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
16  _greenlet.cpython-313-darwin.so	       0x1037cf298 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)

The invalid address causing the segfault is interesting...

The crashing code in question:

#if GREENLET_PY311
  #if GREENLET_PY312
    tstate->py_recursion_remaining = tstate->py_recursion_limit - this->py_recursion_depth;
    tstate->c_recursion_remaining = Py_C_RECURSION_LIMIT - this->c_recursion_depth;
    this->unexpose_frames(); // <- CRASH; line 205
  #else // \/ 3.11

unexpose_frames was new for 3.12:

#if GREENLET_PY312
void GREENLET_NOINLINE(PythonState::unexpose_frames)()
{
    if (!this->top_frame()) {
        return;
    }

    // See GreenletState::expose_frames() and the comment on frames_were_exposed
    // for more information about this logic.
    _PyInterpreterFrame *iframe = this->_top_frame->f_frame;
    while (iframe != nullptr) {
        _PyInterpreterFrame *prev_exposed = iframe->previous;
        assert(iframe->frame_obj);
        memcpy(&iframe->previous, &iframe->frame_obj->_f_frame_data[0],
               sizeof(void *));
        iframe = prev_exposed;
    }
}
#else
void PythonState::unexpose_frames()
{}
#endif      

IIRC, this all has to do with the way frames are "lazily" allocated now. We have to turn them into real frames so that we can get the complete stacks of suspended greenlets; that was added in greenlet 3.0.3 (previously, I'd just punted on that problem).

I didn't write much of expose_frames; it was generously contributed in b5f9d23

If I disable that functionality by ifdefing out expose/unexpose_frames I get a little farther, but still crash; this time in ContextVarsTests.test_context_not_propagated:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000010489c020
Exception Codes:       0x0000000000000001, 0x000000010489c020

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [26401]

VM Region Info: 0x10489c020 is not in any region.  Bytes after previous region: 33  Bytes before following region: 32736
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      shared memory               104898000-10489c000    [   16K] r--/r-- SM=SHM  
--->  GAP OF 0x8000 BYTES
      VM_ALLOCATE                 1048a4000-1048ac000    [   32K] rw-/rwx SM=PRV  

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   Python                        	       0x105209420 dump_frame + 24
1   Python                        	       0x105208a04 dump_traceback + 116
2   Python                        	       0x105208b88 _Py_DumpTracebackThreads + 324
3   Python                        	       0x105219498 faulthandler_dump_traceback + 84
4   Python                        	       0x1052193f4 faulthandler_fatal_error + 384
5   libsystem_platform.dylib      	       0x18782da24 _sigtramp + 56
6   _greenlet.cpython-313-darwin.so	       0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
7   _greenlet.cpython-313-darwin.so	       0x104fcf3dc greenlet::PythonState::operator<<(_ts const*) + 104 (TPythonState.cpp:144)
8   _greenlet.cpython-313-darwin.so	       0x104fcf25c greenlet::Greenlet::g_switchstack() + 144 (TGreenlet.cpp:167)
9   _greenlet.cpython-313-darwin.so	       0x104fd3ac8 greenlet::MainGreenlet::g_switch() + 48 (TMainGreenlet.cpp:100)
10  _greenlet.cpython-313-darwin.so	       0x104fd1234 greenlet::UserGreenlet::inner_bootstrap(_greenlet*, _object*) + 1396 (TUserGreenlet.cpp:553)
11  _greenlet.cpython-313-darwin.so	       0x104fd063c greenlet::UserGreenlet::g_initialstub(void*) + 1364 (TUserGreenlet.cpp:311)
12  _greenlet.cpython-313-darwin.so	       0x104fd32c8 greenlet::UserGreenlet::g_switch() + 136 (TUserGreenlet.cpp:179)
13  _greenlet.cpython-313-darwin.so	       0x104fd4ce4 green_switch(_greenlet*, _object*, _object*) + 268 (greenlet.cpp:530)

@vstinner
Copy link
Contributor Author

PythonState::unexpose_frames() is called with this->_top_frame->f_frame = 0xdddddddddddddddd: the struct _frame memory was freed, so reading a member gives 0xdddddddddddddddd pointer. this->_top_frame doesn't "own" the memory. It's just a raw pointer to something managed internally by Python.

I don't know how to move forward on this issue :-(

@vstinner
Copy link
Contributor Author

The following code of void PythonState::operator<<(const PyThreadState *const tstate) noexcept is scary:

    PyFrameObject *frame = PyThreadState_GetFrame((PyThreadState *)tstate);
    Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
                        // reference.
    this->_top_frame.steal(frame);

Why does it call Py_XDECREF(): we have a nice strong reference which is safe to use. Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated and we end up with a dangling pointer and likely a crash.

Exception Codes:       KERN_INVALID_ADDRESS at 0xdddddddddddddde5 -> 0xffffdddddddddde5 (possible pointer authentication failure)

Exactly: 0xdddddddddd... pointer is a dangling pointer.

@vstinner
Copy link
Contributor Author

I hacked greenlet to hold a strong reference to the frame, but then another assertion failed...

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ...

python: Python/ceval.c:894: _PyEval_EvalFrameDefault: Assertion `STACK_LEVEL() == 0' failed.

Program received signal SIGABRT, Aborted.

@jamadden
Copy link
Contributor

Why does it call Py_XDECREF(): ... Calling Py_XDECREF() converts it to a borrowed reference and so the frame can be deallocated

Exactly. IIRC, in most places we keep borrowed references, it's specifically so they can be deallocated. greenlets aren't required to ever finish or get switched back to before they get deallocated (this is surprisingly easy to arrange to happen). That means that any code on the C call stack that was going to decref something like a frame may never get a chance to run. So, while a greenlet is switched out, we essentially transfer ownership out of the C stack and into us. If we kept an extra strong reference, and the C stack code never runs again, we leak.

This exposes implicit assumptions about the objects like that: we assume that "someone else" is keeping a strong reference to it, someone who may not run again to decref; or, put another way, the code that originally "owned" these objects won't run again to decref them until we switch the greenlet back in. It seems like something in this area has changed; for example, we're getting back fresh (transient) objects that have no other owner, or the place in the C stack that was keeping a strong reference is now keeping a borrowed reference.

@edgarrmondragon
Copy link
Contributor

With Python 3.13 in beta, should we try to get this across?

@wRAR wRAR mentioned this pull request May 28, 2024
7 tasks
@musicinmybrain
Copy link

musicinmybrain commented May 30, 2024

I just tested this as a patch for the Fedora package, but (when built for Python 3.13) there is still a problem with trying to access the trash member of the opaque PyThreadState struct.

  In file included from src/greenlet/greenlet.cpp:36:
  src/greenlet/TPythonState.cpp: In member function ‘void greenlet::PythonState::operator<<(const PyThreadState*)’:
  src/greenlet/TPythonState.cpp:149:42: error: ‘const PyThreadState’ {aka ‘const struct _ts’} has no member named ‘trash’
    149 |     this->trash_delete_nesting = tstate->trash.delete_nesting;
        |                                          ^~~~~
  src/greenlet/TPythonState.cpp: In member function ‘void greenlet::PythonState::operator>>(PyThreadState*)’:
  src/greenlet/TPythonState.cpp:217:13: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’
    217 |     tstate->trash.delete_nesting = this->trash_delete_nesting;
        |             ^~~~~
  src/greenlet/greenlet.cpp: In function ‘PyObject* mod_get_tstate_trash_delete_nesting(PyObject*)’:
  src/greenlet/greenlet.cpp:1341:36: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’
   1341 |     return PyLong_FromLong(tstate->trash.delete_nesting);
        |                                    ^~~~~
  error: command '/usr/bin/gcc' failed with exit code 1

@musicinmybrain
Copy link

I checked internal/pycore_tstate.h for Python 3.13.0b1, and struct _PyThreadStateImpl does not have a member named trash.

@vstinner
Copy link
Contributor Author

vstinner commented Jun 5, 2024

src/greenlet/greenlet.cpp:1341:36: error: ‘PyThreadState’ {aka ‘struct _ts’} has no member named ‘trash’

I updated my PR to use tstate->delete_later instead. My PR is still a WIP, the test suite does crash immediately :-(

@vstinner
Copy link
Contributor Author

vstinner commented Jun 5, 2024

With this patch:

diff --git a/src/greenlet/TPythonState.cpp b/src/greenlet/TPythonState.cpp
index bfb40ca..98855c5 100644
--- a/src/greenlet/TPythonState.cpp
+++ b/src/greenlet/TPythonState.cpp
@@ -146,8 +146,8 @@ void PythonState::operator<<(const PyThreadState *const tstate) noexcept
     this->datastack_limit = tstate->datastack_limit;
 
     PyFrameObject *frame = PyThreadState_GetFrame((PyThreadState *)tstate);
-    Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
-                        // reference.
+    //Py_XDECREF(frame);  // PyThreadState_GetFrame gives us a new
+    //                    // reference.
     this->_top_frame.steal(frame);
   #if GREENLET_PY313
     this->delete_later = Py_XNewRef(tstate->delete_later);

I get an assertion error that I don't know how to debug:

test_break_ctxvars (greenlet.tests.test_contextvars.ContextVarsTests.test_break_ctxvars) ...

python: Python/ceval.c:901: _PyEval_EvalFrameDefault: Assertion `STACK_LEVEL() == 0' failed.

Python is built in debug mode.

@vstinner
Copy link
Contributor Author

vstinner commented Jun 5, 2024

I updated my PR and now the whole test suite pass on Python 3.13!

Problem: GitHub Action updated macos-latest to arm64 arch which doesn't support old Python versions. I wrote #410 for that.

@musicinmybrain
Copy link

This seems to work as a downstream patch in Fedora. Thank you!

@mgorny
Copy link

mgorny commented Jul 9, 2024

@vstinner, thanks for doing this. Is there any chance I could convince you to try looking at #368? It also affects 3.13 for us.

@EwoutH
Copy link

EwoutH commented Aug 17, 2024

What would be needed to move this forward?

@arcivanov
Copy link

I have a feeling @jamadden is terribly busy

@EwoutH
Copy link

EwoutH commented Aug 26, 2024

I think we need to consider what to do if this repository is unmaintained.

@raghunandan-16
Copy link

I am getting this error for green let for 3.13 is it related with the above mentioned : " pycore_frame.h(8): fatal error C1189: #error: "this header requires Py_BUILD_CORE define"

@edgarrmondragon
Copy link
Contributor

I see @oremanj has committed to this repo in the past year. Maybe they can offer some options.

@jamadden
Copy link
Contributor

jamadden commented Sep 4, 2024

Thank's everyone for your help on this, with a special thanks to @vstinner.

I'll do my best to get a release out to PyPI (even if only an RC) by the end of the week.

@jamadden jamadden merged commit f82b66f into python-greenlet:master Sep 4, 2024
18 checks passed
@vstinner vstinner deleted the py313 branch September 5, 2024 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants