Tracking Issue: Sub-Interpreter Support #3451

Aequitosh · 2023-09-13T12:08:39Z

Tracks the development and state of supporting sub-interpreters in PyO3.

This issue really only tracks progress; for discussing everything else, feel free to join over here: Aequitosh#1

Summary

As of 13.09.2023

PyO3 currently doesn't support sub-interpreters, which will lead to an ImportError being raised if a module using PyO3 is initialized more than once per interpreter process. As stated in #2523, this is necessary in order to prevent soundness holes (as in, prevent things that use PyO3 from randomly breaking, having nasty undefined behaviour, etc.).

Even though this prevents soundness holes on the one hand, it can lead to modules / applications using a sub-interpreter model to "break" in certain situations. For examples, see pyca/cryptography#9016 and bazaah/aur-ceph#20.
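The one-shot guard described above can be pictured as a flag that is set on first initialization. The following is a simplified, standard-library-only model and not PyO3's actual implementation: `ModuleDef` and `initialize` are hypothetical names, and the error text is borrowed from the linked reports.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for a module definition; not a PyO3 type.
pub struct ModuleDef {
    initialized: AtomicBool,
}

impl ModuleDef {
    pub const fn new() -> Self {
        Self { initialized: AtomicBool::new(false) }
    }

    /// Models module initialization: the first call succeeds, every later
    /// call is rejected, no matter which (sub-)interpreter it comes from.
    pub fn initialize(&self) -> Result<(), String> {
        // `swap` returns the previous value, so `true` means "already initialized".
        if self.initialized.swap(true, Ordering::SeqCst) {
            Err("ImportError: PyO3 modules may only be initialized once per interpreter process"
                .to_owned())
        } else {
            Ok(())
        }
    }
}
```

In this model, a second sub-interpreter importing the same module hits the `Err` branch, which is the kind of breakage the linked reports describe.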

Implementing sub-interpreter support isn't straightforward and requires quite a substantial redesign of PyO3's API. This issue shall track this redesign and provide as much relevant information as possible for all that wish to contribute.

Goals

Adapted from #576 (comment), as of 13.09.2023.

Mid-Term

- Rework synchronization primitives to not rely on the GIL. See Add support for nogil Python #2885
- Develop transition plan so that existing users can migrate their code without enormous amounts of work
- Remove static data from PyO3's implementation, either move things to PyModule_GetState (preferred) or PyInterpreterState_GetDict (alternative)
- Allow extension authors to use unsafe in order to opt in to sub-interpreter support - it is their responsibility to guarantee to not store Py<T> in any static data.
- Document all conditions that extension authors' modules need to satisfy so that they may be used within sub-interpreters
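To illustrate why storing Py<T> in static data is a problem, here is a minimal sketch using only the standard library. Every type in it is a hypothetical stand-in (`ObjectHandle` plays the role of Py<T>, `ProcessWideSlot` the role of a `static`), so it shows the shape of the hazard rather than real PyO3 behaviour.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InterpreterId(pub u32);

/// Hypothetical stand-in for `Py<T>`: remembers which interpreter created it.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ObjectHandle {
    pub owner: InterpreterId,
}

/// Models a `static`: a single slot shared by every interpreter in the process.
pub struct ProcessWideSlot {
    slot: OnceLock<ObjectHandle>,
}

impl ProcessWideSlot {
    pub const fn new() -> Self {
        Self { slot: OnceLock::new() }
    }

    /// Whoever calls first decides which interpreter owns the cached object.
    pub fn get_or_create(&self, current: InterpreterId) -> &ObjectHandle {
        self.slot.get_or_init(|| ObjectHandle { owner: current })
    }
}

/// Models per-interpreter (or per-module) storage: one slot per interpreter.
pub struct PerInterpreterSlots {
    slots: HashMap<u32, ObjectHandle>,
}

impl PerInterpreterSlots {
    pub fn new() -> Self {
        Self { slots: HashMap::new() }
    }

    /// Each interpreter gets an object that it created itself.
    pub fn get_or_create(&mut self, current: InterpreterId) -> &ObjectHandle {
        self.slots
            .entry(current.0)
            .or_insert_with(|| ObjectHandle { owner: current })
    }
}
```

With the process-wide slot, the second interpreter receives an object owned by the first one; with per-interpreter slots, each interpreter only ever sees its own objects.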

Long-Term

Possibly remove the need for extension authors to audit their own code once we're confident enough.

Tasks

TBA - might add them here (or some other place) once more concrete pieces of work have been identified.

Relevant Issues & Interesting Reads

Listing relevant things here. Some things might already be linked above, but it's nevertheless nice to have everything in one place.

- Initial discussion regarding sub-interpreter support:
  - Support sub-interpreters #576
- PR regarding nogil Python support, contains lots of additional information:
  - Add support for nogil Python #2885
- Discussion regarding making Python's C-API more friendly for Rust; linking to the comment on what would need to happen in PyO3 internally:
  - How can we make Python's C-API more friendly for Rust? #2346 (comment)
- cryptography issue regarding sub-interpreters in PyO3:
  - Different behaviour between 41.0.0, 41.0.1 and 40.0.2 - PyO3 modules may only be initialized once per interpreter process pyca/cryptography#9016
- aur-ceph maintainer's issue regarding Ceph's sub-interpreter model, and why Ceph Dashboard breaks:
  - ceph-mgr/dashboard: python-cryptography PyO3 modules may only be initialized once per interpreter process bazaah/aur-ceph#20
- Idea by @GoldsteinE - using a ghostcell-ish pattern:
  - Support sub-interpreters #576 (comment)


Aequitosh · 2023-09-13T12:09:50Z

Note that I will update this issue whenever updates, new information, etc. appear in order to keep everything relatively tidy.

letalboy · 2023-09-16T18:56:01Z

@Aequitosh, just in case you haven't seen it: David has redirected the "Multiple Gill Acquire" issue here and closed the other one so that there aren't two threads on the same subject, so we will continue the discussion here ;)

Aequitosh · 2024-02-06T10:10:12Z

Hi there!

For those following this issue, I've got a short update: I'm slowly able to pick up on all this again, now that there are fewer things going on in my private life.

Currently, I'm working on properly drafting up and implementing a prototype of an idea that's been living rent-free in my head the past few weeks - I figured it's finally time I brought it to life in the form of code. More details will follow as soon as I'm more confident with the idea - that is, once I've actually implemented it in prototypical form and seen it in action.

See this more as a sign that this issue is still alive; I'm still very eager to work on this, even though I wasn't able to for a while.

Aequitosh · 2024-04-04T11:31:53Z

So, I have been working on and off on this. The more I begin to understand how CPython's insides work, the more I realize how complicated this actually is.

Nevertheless, I've got a rough plan for removing static data from PyO3. I think this is a good first "milestone" (or whatever you'd like to call it) for this issue - I will elaborate on this further below.

Per-Module State

From what I've been experimenting with, it's probably best to move static data into the per-module memory-area (which can be accessed via PyModule_GetState) as was initially preferred.

The absolutely fantastic thing about per-module state is that, according to the CPython docs, it's an arbitrarily-sized block of memory allocated on the Python interpreter's heap that is sub-interpreter safe to access. This makes it the ideal place to store more than just static data - I will elaborate on this below.
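As a rough picture of what "a block of memory reachable through the module" means, here is a standard-library-only sketch. `FakeModule` is a hypothetical stand-in for a module object and `get_state` only mimics the role of PyModule_GetState (handing back an untyped pointer); no CPython or PyO3 API is called.

```rust
use std::ffi::c_void;

/// Hypothetical per-module state; in a real module this would hold the
/// data that currently lives in statics.
pub struct ModuleState {
    pub counter: u64,
}

/// Hypothetical stand-in for a module object that owns an opaque state block.
pub struct FakeModule {
    state: *mut c_void,
}

impl FakeModule {
    /// Allocates the state on the heap and keeps only an untyped pointer to it.
    pub fn new(state: ModuleState) -> Self {
        Self { state: Box::into_raw(Box::new(state)) as *mut c_void }
    }

    /// Mimics the role of PyModule_GetState: returns the untyped state pointer.
    pub fn get_state(&self) -> *mut c_void {
        self.state
    }

    /// Typed, shared view of the state.
    pub fn state(&self) -> &ModuleState {
        // SAFETY: `state` was created from a `Box<ModuleState>` in `new`
        // and is only freed in `drop`.
        unsafe { &*(self.state as *const ModuleState) }
    }

    /// Typed, exclusive view of the state.
    pub fn state_mut(&mut self) -> &mut ModuleState {
        // SAFETY: as above; `&mut self` guarantees exclusive access.
        unsafe { &mut *(self.state as *mut ModuleState) }
    }
}

impl Drop for FakeModule {
    fn drop(&mut self) {
        // SAFETY: reconstructs the original `Box` exactly once to free the state.
        unsafe { drop(Box::from_raw(self.state as *mut ModuleState)) }
    }
}
```

Two `FakeModule` values never share state, which is the property that makes this storage location attractive: each module instance in each sub-interpreter gets its own block.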

Relocating currently static data to this per-module memory region will require a new mechanism to actually put stuff there during the initialization of a module. To give a more concrete example, instead of statically allocating docstrings, they should instead perhaps be allocated in a separate container, and then be moved / cloned / etc. onto the per-module memory block.

To provide an analogy, this mechanism (or API) would work similarly to something like lazy_static or OnceLock, just quite a bit more elaborate. This "pseudo-static docstring container" would be mutable during the module initialization phase and made immutable once put onto the Python heap.
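One way to picture the "mutable during initialization, immutable afterwards" idea is a builder that is consumed into a read-only value. This is only a sketch under that assumption: `ModuleStateBuilder`, `FrozenModuleState` and their methods are hypothetical names, and nothing here is placed on the Python heap.

```rust
use std::collections::HashMap;

/// Hypothetical container that can still be modified while the module is
/// being initialized.
#[derive(Default)]
pub struct ModuleStateBuilder {
    docstrings: HashMap<&'static str, String>,
}

impl ModuleStateBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Registers (or replaces) the docstring for an item of the module.
    pub fn add_docstring(&mut self, item: &'static str, doc: &str) -> &mut Self {
        self.docstrings.insert(item, doc.to_owned());
        self
    }

    /// Ends the initialization phase. The builder is consumed, so no further
    /// mutation is possible through it.
    pub fn freeze(self) -> FrozenModuleState {
        FrozenModuleState { docstrings: self.docstrings }
    }
}

/// Hypothetical read-only state: it exposes lookups but no way to modify.
pub struct FrozenModuleState {
    docstrings: HashMap<&'static str, String>,
}

impl FrozenModuleState {
    pub fn docstring(&self, item: &str) -> Option<&str> {
        self.docstrings.get(item).map(String::as_str)
    }
}
```

Because `freeze` takes `self` by value, the compiler rejects any attempt to add docstrings after the state has been handed over, which mirrors the intended lifecycle.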

But obviously this goes beyond just storing docstrings - and in my opinion, also beyond just storing static data.

A Place For More Than `static`s

I reckon that implementing this hypothetical mechanism described above will require quite a lot of changes to PyO3's internals; at least that's what it looks like to me right now.

Nevertheless, I think it can be leveraged for more than just static data - for example, depending on how we'll actually make the current synchronization primitives independent from the GIL, we could definitely store other per-module state there, including e.g. a per-module lock that emulates the GIL (really just an example!).

What can (and what should) be in the per-module state is still up for discussion (some of it perhaps beyond the current scope of this issue), but I think it's safe to say that we should start with relocating static data there - and that's where I'm currently at.
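The per-module lock mentioned above is really just an example, but a tiny model shows what "per-module" buys: each module instance carries its own lock, so holding one does not block another. `LockedModuleState` is a hypothetical name and the sketch uses a plain `std::sync::Mutex`, not anything GIL-related.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical per-module state that carries its own lock.
pub struct LockedModuleState {
    lock: Mutex<()>,
}

impl LockedModuleState {
    pub fn new() -> Self {
        Self { lock: Mutex::new(()) }
    }

    /// Runs `f` while holding this module's lock.
    pub fn with_lock<R>(&self, f: impl FnOnce() -> R) -> R {
        let _guard = self.lock.lock().expect("module lock poisoned");
        f()
    }

    /// Reports whether the lock is currently held (by any thread).
    pub fn is_locked(&self) -> bool {
        matches!(self.lock.try_lock(), Err(TryLockError::WouldBlock))
    }
}
```

Holding the lock of one `LockedModuleState` leaves every other instance unlocked, whereas a single process-wide lock would serialize all of them.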

Next Steps

Because working on the synchronization primitives first (the prototype I had mentioned in my prior post) was maybe too big of a chunk up front, this is what I'll be working on in the next couple of weeks:

- Some kind of struct living on the Python heap representing per-module state where currently static data will be moved to
- An internal API regarding per-module state
  - Only for per-module static data for now, but once the flow's been worked out, I don't see why this couldn't also be used for other purposes
  - Perhaps something that can be made pub (or have a pub layer, rather) once the details have been fleshed out

Also, I'll probably open a developer diary or something over at the discussions of my fork in order to keep this thread rather clean - if you'd like to comment on this, feel free to open a discussion there too.

I was initially intending to report back once I had something more concrete, but I feel it's better to share some bits and pieces here and there - maybe it encourages somebody to share their ideas or comments as well.

Example of Per-Module State

There is a fantastic example that just so happened to be added in the meantime, which demonstrates this; moreover, it shows how to make a sub-interpreter safe module using PyO3's FFI bindings. Leaving this here as it shows mostly what I mean.

davidhewitt · 2024-04-06T14:30:49Z

Agreed that per-module state is a necessary first step which can have general value beyond subinterpreters. I've actually been playing around with the first step for supporting that, which is changing PyO3 to do something compatible with PEP 489. Ideally I can push this soon!
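For readers unfamiliar with PEP 489, its core idea is that importing a module is split into a "create" step and one or more "exec" steps, each described by a slot. The sketch below models that flow with plain Rust types; `Module`, `Slot`, `ModuleDef`, `import` and `example_def` are hypothetical names, and the code does not reflect how PyO3 implements or plans to implement it.

```rust
/// Hypothetical model of a module object: a name plus its attributes.
#[derive(Debug)]
pub struct Module {
    pub name: String,
    pub attrs: Vec<String>,
}

/// The two slot kinds from PEP 489, modelled as plain function pointers.
pub enum Slot {
    Create(fn(&str) -> Module),
    Exec(fn(&mut Module) -> Result<(), String>),
}

/// Hypothetical module definition: a name and its slots.
pub struct ModuleDef {
    pub name: &'static str,
    pub slots: Vec<Slot>,
}

/// Models one import: create a fresh module object, then run the exec slots.
pub fn import(def: &ModuleDef) -> Result<Module, String> {
    // Phase 1: use the create slot if there is one, else a default module.
    let mut module = def
        .slots
        .iter()
        .find_map(|slot| match slot {
            Slot::Create(create) => Some(create(def.name)),
            Slot::Exec(_) => None,
        })
        .unwrap_or_else(|| Module { name: def.name.to_owned(), attrs: Vec::new() });

    // Phase 2: run every exec slot in order to populate the module.
    for slot in &def.slots {
        if let Slot::Exec(exec) = slot {
            exec(&mut module)?;
        }
    }
    Ok(module)
}

fn add_functions(module: &mut Module) -> Result<(), String> {
    module.attrs.push("add".to_owned());
    Ok(())
}

/// A definition with no create slot and a single exec slot.
pub fn example_def() -> ModuleDef {
    ModuleDef { name: "example", slots: vec![Slot::Exec(add_functions)] }
}
```

Each call to `import` yields an independent module object from the same definition, which is what allows every sub-interpreter to hold its own instance and its own state.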

Aequitosh · 2024-04-06T15:33:47Z

That's actually fantastic - I've got multi-phase initialization to almost work at the moment; I still have to change a bunch of the proc macro stuff so I can actually attach functions, classes, etc. to my module. It otherwise loads just fine (though I get a double-free when the garbage collector picks it up, woops).

Let me know if I can lend a hand or anything! I haven't pushed my stuff yet, but might soon. I'll ping you over at my fork once I do (if that's alright).

mejrs · 2024-04-06T22:21:54Z

I think part of this would be to have an optional state: &State argument in functions and methods that passes in (part of) some user-defined module state, so that users can also put their static data in it. Much like how web frameworks pass in a Context struct so users don't have to use global variables.
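A rough sketch of what passing state explicitly could look like, in plain Rust without any PyO3 macros. `State`, `Module`, `call` and `greet` are hypothetical names chosen for illustration; how such an argument would actually be wired up in PyO3 is not decided here.

```rust
/// Hypothetical user-defined module state; replaces what would otherwise
/// be a `static`.
pub struct State {
    pub greeting: String,
}

/// Hypothetical stand-in for one loaded module instance that owns its state.
pub struct Module {
    state: State,
}

impl Module {
    pub fn new(state: State) -> Self {
        Self { state }
    }

    /// Calls `f`, supplying this module's state as the first argument,
    /// similar to how a web framework hands a context to a handler.
    pub fn call<A, R>(&self, f: impl Fn(&State, A) -> R, arg: A) -> R {
        f(&self.state, arg)
    }
}

/// A "module function" that reads its configuration from the passed-in
/// state instead of from a global.
pub fn greet(state: &State, name: &str) -> String {
    format!("{}, {}!", state.greeting, name)
}
```

Two `Module` instances with different `State` values produce different results from the same `greet` function, with no global variable involved.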

> Example of Per-Module State
>
> There is a fantastic example that just so happened to be added in the meantime, which demonstrates this; moreover, it shows how to make a sub-interpreter safe module using PyO3's FFI bindings. Leaving this here as it shows mostly what I mean.

Thanks! I mostly wrote it to get some experience with it and to get a feel of what it should look like. I'm happy if it does the same for others :)

Aequitosh · 2024-05-06T17:21:36Z

Back with an update! I opened up PR #4162, which implements almost fully functional multi-phase module initialization. See the PR for more information.

More work will continue off and on in the meantime. Exams are coming up, but I'll try to make some time every now and then.

Aequitosh · 2024-07-18T14:52:01Z

Healthcheck: Now that university stuff has cooled down, I can finally dedicate some more time to this again. Just rebased my PR on main and added some notes for future me.

Aequitosh · 2024-07-23T15:49:44Z

Back with some good news! Multi-phase initialization now works, even for submodules.

I've updated PR #4162 correspondingly; it's now an RFC. See the PR's description for more details.

It's still a little rough around the edges, but we're getting much closer to merging now, I feel. (Unless something unexpected pops up, that is.)

@davidhewitt Kindly pinging you here and asking you to take a look whenever you have time. ;)

Aequitosh mentioned this issue Sep 13, 2023

Support sub-interpreters #576

Open

davidhewitt mentioned this issue Sep 16, 2023

Multiple Gill Aquire #3422

Closed

davidhewitt added the Good First Issue label Apr 2, 2024

Aequitosh mentioned this issue May 6, 2024

RFC: Implement Multi-Phase Module Initialization as per PEP 489 #4162

Open

6 tasks

Aequitosh mentioned this issue May 24, 2024

[4.1.1] ImportError: PyO3 modules compiled for CPython 3.8 or older may only be initialized once per interpreter process pyca/bcrypt#694

Open
