Set up tagged pointers in the evaluation stack #117139

Closed
2 tasks done
Tracked by #108219
Fidget-Spinner opened this issue Mar 21, 2024 · 4 comments
Labels: topic-free-threading, type-feature (A feature request or enhancement)

Comments

@Fidget-Spinner (Member) commented Mar 21, 2024

Feature or enhancement

Proposal:

This issue mostly mirrors Mark's original one at faster-cpython/ideas#632.

As part of PEP 703, tagged pointers will help achieve deferred reference counting.

Tagged pointers will also help the default (GIL-enabled) builds in 3.14 and later with unboxed ints.

Here's the initial design (open for comments):

We change the evaluation stack from PyObject * to PyTaggedObject, which looks like this:

typedef union PyTaggedObject {
    PyObject *obj;
    uintptr_t bits;
} PyTaggedObject;

Note: I am limiting the tagged pointers to just the evaluation stack (this is up for debate as well). The main reason is to keep the change as minimally invasive as possible for 3.13 and not leak out any implementation details for now. For example, I don't want to support tagged pointers in C API functions. This may/will change in 3.14.

Thanks to the cases generator, we can then autogenerate all the proper accessors for stack items. For example, a stack effect of PyObject * will generate the code effect.obj, while a stack effect of PyTaggedObject (the new default) will generate the code (PyObject *)(effect.bits & ~TAGS), or something similar.
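As a concrete illustration, the generated accessors might look like the following minimal sketch; TAGS and the macro names are assumptions for illustration, not actual generator output.

/* Hypothetical sketch of generated accessors; TAGS is an assumed mask
   covering the tag bits (here just the low bit). */
#define TAGS ((uintptr_t)1)

/* Stack effect declared as PyObject *: use the pointer member directly. */
#define EFFECT_AS_OBJ(effect)    ((effect).obj)

/* Stack effect declared as PyTaggedObject: strip the tag bits first. */
#define EFFECT_UNTAGGED(effect)  ((PyObject *)((effect).bits & ~TAGS))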

Unboxed ints will also help both free-threaded and non-free-threaded builds by reducing refcounting even more for simple arithmetic operations (and making integer operations native). However, I am aiming for deferred tags in 3.13, and anything else is a plus.

Further note: because we want to respect immediate finalizers, we are not deferring that many objects, so this may not be a win on default builds. On free-threaded builds, I see a slight (10%) speedup on Fibonacci microbenchmarks. I will benchmark on the default build as well and report the results.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

@Fidget-Spinner added the type-feature (A feature request or enhancement) label on Mar 21, 2024
@Fidget-Spinner self-assigned this on Mar 21, 2024
@gvanrossum (Member) commented:

So the mentions of unboxed ints are speculative and not part of the initial implementation, right?

And the tagged pointers just have their tag bit(s) removed when used in the "blob of C code" that's the instruction definition's body. For example,

        inst(STORE_FAST, (value --)) {
            SETLOCAL(oparg, value);
        }

would expand its body to

            SETLOCAL(oparg, (PyObject *)(value.bits & ~TAGS));

How do pointers receive their tag bit? Does every stack push just "or in" the tag bit(s)?

I recall trying to do this manually (before we had the generator) and it was a lot of work -- the code generator should make this much simpler. It also was quite a bit slower -- have you thought about this?

My biggest question is what does the tag bit mean? Is there a specific section of PEP 703 I should read to learn more about this?

@colesbury (Contributor) commented:

> My biggest question is what does the tag bit mean?

PEP 703 doesn't go into detail about the tag bits. I'd expect it to match the description in Mark's issue, except with unboxed ints not part of the initial implementation:

| Tag | Meaning            | Encoding           |
|-----|--------------------|--------------------|
| 00  | Normal reference   | (uintptr)ptr       |
| 01  | Deferred reference | ((uintptr)ptr) + 1 |

01 includes both immortal objects and those whose reference count has been deferred (but are not immortal).
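For illustration only, decoding under this scheme might look like the sketch below; the helper names are hypothetical, and the low bit is assumed to be the only tag.

/* Illustrative decoding for the table above; names are hypothetical. */
#define TAG_DEFERRED ((uintptr_t)1)

static inline PyObject *
tagged_get_object(PyTaggedObject t)
{
    /* Clearing the low bit yields the pointer in both encodings. */
    return (PyObject *)(t.bits & ~TAG_DEFERRED);
}

static inline int
tagged_is_deferred(PyTaggedObject t)
{
    return (t.bits & TAG_DEFERRED) != 0;
}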

> I recall trying to do this manually... It also was quite a bit slower

We should only enable this in the free-threaded build to start. We do not want to introduce any performance regressions in the default build. If it turns out to be a net win for the default build, then we can consider enabling it there. Otherwise, we can revisit that in 3.14, if we pursue tagged integers.

> How do pointers receive their tag bit? Does every stack push just "or in" the tag bit(s)?

Yeah, most pushes just "or in" the tag bit, which is an immortal check. So, for example, GET_ITER currently has:

stack_pointer[-1] = iter;

That might be generated instead as:

#define PACK(obj) ((PyTaggedObject){.bits = (uintptr_t)obj | _Py_IsImmortal(obj)})

stack_pointer[-1] = PACK(iter);

> For example, ... [STORE_FAST]

I think we want to use tagged pointers in the locals as well as the evaluation stack, but it may be easier to start with using tagged pointers in just the evaluation stack. In that case, STORE_FAST will need extra refcounting to convert a tagged pointer into a regular object reference. For example,

PyObject *obj = (PyObject *)(value.bits & ~TAGS);
if (value.bits & IS_DEFERRED) {
    Py_INCREF(obj);
}
SETLOCAL(oparg, obj);

If we use tagged pointers for the locals as well, then we could skip the extra work in STORE_FAST.
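As an aside, that conversion could be wrapped in a small helper so the generator only has to emit a single call; the helper name below is hypothetical.

/* Hypothetical helper: turn a tagged stack value into a new strong
   reference, increfing only when the reference was deferred. */
static inline PyObject *
tagged_to_strong(PyTaggedObject value)
{
    PyObject *obj = (PyObject *)(value.bits & ~TAGS);
    if (value.bits & IS_DEFERRED) {
        Py_INCREF(obj);
    }
    return obj;
}

With that, the STORE_FAST body above reduces to SETLOCAL(oparg, tagged_to_strong(value)).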

Motivation

The primary motivation is to avoid reference count contention in multithreaded programs on things like functions and methods (in the free-threaded build). In particular, the benefit from avoiding reference count contention will show up in LOAD_ATTR, LOAD_GLOBAL, and their variants.

For example, currently LOAD_ATTR_SLOT has code that looks like:

PyObject *attr = ...;
Py_INCREF(attr);
stack_pointer[-1] = attr;

We want the generated code to instead look something like the following, but probably refactored with helper macros/functions:

PyObject *attr = ...;
PyTaggedObject attr_tag;
if (_Py_IsImmortalOrDeferred(attr)) {
    attr_tag.bits = (uintptr_t)attr | IS_DEFERRED;
}
else {
    attr_tag.obj = attr;
}
stack_pointer[-1] = attr_tag;
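One possible shape for such a helper is sketched below. The names _Py_IsImmortalOrDeferred, IS_DEFERRED, and pack_new_reference follow the example above but are assumptions, not merged API; note that the non-deferred path still needs the Py_INCREF the current code performs.

/* Hypothetical helper wrapping the branch above. */
static inline PyTaggedObject
pack_new_reference(PyObject *obj)
{
    PyTaggedObject t;
    if (_Py_IsImmortalOrDeferred(obj)) {
        t.bits = (uintptr_t)obj | IS_DEFERRED;  /* skip refcounting */
    }
    else {
        Py_INCREF(obj);  /* still a normal strong reference */
        t.obj = obj;
    }
    return t;
}

The LOAD_ATTR_SLOT body would then end with stack_pointer[-1] = pack_new_reference(attr);.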

@gvanrossum (Member) commented:

Okay...

So from this definition

#define PACK(obj) ((PyTaggedObject){.bits = (uintptr_t)obj | _Py_IsImmortal(obj)})

I gather that the tag bit is (by default) only set if the object is immortal. From your Motivation here (and also from the section on deferred refcounting in PEP 703) I take it that the tag bit would also be set for selected objects like functions and methods.

Mark's original issue describes variants of INCREF/DECREF to check for the tag bit; presumably these variants should only be used in the interpreter.
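As a sketch, such interpreter-only variants might look like the following; the names and the IS_DEFERRED check are assumptions based on the discussion above, not the actual implementation.

/* Hypothetical tag-aware refcount helpers for use inside the interpreter. */
static inline void
PyTagged_INCREF(PyTaggedObject t)
{
    if (!(t.bits & IS_DEFERRED)) {
        Py_INCREF(t.obj);
    }
}

static inline void
PyTagged_DECREF(PyTaggedObject t)
{
    if (!(t.bits & IS_DEFERRED)) {
        Py_DECREF(t.obj);
    }
}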

@Fidget-Spinner Good luck!

Fidget-Spinner added a commit that referenced this issue Apr 30, 2024

Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>
@markshannon (Member) commented:

One thing to note:

If we are going to use tagged ints, all plain "borrow" transformations will need to be removed, as it is impossible to borrow a PyObject * from a tagged int or vice-versa.
Ultimately, we will want to extend the API that takes stack references to avoid having the double conversion ref -> PyObject * -> ref when calling common API functions that take PyObject *.

In cases where it is known that the stack ref is not an int, we could add StackRefNotInt_ToObjectBorrow and the reverse, to avoid the cost of two conversions.
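A minimal sketch of such a borrow helper, assuming a stack ref that stores its tagged bits in a bits field and uses the low bit as the deferred tag; the name follows Mark's suggestion but is not merged API.

/* Sketch only: borrow the PyObject * out of a stack ref that is known
   not to hold a tagged int. No refcount change, so this is a borrow. */
static inline PyObject *
StackRefNotInt_ToObjectBorrow(_PyStackRef ref)
{
    return (PyObject *)(ref.bits & ~TAGS);
}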

Fidget-Spinner added a commit that referenced this issue Jun 26, 2024
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: we tag the low bit if something is deferred, which means we skip incref and decref operations on it. This behavior may change in the future if Mark's plan to defer all objects in the interpreter loop pans out.

This implies that a strict stack reference discipline is required: ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something and then do normal incref/decref operations on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
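To illustrate the dup/close discipline described in the commit message above, interpreter code might use it roughly as follows; the macro names are illustrative, derived from the "dup" and "close" wording rather than copied from pycore_stackref.h.

/* Illustrative use of the dup/close handle discipline (names assumed). */
_PyStackRef item = stack_pointer[-1];
_PyStackRef copy = PyStackRef_DUP(item);   /* incref only if not deferred */
/* ... use copy ... */
PyStackRef_CLOSE(copy);                    /* decref only if not deferred */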
Fidget-Spinner added a commit that referenced this issue Jun 28, 2024
colesbury added a commit that referenced this issue Jul 2, 2024
Avoids the extra conversion from stack refs to PyObjects.
kumaraditya303 pushed a commit that referenced this issue Aug 7, 2024
`_PyDict_SetItem_Take2` steals both the key (i.e., `sub`) and the value.
colesbury added a commit that referenced this issue Aug 12, 2024
`BUILD_SET` should use a borrow instead of a steal. The cleanup in `_DO_CALL`
`CONVERSION_FAILED` was incorrect.

Co-authored-by: Ken Jin <kenjin@python.org>
colesbury added a commit that referenced this issue Aug 12, 2024
…2830)

This replaces `_PyList_FromArraySteal` with `_PyList_FromStackRefSteal`.
It's functionally equivalent, but takes a `_PyStackRef` array instead of
an array of `PyObject` pointers.

Co-authored-by: Ken Jin <kenjin@python.org>
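As a rough sketch of the kind of conversion that helper performs, assuming a stack ref exposes its tagged bits and that a "steal" consumes the caller's reference; the name, fields, and signature here are assumptions, not the code in the tree.

/* Sketch only: build a list from n stack refs, consuming each reference. */
static PyObject *
list_from_stackrefs_steal(_PyStackRef *refs, Py_ssize_t n)
{
    PyObject *list = PyList_New(n);
    if (list == NULL) {
        return NULL;
    }
    for (Py_ssize_t i = 0; i < n; i++) {
        PyObject *obj = (PyObject *)(refs[i].bits & ~TAGS);
        if (refs[i].bits & IS_DEFERRED) {
            Py_INCREF(obj);  /* deferred refs hold no count to hand over */
        }
        PyList_SET_ITEM(list, i, obj);
    }
    return list;
}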
colesbury added a commit that referenced this issue Aug 15, 2024
The free-threaded GC now visits interpreter stacks to keep objects
that use deferred reference counting alive.

Interpreter frames are zero initialized in the free-threaded GC so
that the GC doesn't see garbage data. This is a temporary measure
until stack spilling around escaping calls is implemented.

Co-authored-by: Ken Jin <kenjin@python.org>