[C API] Add new C functions with more regular reference counting like PyTuple_GetItemRef() #86460

vstinner · 2020-11-09T09:42:14Z

BPO	42294
Nosy	@ronaldoussoren, @vstinner, @markshannon, @serhiy-storchaka, @shihai1991, @Fidget-Spinner
PRs	bpo-42294: Add borrowed/strong reference to doc glossary #23206; [WIP] bpo-42294: Add PyTuple_GetItemRef() function #23207; [WIP] bpo-42294: Add Py_SetRef() and Py_XSetRef() #23209; bpo-42294: Grammar fixes in glossary strong/weak references #23227

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

GitHub fields:

assignee = None
closed_at = <Date 2021-09-21.21:51:54.954>
created_at = <Date 2020-11-09.09:42:14.420>
labels = ['expert-C-API', '3.10']
title = '[C API] Add new C functions with more regular reference counting like PyTuple_GetItemRef()'
updated_at = <Date 2021-09-21.21:51:54.954>
user = 'https://github.com/vstinner'

bugs.python.org fields:

activity = <Date 2021-09-21.21:51:54.954>
actor = 'vstinner'
assignee = 'none'
closed = True
closed_date = <Date 2021-09-21.21:51:54.954>
closer = 'vstinner'
components = ['C API']
creation = <Date 2020-11-09.09:42:14.420>
creator = 'vstinner'
dependencies = []
files = []
hgrepos = []
issue_num = 42294
keywords = ['patch']
message_count = 12.0
messages = ['380578', '380583', '380592', '380593', '380594', '380599', '380621', '380729', '380730', '380731', '380735', '402372']
nosy_count = 6.0
nosy_names = ['ronaldoussoren', 'vstinner', 'Mark.Shannon', 'serhiy.storchaka', 'shihai1991', 'kj']
pr_nums = ['23206', '23207', '23209', '23227']
priority = 'normal'
resolution = 'rejected'
stage = 'resolved'
status = 'closed'
superseder = None
type = None
url = 'https://bugs.python.org/issue42294'
versions = ['Python 3.10']

vstinner · 2020-11-09T09:42:13Z

The C API of Python uses and abuses borrowed references and stolen references for performance. Such functions make sense in very specific code that needs the best performance, but problems arise when they are the only way to access objects. Reference counting in C is error-prone: most people, even experienced core developers, get it wrong. Examples of issues:

Reference leaks: objects are never deleted, causing memory leaks. For example, error handling code which forgets to call Py_DECREF() on a newly created object.
Unsafe borrowed references: calling arbitrary Python code can delete the referenced object, so the borrowed reference becomes a dangling pointer. Most developers are confident that a given function call cannot run arbitrary Python code, whereas a single Py_DECREF() can trigger a GC collection which runs finalizers, which can be arbitrary Python code. Many functions have been fixed manually by adding Py_INCREF() and Py_DECREF() around "unsafe" function calls.
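To make the dangling-pointer case concrete, here is a minimal sketch in plain C. It is a toy model, not the CPython API: the names Obj, Slot, slot_get_borrowed(), slot_get_ref() and slot_clear() are hypothetical, and "deallocation" is modelled by an alive flag so the example can observe it without touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object (hypothetical, not PyObject). */
typedef struct {
    int refcnt;
    int alive;   /* set to 0 when the last reference goes away */
} Obj;

void obj_init(Obj *o) { o->refcnt = 1; o->alive = 1; }
void obj_incref(Obj *o) { o->refcnt++; }
void obj_decref(Obj *o) {
    assert(o->alive && o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->alive = 0;   /* stands in for freeing the object */
    }
}

/* One-slot container that owns a strong reference to its item. */
typedef struct { Obj *item; } Slot;

/* Borrowed reference: the caller does not own the result. */
Obj *slot_get_borrowed(Slot *s) { return s->item; }

/* Strong reference: the caller must call obj_decref() on the result. */
Obj *slot_get_ref(Slot *s) {
    if (s->item != NULL) {
        obj_incref(s->item);
    }
    return s->item;
}

/* Stands in for "arbitrary code" that drops the container's reference. */
void slot_clear(Slot *s) {
    Obj *old = s->item;
    s->item = NULL;
    if (old != NULL) {
        obj_decref(old);
    }
}

/* Returns whether the object is still alive after the slot is cleared. */
int borrowed_survives(void) {
    Obj o;
    obj_init(&o);
    Slot s = { &o };              /* the slot holds the only reference */
    Obj *b = slot_get_borrowed(&s);
    slot_clear(&s);               /* last reference is gone */
    return b->alive;              /* in real code, b would now dangle */
}

int strong_survives(void) {
    Obj o;
    obj_init(&o);
    Slot s = { &o };
    Obj *r = slot_get_ref(&s);    /* caller holds its own reference */
    slot_clear(&s);
    int alive = r->alive;         /* still alive thanks to r */
    obj_decref(r);
    return alive;
}
```

In this model, borrowed_survives() reports that the object died while the caller was still holding the pointer, whereas strong_survives() keeps it alive until the caller releases it.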

Borrowed references and stolen references make reference counting code special and even more complex to review. I propose adding new functions to make reference counting code more regular, simpler to review, and so less error-prone.

Examples:

Add PyTuple_GetItemRef(): similar to PyTuple_GetItem() but returns a strong reference (or NULL if the tuple item is not set)
Add PyTuple_SetItemRef(): similar to PyTuple_SetItem() but doesn't steal a reference to the new item
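The intended semantics can be sketched with a toy tuple in plain C. This is only a model under stated assumptions: Obj, Tuple and the tuple_* functions are hypothetical names, error reporting is reduced to a return code, and the extra checks done by the real PyTuple_SetItem() are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object (hypothetical, not PyObject). */
typedef struct { int refcnt; } Obj;

void obj_incref(Obj *o) { o->refcnt++; }
void obj_decref(Obj *o) { assert(o->refcnt > 0); o->refcnt--; }

#define TUPLE_SIZE ((size_t)2)
typedef struct { Obj *items[TUPLE_SIZE]; } Tuple;

/* Models PyTuple_SetItem(): steals the caller's reference to item,
   even when the call fails. */
int tuple_set_item(Tuple *t, size_t i, Obj *item) {
    if (i >= TUPLE_SIZE) {
        if (item != NULL) obj_decref(item);
        return -1;
    }
    Obj *old = t->items[i];
    t->items[i] = item;          /* caller's reference now belongs to t */
    if (old != NULL) obj_decref(old);
    return 0;
}

/* Models the proposed PyTuple_SetItemRef(): takes its own reference,
   so the caller still owns (and must release) the one it passed in. */
int tuple_set_item_ref(Tuple *t, size_t i, Obj *item) {
    if (i >= TUPLE_SIZE) return -1;
    if (item != NULL) obj_incref(item);
    Obj *old = t->items[i];
    t->items[i] = item;
    if (old != NULL) obj_decref(old);
    return 0;
}

/* Models the proposed PyTuple_GetItemRef(): returns a strong reference,
   or NULL if the index is invalid or the item is not set. */
Obj *tuple_get_item_ref(Tuple *t, size_t i) {
    if (i >= TUPLE_SIZE) return NULL;
    Obj *item = t->items[i];
    if (item != NULL) obj_incref(item);
    return item;
}
```

With the Ref variants, the caller's code follows one rule: every reference it creates or receives is released exactly once, regardless of which function it was passed to.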

The C API has a long list of functions using borrowed references, so I'm not sure where we should stop. I propose to start with the most common functions: PyDict, PyTuple, PyList, and see how it goes.

--

PyTuple_GetItem() is a function call which checks its arguments: it raises an exception if the arguments are invalid. For best performance, the PyTuple_GET_ITEM() macro is provided to skip these checks. This macro also returns a borrowed reference.

I'm not sure if a new PyTuple_GET_ITEM_REF() macro should be added: similar to PyTuple_GET_ITEM() but returning a strong reference.

Same open question about the PyTuple_SET_ITEM(tuple, index, item) macro, which is also special:

Don't call Py_XINCREF(item)
Don't call Py_XDECREF() on the old item

If a new PyTuple_SET_ITEM_REF() macro is added, I would prefer to make it more "regular" in terms of reference counting, and so call Py_XDECREF() on the old item. When used on a newly created tuple, it would add a useless Py_XDECREF(NULL) compared to PyTuple_SET_ITEM(). Again, my idea here is to provide functions with less surprising behavior and more regular reference counting. There are alternatives for building a new tuple without the useless Py_XDECREF(NULL), like Py_BuildValue().

Code which requires best performance could continue to use PyTuple_SET_ITEM() which is not deprecated, and handle reference counting manually.
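The difference between the two macros only shows up when a slot is overwritten. A rough sketch with a toy tuple in plain C (tuple_SET_ITEM() and tuple_SET_ITEM_REF() are hypothetical stand-ins written as functions, not the real macros):

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object (hypothetical, not PyObject). */
typedef struct { int refcnt; } Obj;
typedef struct { Obj *items[1]; } Tuple;

/* Models PyTuple_SET_ITEM(): a plain store. The caller's reference to
   item is taken over, and the old item is NOT released. */
void tuple_SET_ITEM(Tuple *t, size_t i, Obj *item) {
    t->items[i] = item;
}

/* Models the hypothetical PyTuple_SET_ITEM_REF(): takes a new reference
   to item and releases the old item (a no-op when the slot was NULL). */
void tuple_SET_ITEM_REF(Tuple *t, size_t i, Obj *item) {
    Obj *old = t->items[i];
    if (item != NULL) {
        item->refcnt++;
    }
    t->items[i] = item;
    if (old != NULL) {
        assert(old->refcnt > 0);
        old->refcnt--;
    }
}
```

On a freshly created tuple both variants behave correctly. Overwriting an occupied slot with tuple_SET_ITEM() leaves the old item's count untouched although nothing refers to it any more, which is the leak; tuple_SET_ITEM_REF() pays one extra NULL check on fresh slots to avoid that.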

--

An alternative is to use abstract functions like:

PyTuple_GetItem() => PySequence_GetItem()
PyDict_GetItem() => PyObject_GetItem()
etc.

I propose to keep specialized functions per type to avoid the overhead of indirection. For example, PySequence_GetItem(obj, index) calls Py_TYPE(obj)->tp_as_sequence->sq_item(obj, index) which implies multiple indirections:

Get the object type from PyObject.ob_type
Dereference *type to get PyTypeObject.tp_as_sequence
Dereference *PyTypeObject.tp_as_sequence to get PySequenceMethods.sq_item
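The chain of loads listed above can be sketched with toy structures in plain C. The field names ob_type, tp_as_sequence and sq_item mirror the ones mentioned in this comment, but the struct layouts and the functions seq_get_item() and abstract_get_item() are simplified, hypothetical stand-ins (items are plain ints and errors are reduced to returning -1).

```c
#include <assert.h>
#include <stddef.h>

typedef struct Type Type;

/* Toy sequence object: a type pointer plus three int items. */
typedef struct {
    const Type *ob_type;
    int items[3];
} Seq;

typedef struct {
    int (*sq_item)(Seq *, size_t);
} SequenceMethods;

struct Type {
    const SequenceMethods *tp_as_sequence;
};

/* Type-specific accessor: one direct call, no table lookups. */
int seq_get_item(Seq *s, size_t i) {
    assert(i < 3);
    return s->items[i];
}

const SequenceMethods seq_as_sequence = { seq_get_item };
const Type seq_type = { &seq_as_sequence };

/* Abstract accessor: three dependent loads before the indirect call. */
int abstract_get_item(Seq *obj, size_t i) {
    const Type *type = obj->ob_type;                  /* load 1 */
    const SequenceMethods *m = type->tp_as_sequence;  /* load 2 */
    if (m == NULL || m->sq_item == NULL) {
        return -1;                                    /* not a sequence */
    }
    return m->sq_item(obj, i);                        /* load 3 + call */
}
```

Both paths return the same item; the abstract one simply has to discover at run time which function to call.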

--

I don't plan to get rid of borrowed references. Sometimes, they are safe and replacing them with strong references would require explicit reference counting code which is again easy to get wrong.

For example, Py_TYPE() returns a borrowed reference to an object's type. The function is commonly used to immediately access a type member, with no risk of calling arbitrary Python code between the Py_TYPE() call and the read of the type attribute. For example, the following code is perfectly safe:

        PyErr_Format(PyExc_TypeError, "exec() globals must be a dict, not %.100s",
                     Py_TYPE(globals)->tp_name);

--

See also bpo-42262 where I added Py_NewRef() and Py_XNewRef() functions.

See https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references for details about issues caused by borrowed references and a list of functions using borrowed references.

vstinner · 2020-11-09T12:40:57Z

New changeset 23c5f93 by Victor Stinner in branch 'master':
bpo-42294: Add borrowed/strong reference to doc glossary (GH-23206)
23c5f93

markshannon · 2020-11-09T14:56:57Z

I'm not convinced that this is worth the effort.

The old functions aren't going away, so these additional functions provide no real safety.
You can't stop C programmers trading away correctness for some perceived performance benefit :(

If we were designing the API from scratch, then this would be a better set of functions. But because the old functions remain, it just means we are making the API larger.

Please don't add macros, use inline functions.

There seems to be some confusion about borrowed references and stolen references in https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references
"Stealing" a reference is perfectly safe. Returning a "borrowed" reference is not.

So, don't bother with PyTuple_SetItemRef(), as PyTuple_SetItem() is safe.

vstinner · 2020-11-09T15:20:48Z

Mark Shannon:

> The old functions aren't going away, so these additional functions provide no real safety. You can't stop C programmers trading away correctness for some perceived performance benefit :(

In my experience, newcomers tend to copy existing code. If the code base slowly moves towards safer, less error-prone code like Py_NewRef() or Py_SETREF(), we will gradually avoid a bunch of bugs.

> If we were designing the API from scratch, then this would be a better set of functions. But because the old functions remain, it just means we are making the API larger.

New API vs. enhancing the existing API: so far, neither approach has won. I wrote PEP 620 to enhance the C API and move towards a more opaque API, and there is the HPy project which is a new API written correctly from the start. But HPy is not usable yet, and migrating C extensions to HPy will take years.

Also, enhancing the existing API and writing a new API are not mutually exclusive options.

What is the issue with making the C API larger?

> Please don't add macros, use inline functions.

For Py_NewRef(), I used all at once :-) static inline function + function + macro :-)

It's exported as a regular function for the stable ABI, but overridden by a static inline function with a macro. The idea is to make it usable by developers who cannot use static inline functions (e.g. extension modules not written in C).

I chose to redefine functions as static inline functions in the limited C API. If it's an issue, we can consider to only do that in Include/cpython/object.h.

> There seems to be some confusion about borrowed references and stolen references in https://pythoncapi.readthedocs.io/bad_api.html#borrowed-references
> "Stealing" a reference is perfectly safe. Returning a "borrowed" reference is not.

> So, don't bother with PyTuple_SetItemRef(), as PyTuple_SetItem() is safe.

I'm really annoyed that almost all functions increase the refcount of their arguments, except a bunch of special cases. I would like to move towards a more regular API.

PyTuple_SetItem() is annoying because it steals a reference to the item. Also, it doesn't clear the reference of the previous item, which is also likely to introduce a reference leak.

ronaldoussoren · 2020-11-09T15:31:06Z

PyTuple_SetItem() does clear the previous item, it uses Py_XSETREF. The macro version (PyTuple_SET_ITEM) does not clear the previous item.

vstinner · 2020-11-09T17:31:27Z

> PyTuple_SetItem() does clear the previous item, it uses Py_XSETREF. The macro version (PyTuple_SET_ITEM) does not clear the previous item.

Oh sorry, I was thinking of PyTuple_SET_ITEM().

serhiy-storchaka · 2020-11-09T22:22:45Z

I concur with Mark.

If you want to work only with non-borrowed references, use PySequence_GetItem() and PySequence_SetItem(). It has a cost: it is slower and requires error checking. If you need a more performant solution and binary compatibility across versions, use PyTuple_GetItem() and PyTuple_SetItem() (borrowed references are part of the optimization). If you don't need binary compatibility, but need speed, use macros.

And there is no need to expand the C API. It is already large enough.

vstinner · 2020-11-10T23:57:04Z

New changeset 78ba7c6 by kj in branch 'master':
bpo-42294: Grammar fixes in doc glossary strong/weak refs (GH-23227)
78ba7c6

vstinner · 2020-11-11T00:34:03Z

In bpo-1635741, I added PyModule_AddObjectRef() (commit 8021875):
https://docs.python.org/dev/c-api/module.html#c.PyModule_AddObjectRef

"Similar to PyModule_AddObject() but don't steal a reference to the value on success."

I was tired of bugs caused by misuse of the surprising PyModule_AddObject() API.

PyModule_AddObject() *is* useful in some cases, but it is confusing in most cases...
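The surprising part is that PyModule_AddObject() steals the reference only on success, so correct callers need a cleanup branch that is easy to forget. A toy model in plain C (Obj, Module, module_add_object() and module_add_object_ref() are hypothetical stand-ins, with a fail flag to force the error path):

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object (hypothetical, not PyObject). */
typedef struct { int refcnt; } Obj;

void obj_decref(Obj *o) { assert(o->refcnt > 0); o->refcnt--; }

/* Toy module with one attribute; `fail` forces the error path. */
typedef struct { Obj *attr; int fail; } Module;

/* Models PyModule_AddObject(): steals value only on success. */
int module_add_object(Module *m, Obj *value) {
    if (m->fail) return -1;   /* caller still owns value */
    m->attr = value;          /* caller's reference is stolen */
    return 0;
}

/* Models PyModule_AddObjectRef(): never steals, takes its own reference. */
int module_add_object_ref(Module *m, Obj *value) {
    if (m->fail) return -1;
    value->refcnt++;
    m->attr = value;
    return 0;
}

/* Correct use of the stealing variant: release only on failure. */
int init_with_add_object(Module *m, Obj *value) {
    if (module_add_object(m, value) < 0) {
        obj_decref(value);    /* forgetting this branch leaks value */
        return -1;
    }
    return 0;
}

/* Regular pattern: always release the caller's own reference. */
int init_with_add_object_ref(Module *m, Obj *value) {
    int res = module_add_object_ref(m, value);
    obj_decref(value);
    return res;
}
```

In the second pattern the cleanup does not depend on the outcome of the call, which is what makes it harder to get wrong.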

vstinner · 2020-11-11T00:36:09Z

> If you want to work only with non-borrowed references, (...)

This is not my goal here. My goal is to reduce the risk of memory leaks.

serhiy-storchaka · 2020-11-11T06:47:16Z

PyModule_AddObject() has a uniquely weird design: it is easy to misuse, and most code misuses it, but fixing it would break the code which uses it correctly.

I did not see any problems with PyTuple_GetItem().

vstinner · 2021-09-21T21:51:55Z

There is no consensus on changing things, so I'm just closing my issue.

vstinner added 3.10 only security fixes topic-C-API labels Nov 9, 2020

vstinner closed this as completed Sep 21, 2021

ezio-melotti transferred this issue from another repository Apr 10, 2022

ronaldoussoren mentioned this issue Aug 18, 2022

Subtle issue with borrowed references in extensions. #95797

Closed

vstinner mentioned this issue Oct 31, 2022

Macro Py_CLEAR references argument two times. #98724

Open

vstinner mentioned this issue May 17, 2023

Irregular reference counting capi-workgroup/problems#8

Open

vstinner mentioned this issue Jun 23, 2023

C API: Add PyDict_GetItemRef() function #106004

Closed

vstinner mentioned this issue Apr 3, 2024

C API: Add PyTuple_GetItemRef(), similar to PyTuple_GetItem() but return a strong reference #117518

Closed
