-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-45490: Convert unicodeobject.h macros to static inline functions #31221
Conversation
* Convert unicodeobject.h macros to static inline functions. * Reorder functions to declare functions before their first usage. * Add "kind" variable to PyUnicode_READ_CHAR() and PyUnicode_MAX_CHAR_VALUE() functions to only call PyUnicode_KIND() once. * PyUnicode_KIND() now returns an "enum PyUnicode_Kind". * Simplify PyUnicode_GET_SIZE(). * Add assertions to PyUnicode_WRITE() on the max value. * Add cast macros: * _PyASCIIObject_CAST() * _PyCompactUnicodeObject_CAST() * _PyUnicodeObject_CAST() * The following functions are now declared as deprecated using Py_DEPRECATED(3.3): * PyUnicode_GET_SIZE() * PyUnicode_GET_DATA_SIZE() * PyUnicode_AS_UNICODE() * PyUnicode_AS_DATA() * The implementation of these functions disable deprecation warnings in their body. * PyUnicode_READ_CHAR() now uses PyUnicode_1BYTE_DATA(), PyUnicode_2BYTE_DATA() and PyUnicode_4BYTE_DATA(). * Replace "const PyObject*" with "PyObject*" in _decimal.c and pystrhex.c: PyUnicode_READY() can modify the object. * Replace "const void *data" with "void *data" in some unicodedata.c and unicodeobject.c functions which use PyUnicode_WRITE(): data is used to modify the string.
@erlend-aasland: All in one PR to convert (almost) all macros of Include/cpython/unicodeobject.h. I created a single PR to show what can be done with PEP 670, but IMO it will be better to split this large PR into smaller PRs to ease review, and apply (minor) API changes / cleanup in following PRs (not do everything at once). |
Sounds good. |
IMO, this is a great improvement when it comes to readability/maintainability. |
@@ -1895,7 +1895,7 @@ is_space(enum PyUnicode_Kind kind, const void *data, Py_ssize_t pos) | |||
Return NULL if malloc fails and an empty string if invalid characters | |||
are found. */ | |||
static char * | |||
numeric_as_ascii(const PyObject *u, int strip_ws, int ignore_underscores) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to remove const
? Ditto for the rest of the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the "u" string is not ready, PyUnicode_READY() will modify it. It's not a read-only operation.
In Python 3.12, PyUnicode_WCHAR_KIND will be removed: https://www.python.org/dev/peps/pep-0623/
In the meanwhile, I prefer to not stop lying: we do modify the object :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a whole review, just dropping some notes.
assert(PyUnicode_Check(op)); | ||
assert(PyUnicode_IS_READY(op)); | ||
return _PyASCIIObject_CAST(op)->length; | ||
} | ||
|
||
/* In the access macros below, "kind" may be evaluated more than once. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably update comments like these to just mention that they used to be macros with these caveats in previous python versions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks, I didn't look at comments at all. I laser focused on macros code and make sure that I don't change the code :-)
if (kind == PyUnicode_1BYTE_KIND) { | ||
return PyUnicode_1BYTE_DATA(unicode)[index]; | ||
} | ||
else if (PyUnicode_KIND(unicode) == PyUnicode_2BYTE_KIND) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a function this no longer needs to be called twice. (and the comment above becomes less true)
@@ -280,26 +341,24 @@ PyAPI_FUNC(int) _PyUnicode_CheckConsistency( | |||
#define SSTATE_INTERNED_IMMORTAL 2 | |||
|
|||
/* Use only if you know it's a string */ | |||
#define PyUnicode_CHECK_INTERNED(op) \ | |||
(((PyASCIIObject *)(op))->state.interned) | |||
static inline int PyUnicode_CHECK_INTERNED(PyObject *op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid the cast going away, consider doing what Py_INCREF did & add indirection through a macro for the cast:
#define PyUnicode_CHECK_INTERNED(op) \
_PyUnicode_CHECK_INTERNED(_PyASCIIObject_CAST(op))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SC asked to not add such macro :-) You're right that without such macro, there is a risk of introducing new compiler warnings.
If possible I would prefer to keep PyObject*
for functions in unicodeobject.c, since it's the type used for arguments in existing functions and the type returned by functions creating strings like PyUnicode_New(), PyUnicode_FromString(), etc.
Maybe for this specific header file, we can avoid casts.
For me, PyASCIIObject is an implementation detail which should be hidden. If possible, it should even be moved to the internal C API, but that's a way larger topic which may require a PEP ;-)
The
Changing |
Macros not casting their arguments:
Macro casting its argument to
Macros using a cast in their implementation:
The majority of macros use PyUnicode_READ() and PyUnicode_WRITE() expect (kind, data, index) and (kind, data, index, value) arguments: no Python object. |
This PR was an example. I updated PEP 670 from the discussion on this PR. If PEP 670 is accepted, I will rewrite this PR with smaller changes to ease the review. |
Convert unicodeobject.h macros to static inline functions.
Reorder functions to declare functions before their first usage.
Add "kind" variable to PyUnicode_READ_CHAR() and
PyUnicode_MAX_CHAR_VALUE() functions to only call PyUnicode_KIND()
once.
PyUnicode_KIND() now returns an "enum PyUnicode_Kind".
Simplify PyUnicode_GET_SIZE().
Add assertions to PyUnicode_WRITE() on the max value.
Add cast macros:
The following functions are now declared as deprecated using
Py_DEPRECATED(3.3):
warnings in their body.
PyUnicode_READ_CHAR() now uses PyUnicode_1BYTE_DATA(),
PyUnicode_2BYTE_DATA() and PyUnicode_4BYTE_DATA().
Replace "const PyObject*" with "PyObject*" in _decimal.c
and pystrhex.c: PyUnicode_READY() can modify the object.
Replace "const void *data" with "void *data" in some unicodedata.c
and unicodeobject.c functions which use PyUnicode_WRITE(): data is
used to modify the string.
https://bugs.python.org/issue45490