pythongh-119182: Add PyUnicodeWriter C API (python#119184)
vstinner authored Jun 17, 2024
1 parent 2c7209a commit 5c4235c
Showing 6 changed files with 533 additions and 18 deletions.
84 changes: 84 additions & 0 deletions Doc/c-api/unicode.rst
@@ -1502,3 +1502,87 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
:c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
object that has been interned, or a new ("owned") reference to an earlier
interned string object with the same value.
PyUnicodeWriter
^^^^^^^^^^^^^^^
The :c:type:`PyUnicodeWriter` API can be used to create a Python :class:`str`
object.
.. versionadded:: 3.14
.. c:type:: PyUnicodeWriter
A Unicode writer instance.
The instance must be destroyed by :c:func:`PyUnicodeWriter_Finish` on
success, or :c:func:`PyUnicodeWriter_Discard` on error.
.. c:function:: PyUnicodeWriter* PyUnicodeWriter_Create(Py_ssize_t length)
Create a Unicode writer instance.
Set an exception and return ``NULL`` on error.
.. c:function:: PyObject* PyUnicodeWriter_Finish(PyUnicodeWriter *writer)
Return the final Python :class:`str` object and destroy the writer instance.
Set an exception and return ``NULL`` on error.
.. c:function:: void PyUnicodeWriter_Discard(PyUnicodeWriter *writer)
Discard the internal Unicode buffer and destroy the writer instance.
.. c:function:: int PyUnicodeWriter_WriteChar(PyUnicodeWriter *writer, Py_UCS4 ch)
Write the single Unicode character *ch* into *writer*.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
.. c:function:: int PyUnicodeWriter_WriteUTF8(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)
Decode the string *str* from UTF-8 in strict mode and write the output into *writer*.
*size* is the string length in bytes. If *size* is equal to ``-1``, call
``strlen(str)`` to get the string length.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
To use a different error handler than ``strict``,
:c:func:`PyUnicode_DecodeUTF8` can be used with
:c:func:`PyUnicodeWriter_WriteStr`.
.. c:function:: int PyUnicodeWriter_WriteStr(PyUnicodeWriter *writer, PyObject *obj)
Call :c:func:`PyObject_Str` on *obj* and write the output into *writer*.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
.. c:function:: int PyUnicodeWriter_WriteRepr(PyUnicodeWriter *writer, PyObject *obj)
Call :c:func:`PyObject_Repr` on *obj* and write the output into *writer*.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
.. c:function:: int PyUnicodeWriter_WriteSubstring(PyUnicodeWriter *writer, PyObject *str, Py_ssize_t start, Py_ssize_t end)
Write the substring ``str[start:end]`` into *writer*.
*str* must be a Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to the length of *str*.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
.. c:function:: int PyUnicodeWriter_Format(PyUnicodeWriter *writer, const char *format, ...)
Similar to :c:func:`PyUnicode_FromFormat`, but write the output directly into *writer*.
On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
15 changes: 15 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -283,6 +283,21 @@ New Features
* Add :c:func:`PyLong_GetSign` function to get the sign of :class:`int` objects.
(Contributed by Sergey B Kirpichev in :gh:`116560`.)

* Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str`
object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

(Contributed by Victor Stinner in :gh:`119182`.)

Porting to Python 3.14
----------------------

37 changes: 35 additions & 2 deletions Include/cpython/unicodeobject.h
@@ -444,7 +444,40 @@ PyAPI_FUNC(PyObject*) PyUnicode_FromKindAndData(
Py_ssize_t size);


/* --- _PyUnicodeWriter API ----------------------------------------------- */
/* --- Public PyUnicodeWriter API ----------------------------------------- */

typedef struct PyUnicodeWriter PyUnicodeWriter;

PyAPI_FUNC(PyUnicodeWriter*) PyUnicodeWriter_Create(Py_ssize_t length);
PyAPI_FUNC(void) PyUnicodeWriter_Discard(PyUnicodeWriter *writer);
PyAPI_FUNC(PyObject*) PyUnicodeWriter_Finish(PyUnicodeWriter *writer);

PyAPI_FUNC(int) PyUnicodeWriter_WriteChar(
PyUnicodeWriter *writer,
Py_UCS4 ch);
PyAPI_FUNC(int) PyUnicodeWriter_WriteUTF8(
PyUnicodeWriter *writer,
const char *str,
Py_ssize_t size);

PyAPI_FUNC(int) PyUnicodeWriter_WriteStr(
PyUnicodeWriter *writer,
PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteRepr(
PyUnicodeWriter *writer,
PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteSubstring(
PyUnicodeWriter *writer,
PyObject *str,
Py_ssize_t start,
Py_ssize_t end);
PyAPI_FUNC(int) PyUnicodeWriter_Format(
PyUnicodeWriter *writer,
const char *format,
...);


/* --- Private _PyUnicodeWriter API --------------------------------------- */

typedef struct {
PyObject *buffer;
@@ -466,7 +499,7 @@ typedef struct {
/* If readonly is 1, buffer is a shared string (cannot be modified)
and size is set to 0. */
unsigned char readonly;
} _PyUnicodeWriter ;
} _PyUnicodeWriter;

// Initialize a Unicode writer.
//
@@ -0,0 +1,13 @@
Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str` object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

Patch by Victor Stinner.