PyUnicode_DecodeUTF8Stateful() does not set *consumed for ASCII-only string #99612

Closed
serhiy-storchaka opened this issue Nov 20, 2022 · 2 comments
Labels
3.9 only security fixes 3.10 only security fixes release-blocker topic-C-API topic-unicode type-bug An unexpected behavior, bug, or error

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented Nov 20, 2022

PyUnicode_DecodeUTF8Stateful() should save the number of successfully decoded bytes in *consumed. But if all bytes are in the ASCII range, it uses a fast path and does not set *consumed.

It was found while writing coverage tests for the Unicode C API (#99593).
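To make the contract concrete, here is a minimal, self-contained C model of a stateful decoder with an ASCII fast path. It is not CPython's implementation. `sketch_decode_stateful()` is a hypothetical stand-in that inspects only lead bytes and does not validate continuation bytes. It follows the documented rule for `PyUnicode_DecodeUTF8Stateful()`: with a NULL `consumed` a truncated trailing sequence is an error, and with a non-NULL `consumed` the tail is left undecoded and the decoded byte count is stored. The point is that every successful return, including the early ASCII-only one, must write `*consumed`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stateful UTF-8 decoder with an ASCII fast path.
 * Returns the number of code points decoded, or -1 on error. Only lead bytes
 * are inspected; continuation bytes are NOT validated. */
static long sketch_decode_stateful(const unsigned char *s, size_t size,
                                   size_t *consumed)
{
    size_t pos = 0;
    long count;

    assert(s != NULL || size == 0);

    /* Fast path: scan the leading run of ASCII bytes. */
    while (pos < size && s[pos] < 0x80)
        pos++;
    if (pos == size) {
        /* ASCII-only input: every byte was decoded. In the unfixed CPython
         * code the analogous early return skipped this store. */
        if (consumed)
            *consumed = size;
        return (long)size;
    }

    /* General path: step over each sequence using its lead byte. */
    count = (long)pos;
    while (pos < size) {
        unsigned char c = s[pos];
        size_t len;
        if (c < 0x80)
            len = 1;
        else if ((c & 0xE0) == 0xC0)
            len = 2;
        else if ((c & 0xF0) == 0xE0)
            len = 3;
        else if ((c & 0xF8) == 0xF0)
            len = 4;
        else
            return -1;              /* invalid lead byte */
        if (pos + len > size) {
            if (consumed == NULL)
                return -1;          /* non-stateful call: truncated input is an error */
            break;                  /* stateful call: leave the tail for the next call */
        }
        pos += len;
        count++;
    }
    if (consumed)
        *consumed = pos;
    return count;
}
```

With `"h\xC3\xA9\xE2"` (the two code points `hé` followed by a lone 3-byte lead byte), a stateful call returns 2 and stores 3 in `consumed`, while a call with `consumed == NULL` fails.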

Linked PRs

@serhiy-storchaka serhiy-storchaka added type-bug An unexpected behavior, bug, or error topic-unicode 3.11 only security fixes 3.10 only security fixes topic-C-API 3.12 bugs and security fixes labels Nov 20, 2022
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Nov 20, 2022
Previously *consumed was not set in this case.
serhiy-storchaka added a commit that referenced this issue Dec 1, 2022
carljm added a commit to carljm/cpython that referenced this issue Dec 1, 2022
* main: (112 commits)
  pythongh-99894: Ensure the local names don't collide with the test file in traceback suggestion error checking (python#99895)
  pythongh-99612: Fix PyUnicode_DecodeUTF8Stateful() for ASCII-only data (pythonGH-99613)
  Doc: Add summary line to isolation_level & autocommit sqlite3.connect params (python#99917)
  pythonGH-98906 ```re``` module: ```search() vs. match()``` section should mention ```fullmatch()``` (pythonGH-98916)
  pythongh-89189: More compact range iterator (pythonGH-27986)
  bpo-47220: Document the optional callback parameter of weakref.WeakMethod (pythonGH-25491)
  pythonGH-99905: Fix output of misses in summarize_stats.py execution counts (pythonGH-99906)
  pythongh-99845: PEP 670: Convert PyObject macros to functions (python#99850)
  pythongh-99845: Use size_t type in __sizeof__() methods (python#99846)
  pythonGH-99877)
  Fix typo in exception message in `multiprocessing.pool` (python#99900)
  pythongh-87092: move all localsplus preparation into separate function called from assembler stage (pythonGH-99869)
  pythongh-99891: Fix infinite recursion in the tokenizer when showing warnings (pythonGH-99893)
  pythongh-99824: Document that sqlite3.connect implicitly open a transaction if autocommit=False (python#99825)
  pythonGH-81057: remove static state from suggestions.c (python#99411)
  Improve zip64 limit error message (python#95892)
  pythongh-98253: Break potential reference cycles in external code worsened by typing.py lru_cache (python#98591)
  pythongh-99127: Allow some features of syslog to the main interpreter only (pythongh-99128)
  pythongh-82836: fix private network check (python#97733)
  Docs: improve accuracy of socketserver reference (python#24767)
  ...
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jul 25, 2023
…nly data (pythonGH-99613)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@serhiy-storchaka serhiy-storchaka added 3.9 only security fixes 3.8 (EOL) end of life release-blocker labels Jul 25, 2023
@serhiy-storchaka
Member Author

Since this is a bug in the C API, I consider it a security-level fix.

serhiy-storchaka added a commit that referenced this issue Jul 25, 2023
…ta (GH-99613) (GH-107224)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c)
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2023
…nly data (pythonGH-99613) (pythonGH-107224)

(cherry picked from commit b8b3e6a)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Previously *consumed was not set in this case.
(cherry picked from commit f08e52c)
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Jul 25, 2023
…SCII-only data (pythonGH-99613) (pythonGH-107224)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
@gpshead gpshead removed 3.11 only security fixes 3.12 bugs and security fixes labels Jul 26, 2023
ambv pushed a commit that referenced this issue Aug 22, 2023
…ta (GH-99613) (GH-107224) (#107230)

Previously *consumed was not set in this case.

(cherry picked from commit b8b3e6a)
(cherry picked from commit f08e52c)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
ambv pushed a commit that referenced this issue Aug 22, 2023
GH-99613) (GH-107224) (#107231)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)
@serhiy-storchaka serhiy-storchaka removed the 3.8 (EOL) end of life label Aug 23, 2023
@serhiy-storchaka
Member Author

3.8 is not affected. The bug was introduced in 770847a (#81529).

carlosroman pushed a commit to DataDog/cpython that referenced this issue Oct 11, 2023
…ly data (pythonGH-99613) (pythonGH-107224) (python#107231)

Previously *consumed was not set in this case.
(cherry picked from commit f08e52c).
(cherry picked from commit b8b3e6a)
netbsd-srcmastr referenced this issue in NetBSD/pkgsrc Feb 20, 2024
5.6.2

- Fixed ``__hash__()`` of the C version of the ``CBORTag`` type crashing when there's a recursive
  reference cycle
- Fixed type annotation for the file object in ``cbor2.dump()``, ``cbor2.load()``, ``CBOREncoder``
  and ``CBORDecoder`` to be ``IO[bytes]`` instead of ``BytesIO``
- Worked around a `CPython bug <https://github.com/python/cpython/issues/99612>`_ that caused
  a ``SystemError`` to be raised, or even a buffer overflow to occur when decoding a long text
  string that contained only ASCII characters
- Changed the return type annotations of ``cbor2.load()`` and ``cbor2.loads()`` to return ``Any``
  instead of ``object`` so as not to force users to make type casts
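A caller that must also run on unpatched 3.9 and 3.10 interpreters can defend itself by initializing the counter before the call. When the buggy fast path is taken, the input is entirely ASCII and every byte has been decoded, so presetting `consumed` to the input length gives the right value whether or not the callee writes it. The sketch below models this with a hypothetical `buggy_decode()` that, like the unfixed fast path, returns without storing `*consumed` for ASCII-only input. Its non-ASCII handling is deliberately reduced to reporting the ASCII prefix, and `decode_with_workaround()` is likewise hypothetical. It does not claim to reproduce how cbor2 actually worked around the bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the unfixed behaviour: ASCII-only input takes an
 * early return that forgets to write *consumed. Returns 0 on success.
 * Simplification: for non-ASCII input it reports only the ASCII prefix. */
static int buggy_decode(const unsigned char *s, size_t size, size_t *consumed)
{
    size_t pos = 0;
    assert(s != NULL || size == 0);
    while (pos < size && s[pos] < 0x80)
        pos++;
    if (pos == size)
        return 0;               /* bug: *consumed is left untouched */
    if (consumed)
        *consumed = pos;
    return 0;
}

/* Caller-side workaround: preset the counter to the full length so that a
 * skipped store on the ASCII-only fast path still reads as "all consumed". */
static int decode_with_workaround(const unsigned char *s, size_t size,
                                  size_t *consumed)
{
    size_t n = size;            /* correct value if the callee never writes it */
    int rc = buggy_decode(s, size, &n);
    if (rc == 0 && consumed)
        *consumed = n;
    return rc;
}
```

Against the real API the same idea is simply `Py_ssize_t consumed = size;` before calling `PyUnicode_DecodeUTF8Stateful(s, size, errors, &consumed)`, so the variable never holds an indeterminate value.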