Extended ASCII characters in multiline strings cause "SystemError: Negative size passed to PyUnicode_New" when the encoding is not specified #96611
Comments
#96270 may fix this. Let me confirm.
It does not fix this.
Since copy-and-paste doesn't usually preserve broken encodings, this is a convenient way to reproduce the bug:
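The snippet originally posted here did not survive on this page. A minimal sketch of the idea, not the commenter's code, is to write the raw bytes from Python so that no editor or clipboard re-encodes them. The payloads, the file name `repro.py`, and the helper `run_source` are illustrative assumptions; on interpreters that include the fix, the multi-line case should report an ordinary encoding error rather than `SystemError`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical payloads: 0xE9 is "é" in Latin-1 and is not valid UTF-8 on its own.
SINGLE_LINE = b's = "\xe9"\n'          # offending byte on the line where the string starts
MULTI_LINE = b's = """\n\xe9\n"""\n'   # offending byte on a later line of the string


def run_source(raw: bytes):
    """Write raw bytes to a temporary .py file and run it with the current interpreter.

    Returns (exit code, stderr text). stderr is decoded leniently because the
    interpreter may echo the undecodable source line.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "repro.py"
        path.write_bytes(raw)  # bytes go to disk untouched, no encoding step
        proc = subprocess.run([sys.executable, str(path)], capture_output=True)
    return proc.returncode, proc.stderr.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name, raw in [("single line", SINGLE_LINE), ("multi line", MULTI_LINE)]:
        code, err = run_source(raw)
        lines = err.strip().splitlines()
        # Print only the final line of the traceback, which names the exception.
        print(name, code, lines[-1] if lines else "")
```

Both cases are expected to exit with a non-zero status; the difference on an affected build is which exception the last line of stderr names.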
In Lines 1936 to 1948 in 6744490
In this case, however, |
…string (pythonGH-96623) (cherry picked from commit 05692c6) Co-authored-by: Michael Droettboom <mdboom@gmail.com>
Bug report
In some cases, when a source file that is not UTF-8 encoded contains a multi-line string, Python raises
SystemError: Negative size passed to PyUnicode_New
and does not execute any code. Minimal test case (attached below as test_ml.txt):
This is only a problem if the non-UTF-8 character lies on a new line of the string (at any position in that line).
A similar single-line test case (attached below as test.txt) behaves correctly
and reports an encoding warning, which is the expected behavior.
Since this is an encoding-related error, both files are attached (as .txt, because GitHub does not allow .py attachments).
test.txt - single line (correct behavior)
test_ml.txt - multi line (bug)
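Because the failure is tied to the encoding not being specified, a useful cross-check is the same kind of file with a PEP 263 encoding declaration on its first line. The following is a rough sketch under assumptions: the file contents, the name `declared.py`, and the helper `run_declared` are hypothetical and are not taken from the attachments.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical file contents: a PEP 263 declaration followed by a multi-line
# string with 0xE9 ("é" in Latin-1) on a line of its own.
DECLARED = b'# -*- coding: latin-1 -*-\ns = """\n\xe9\n"""\nprint(len(s))\n'


def run_declared(raw: bytes = DECLARED):
    """Run the declared-encoding file; return (exit code, stripped stdout)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "declared.py"
        path.write_bytes(raw)
        proc = subprocess.run([sys.executable, str(path)], capture_output=True)
    # The script prints only an integer, so decoding as ASCII is sufficient.
    return proc.returncode, proc.stdout.decode("ascii", errors="replace").strip()


if __name__ == "__main__":
    print(run_declared())
```

With the declaration in place the source decodes as Latin-1, so the script should exit cleanly and print the length of the string: newline, "é", newline.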
My environment