UnicodeEncodeError: 'utf-8' codec can't encode character '\udca0' in position 10917: surrogates not allowed #4

Closed
baltpeter opened this issue Oct 7, 2024 · 2 comments
@baltpeter
Member

I encountered the below error for example for this request: https://data.tweasel.org/data/requests/monkey-april-2024,88512

Traceback (most recent call last):
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/app.py", line 1357, in route_path
    await response.asgi_send(send)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/utils/asgi.py", line 342, in asgi_send
    body = body.encode("utf-8")
           ^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udca0' in position 10917: surrogates not allowed
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/app.py", line 1357, in route_path
    await response.asgi_send(send)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/utils/asgi.py", line 342, in asgi_send
    body = body.encode("utf-8")
           ^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'utf-8' codec can't encode character '\udca0' in position 10917: surrogates not allowed

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 398, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/uvicorn/middleware/proxy_headers.py", line 70, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/utils/asgi.py", line 445, in __call__
    return await self.asgi(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/utils/asgi.py", line 164, in __call__
    await self.app(scope, receive, send)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/asgi_csrf.py", line 108, in app_wrapped_with_csrf
    await app(scope, receive, wrapped_send)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/app.py", line 1311, in __call__
    return await self.route_path(scope, receive, send, path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/app.py", line 1372, in route_path
    return await self.handle_exception(request, send, exception)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/app.py", line 1482, in handle_exception
    await response.asgi_send(send)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/datasette/utils/asgi.py", line 333, in asgi_send
    await send(
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/asgi_csrf.py", line 104, in wrapped_send
    await send(event)
  File "/home/benni/.local/share/pipx/venvs/datasette/lib/python3.12/site-packages/uvicorn/protocols/http/h11_impl.py", line 487, in send
    raise RuntimeError(msg % message_type)
RuntimeError: Expected ASGI message 'http.response.body', but got 'http.response.start'.
@baltpeter baltpeter self-assigned this Oct 7, 2024
@baltpeter
Member Author

While the async-yness doesn't make this fun to debug, I was able to narrow it down to our own long-cell.py plugin.

The problem is that the user agent header of this request does indeed contain a \udca0 (a lone surrogate) for some reason.

user-agent: Mozilla/5.0 (Linux; Android 11; sdk_gphone_x86_64 Build/RSR1.210722.013.A2; ) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/83.0.4103.106 Mobile Safari/537.36\udca0TOOMICS_GLOBAL_ANDROID_1.5.9
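
One way such a character can end up in a Python string (an assumption on my part, the issue does not establish where this one came from) is decoding raw bytes with the `surrogateescape` error handler: a stray byte `0xA0`, which is a non-breaking space in Latin-1 but invalid on its own in UTF-8, gets mapped to `\udca0`. A minimal sketch with a shortened, hypothetical header value:

```python
# Hypothetical raw header bytes: a Latin-1 non-breaking space (0xA0)
# sits between two otherwise ASCII parts of the user agent.
raw = b"Safari/537.36\xa0TOOMICS_GLOBAL_ANDROID_1.5.9"

# 0xA0 on its own is not valid UTF-8. With errors="surrogateescape",
# Python smuggles the byte through as the lone surrogate U+DC00 + 0xA0.
decoded = raw.decode("utf-8", errors="surrogateescape")
print(repr(decoded))

# The mapping is reversible with the same error handler...
round_tripped = decoded.encode("utf-8", errors="surrogateescape")

# ...but a plain strict encode fails, just like in the traceback.
try:
    decoded.encode("utf-8")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```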

And with our custom textarea, unlike with the raw JSON output, we end up hitting a body.encode("utf-8") in datasette/utils/asgi.py (for reasons I didn't fully reconstruct, but that I really don't care about either).

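Since Python refuses to encode a lone surrogate like \udca0 to UTF-8, it raises the UnicodeEncodeError. This can be solved by preemptively doing a little .encode("utf-8", "backslashreplace").decode("utf-8") dance, which turns the surrogate into a literal backslash escape that encodes fine.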
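
For reference, a minimal sketch of that dance in isolation. The helper name `make_utf8_safe` and the shortened user agent are made up for illustration; this is not the actual code from long-cell.py.

```python
def make_utf8_safe(text: str) -> str:
    """Replace characters that cannot be encoded as UTF-8 (such as lone
    surrogates) with a literal backslash escape like \\udca0."""
    # Hypothetical helper name, not taken from the plugin.
    return text.encode("utf-8", "backslashreplace").decode("utf-8")


# Shortened version of the offending user agent from the request.
ua = "Safari/537.36\udca0TOOMICS_GLOBAL_ANDROID_1.5.9"

try:
    ua.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)

safe = make_utf8_safe(ua)
print(safe)

# The sanitized string now survives a strict UTF-8 encode.
body = safe.encode("utf-8")
```

Strings that are already valid pass through unchanged, so it should be safe to apply to every cell value rather than only to the known-bad ones.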

@baltpeter
Member Author

baltpeter commented Oct 7, 2024

Should be fixed by #5.

@zner0L zner0L closed this as completed in 55a7d19 Oct 7, 2024
zner0L added a commit that referenced this issue Oct 7, 2024
Fixes #4: Avoid UnicodeEncodeError (surrogates not allowed)