-
-
Notifications
You must be signed in to change notification settings - Fork 20
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Connection hangs forever if the WebSocket server just sends a message and close #7
Comments
Wasted 1.5 days on this 🥹 If anyone has insights/clues about this, it would be absolutely welcome. |
@frankie567 was looking into this. checked with a custom ws server impl could not reproduce. is it possible this is happening only with the test server? |
Yes, highly possible. Actually, if you're looking at the test setup, I'm testing against the two WebSockets implementations of Uvicorn, Lines 35 to 37 in e5afdb1
Lines 64 to 66 in e5afdb1
If I remember correctly, the issue arises only with The thing is, I'm even not really sure this problem has consequences in a real context other than unit tests 🤔 |
Data point: I couldn't replicate this as an issue. Used the example given above, and ran the server both with |
Thank you for having a look into this, Tom! Really appreciate it 🙂 Actually, yes, I was able to reproduce it quite consistently on my local macOS or even a Linux VM on GitHub codespaces. Worth noting that it happens only with Here is the Uvicorn log with $ uvicorn server:app --ws wsproto --log-level trace
INFO: Started server process [13670]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
TRACE: 127.0.0.1:45048 - HTTP connection made
TRACE: 127.0.0.1:45048 - Upgrading to WebSocket
TRACE: 127.0.0.1:45048 - WebSocket connection made
TRACE: 127.0.0.1:45048 - ASGI [2] Started scope={'type': 'websocket', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'scheme': 'ws', 'server': ('127.0.0.1', 8000), 'client': ('127.0.0.1', 45048), 'root_path': '', 'path': '/ws', 'raw_path': b'/ws', 'query_string': b'', 'headers': '<...>', 'subprotocols': []}
TRACE: 127.0.0.1:45048 - ASGI [2] Receive {'type': 'websocket.connect'}
TRACE: 127.0.0.1:45048 - ASGI [2] Send {'type': 'websocket.accept', 'subprotocol': None, 'headers': '<...>'}
INFO: ('127.0.0.1', 45048) - "WebSocket /ws" [accepted]
TRACE: 127.0.0.1:45048 - ASGI [2] Send {'type': 'websocket.send', 'text': '<3 chars>'}
TRACE: 127.0.0.1:45048 - ASGI [2] Send {'type': 'websocket.close', 'code': 1000, 'reason': ''}
TRACE: 127.0.0.1:45048 - ASGI [2] Completed
TRACE: 127.0.0.1:45048 - WebSocket connection lost And here it is with $ uvicorn server:app --ws websockets --log-level trace
INFO: Started server process [20730]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
TRACE: 127.0.0.1:37170 - HTTP connection made
TRACE: 127.0.0.1:37170 - Upgrading to WebSocket
DEBUG: = connection is CONNECTING
TRACE: 127.0.0.1:37170 - WebSocket connection made
DEBUG: < GET /ws HTTP/1.1
DEBUG: < host: localhost:8000
DEBUG: < accept: */*
DEBUG: < accept-encoding: gzip, deflate
DEBUG: < user-agent: python-httpx/0.23.1
DEBUG: < connection: upgrade
DEBUG: < upgrade: websocket
DEBUG: < sec-websocket-key: xlgjFww4etMG4awUv2d4dA==
DEBUG: < sec-websocket-version: 13
TRACE: 127.0.0.1:37170 - ASGI [2] Started scope={'type': 'websocket', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'scheme': 'ws', 'server': ('127.0.0.1', 8000), 'client': ('127.0.0.1', 37170), 'root_path': '', 'path': '/ws', 'raw_path': b'/ws', 'query_string': b'', 'headers': '<...>', 'subprotocols': []}
TRACE: 127.0.0.1:37170 - ASGI [2] Receive {'type': 'websocket.connect'}
TRACE: 127.0.0.1:37170 - ASGI [2] Send {'type': 'websocket.accept', 'subprotocol': None, 'headers': '<...>'}
INFO: ('127.0.0.1', 37170) - "WebSocket /ws" [accepted]
TRACE: 127.0.0.1:37170 - ASGI [2] Send {'type': 'websocket.send', 'text': '<3 chars>'}
DEBUG: > HTTP/1.1 101 Switching Protocols
DEBUG: > Upgrade: websocket
DEBUG: > Connection: Upgrade
DEBUG: > Sec-WebSocket-Accept: ByUvCi3KAKmU1nBfsj34T/RUgbQ=
DEBUG: > date: Mon, 05 Dec 2022 12:37:51 GMT
DEBUG: > server: uvicorn
INFO: connection open
DEBUG: = connection is OPEN
DEBUG: > TEXT 'FOO' [3 bytes]
TRACE: 127.0.0.1:37170 - ASGI [2] Send {'type': 'websocket.close', 'code': 1000, 'reason': ''}
DEBUG: = connection is CLOSING
DEBUG: > CLOSE 1000 (OK) [2 bytes]
DEBUG: < CLOSE 1000 (OK) [2 bytes]
DEBUG: x half-closing TCP connection
TRACE: 127.0.0.1:37170 - WebSocket connection lost
DEBUG: = connection is CLOSED
TRACE: 127.0.0.1:37170 - ASGI [2] Completed
INFO: connection closed EDIT: After trying other things, it seems it happens only when running Uvicorn with $ uvicorn server:app --ws wsproto --log-level trace --loop asyncio
INFO: Started server process [17282]
INFO: Waiting for application startup.
TRACE: ASGI [1] Started scope={'type': 'lifespan', 'asgi': {'version': '3.0', 'spec_version': '2.0'}}
TRACE: ASGI [1] Receive {'type': 'lifespan.startup'}
TRACE: ASGI [1] Send {'type': 'lifespan.startup.complete'}
INFO: Application startup complete.
INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)
TRACE: 127.0.0.1:56500 - HTTP connection made
TRACE: 127.0.0.1:56500 - Upgrading to WebSocket
TRACE: 127.0.0.1:56500 - WebSocket connection made
TRACE: 127.0.0.1:56500 - ASGI [2] Started scope={'type': 'websocket', 'asgi': {'version': '3.0', 'spec_version': '2.3'}, 'http_version': '1.1', 'scheme': 'ws', 'server': ('127.0.0.1', 8000), 'client': ('127.0.0.1', 56500), 'root_path': '', 'path': '/ws', 'raw_path': b'/ws', 'query_string': b'', 'headers': '<...>', 'subprotocols': []}
TRACE: 127.0.0.1:56500 - ASGI [2] Receive {'type': 'websocket.connect'}
TRACE: 127.0.0.1:56500 - ASGI [2] Send {'type': 'websocket.accept', 'subprotocol': None, 'headers': '<...>'}
INFO: ('127.0.0.1', 56500) - "WebSocket /ws" [accepted]
TRACE: 127.0.0.1:56500 - ASGI [2] Send {'type': 'websocket.send', 'text': '<3 chars>'}
TRACE: 127.0.0.1:56500 - ASGI [2] Send {'type': 'websocket.close', 'code': 1000, 'reason': ''}
TRACE: 127.0.0.1:56500 - ASGI [2] Completed
TRACE: 127.0.0.1:56500 - WebSocket connection lost So it seems we hit weird asyncio bugs because of implementation differences? 🤔 |
Ug. Yes - confirmed. It looks to me like the issue is something to do with the background recieve handling. I'll give my own perspective on this, which you might or might not find useful...
Anyways, don't feel like you need to act on the above, just sharing a different approach, which avoids having to try to get clever with |
Thanks for the insights, Tom! I'll think about you suggest. Actually, I had an implementation that was approximately like this at some point, but wasn't sure if it was the best thing to do in terms of developer experience. Maybe I should reconsider! |
Ah yeah, that looks pretty neat. Could be that just leaving ping/pongs up to the server to initiate would be perfectly okay in practice. 🤷♂️ |
Maybe 🤔 Hello, @Kludex 👋 Sorry to summon you like this, but I remember that you asked for the |
At that point... I just wanted to use a simple |
Was any progress made on this? Have this library's maintainers made a decision to try something like:
in order to try and get this issue resolved? |
@skewty Do you experience this behavior in a "real" use-case (i.e. outside an automated unit test)? |
I may have but I refactored so much other code I can't be certain anymore. Also can't reproduce in production so easily anymore. Long story short.. Only one of two servers that communicated together over websocket was on UPS power.. oops! (not our fault / scope luckily) During a power outage, one server went down (obviously), when it came up again the websocket connection never recovered. That server now plugs into UPS power. Additionally, given the timeline I switched to "websockets" library. To be clear I can't say for certain it was http-ws' fault as another issue was identified in the code as well. |
I've identified the root cause of this issue. By trying out the fix branch I forked for httpx-ws (https://github.com/MtkN1/httpx-ws/tree/patch-trailing-data), the reproducible code (#7 (comment)) no longer hangs! You can install it with: pip install git+https://github.com/MtkN1/httpx-ws.git@patch-trailing-data Here's the diff for the fix branch: main...MtkN1:httpx-ws:patch-trailing-data The reason the reproducible code hangs is that httpx-ws does not handle trailing data. diff --git a/httpx_ws/_api.py b/httpx_ws/_api.py
index ded6b87..1c186cc 100644
--- a/httpx_ws/_api.py
+++ b/httpx_ws/_api.py
@@ -105,9 +105,10 @@ class WebSocketSession:
float
] = DEFAULT_KEEPALIVE_PING_TIMEOUT_SECONDS,
subprotocol: typing.Optional[str] = None,
+ trailing_data: bytes = b"",
) -> None:
self.stream = stream
- self.connection = wsproto.connection.Connection(wsproto.ConnectionType.CLIENT)
+ self.connection = wsproto.connection.Connection(wsproto.ConnectionType.CLIENT, trailing_data=trailing_data)
self.subprotocol = subprotocol
self._events: queue.Queue[
@@ -1054,6 +1055,15 @@ def _connect_ws(
if response.status_code != 101:
raise WebSocketUpgradeError(response)
+ (
+ trailing_data,
+ _,
+ ) = (
+ response.stream._stream._httpcore_stream._status.connection._connection._h11_state.trailing_data
+ )
+ if trailing_data:
+ print("trailing_data:", trailing_data)
+
subprotocol = response.headers.get("sec-websocket-protocol")
session = WebSocketSession(
@@ -1063,6 +1073,7 @@ def _connect_ws(
keepalive_ping_interval_seconds=keepalive_ping_interval_seconds,
keepalive_ping_timeout_seconds=keepalive_ping_timeout_seconds,
subprotocol=subprotocol,
+ trailing_data=trailing_data,
)
yield session
session.close() It seems in the server code with uvicorn and uvloop, all three byte sequences, "Websocket accept", "FOO text frame", and "Close frame", are synchronously sent over the network. On the client side, while "Websocket accept" is processed as a response header, the remaining "FOO text frame" and "Close frame" are treated as trailing data. Since the current httpx-ws implementation doesn't handle trailing data, Without uvloop, it's likely that the three byte sequences are sent asynchronously, which might explain why the current httpx-ws implementation works. As the fix code shows, h11 and wsproto have implementations for trailing data. Here are their references: h11: https://h11.readthedocs.io/en/latest/api.html?highlight=trailing_data#h11.Connection.trailing_data This issue aligns with what I wrote in the HTTPCore discussion. Additionally, the private API mentioned in the fix code ( Next steps:
|
This appears to be the best course of action from my POV, except... "wait for". The pull request won't write itself. I'm in a position to guide someone through implementing and resolving this. I'd suggest starting with a draft pull request in https://github.com/encode/httpcore/blob/master/httpcore/_async/http11.py#L118 |
Thank you @MtkN1 for the investigation and detailed explanations! That's very impressive and appreciated 🙏 I wasn't aware of this @tomchristie If you're open to this, I'm ready to pledge money via Polar to help solve encode/httpcore#872. |
Thanks for the reactions! I'll try to work on encode/httpcore#872 when I have some time 🙂 |
@frankie567 - That's interesting thanks. We've not committed to Polar yet, but I do like the work you're/they're doing. |
The fix for encode/httpcore#872 has been merged! 🎉 The trailing data issue I described is now being received via |
That's truly awesome @MtkN1, thanks 🙏 I'll wait for the next release of httpcore before closing this 🙂 I also need to remove the "sleep" workaround in unit tests. |
httpcore release will happen today: encode/httpcore#892. |
Thank you all for your help 🙏 @all-contributors |
@all-contributors Annoying bot... |
@tomchristie already contributed before to bug, research We had trouble processing your request. Please try again later. |
I couldn't determine any contributions to add, did you specify any contributions? I've put up a pull request to add @Kludex! 🎉 I've put up a pull request to add @tomchristie! 🎉 I've put up a pull request to add @MtkN1! 🎉 |
Bug fixes and improvements -------------------------- * Always disable automatic keepalive ping when using ASGI transport. Thanks @dmontagu and @Kludex 🎉 * Bump dependencies: * `httpcore>=1.0.4` - Solves #7, thanks to @tomchristie and @MtkN1 🎉
Bug fixes and improvements -------------------------- * Always disable automatic keepalive ping when using ASGI transport. Thanks @dmontagu and @Kludex 🎉 * Bump dependencies: * `httpcore>=1.0.4` - Solves frankie567#7, thanks to @tomchristie and @MtkN1 🎉
Describe the bug
Connection hangs forever if the WebSocket server just sends a message and then close.
The client waits forever for a message, missing the only message that was sent by the server.
To Reproduce
Server code
Client code
Expected behavior
The client should be able to read the message and close properly.
Configuration
Additional context
asyncio.sleep
or areceive_*
operation on the server sidewebsockets
libraryThe text was updated successfully, but these errors were encountered: