Server race condition: connection ack + first subscribe #501
Comments
Hey there, thanks for reporting! The way I see it is that the only possible solution inside graphql-ws is to eagerly set the While I think about the implications of this change, a potential mitigation for you would be to perform the telemetry tracing in a setTimeout yourself and resolve the
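For illustration, here is a minimal sketch of the difference between flipping an acknowledged flag after the ack send settles and flipping it eagerly with a rollback. Everything in it (`Ctx`, `Send`, `ackAfterSend`, `ackEagerly`, `canSubscribe`) is a hypothetical stand-in written for this example, not graphql-ws's actual code.

```typescript
// Hypothetical, simplified model of a server-side connection context.
interface Ctx {
  acknowledged: boolean;
}

// Hypothetical async send; it may settle long after the bytes hit the wire.
type Send = (message: string) => Promise<void>;

// Racy variant: the flag flips only after the async send settles, so a
// subscribe handled in between still sees `acknowledged === false`.
async function ackAfterSend(ctx: Ctx, send: Send): Promise<void> {
  await send('{"type":"connection_ack"}');
  ctx.acknowledged = true;
}

// Eager variant: flip the flag first, roll it back if the send fails.
async function ackEagerly(ctx: Ctx, send: Send): Promise<void> {
  ctx.acknowledged = true;
  try {
    await send('{"type":"connection_ack"}');
  } catch (err) {
    ctx.acknowledged = false;
    throw err;
  }
}

// Stand-in for the check that would otherwise close the socket with 4401.
function canSubscribe(ctx: Ctx): boolean {
  return ctx.acknowledged;
}
```

With a send that takes a while to settle, `canSubscribe` is already true right after calling `ackEagerly`, while it stays false for `ackAfterSend` until the send resolves.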
Yeah, I was trying to think of the implications and I wasn't 100% sure either. I think it's okay, but haven't thought through this super deeply. I'll share some of my thoughts though, in case it's helpful:
I'll think about this some more, but I'd be more than happy to send a PR for any solution we think is the answer!
🎉 This issue has been resolved in version 5.14.1 🎉 The release is available on: Your semantic-release bot 📦🚀
🙇 No worries about the delay, appreciate you fixing this! Quite confident it will work, because it's basically the same as what I did locally, except I set
Sorry for the super long delay on getting back to you, @enisdenjo. Just did some load testing locally and no issues. Again, appreciate the quick response and fix <3
We've observed that we periodically get `[Network] undefined` client-side, which we noticed was tied to a close code of 4401. This status code implies the first subscribe message is being sent before the connection ack. After scattering many logs around, we observed the client receiving the ack and only ever sending the first subscribe after that point.

On the server, we see the subscribe happen before `ctx.acknowledged` is set to true here: graphql-ws/src/server.ts, lines 616 to 632 in 50d5a51.
I think busy event loops are to blame here. Do you think it's possible we've done something wrong (see below for more context)? We've resolved the issue locally by setting `ctx.acknowledged = true` in our `onConnect` handler, but I figure there's something we can do in `graphql-ws` itself.

Some ideas I had:
- Set `ctx.acknowledged = true` up front and roll that back if the send fails. I think in this instance a misbehaving client could potentially send a subscribe without receiving the ack.
- Make `acknowledged` a promise, and `await` it for relevant messages. As a protective measure, we could either put a timeout around that `await` or throw if a second message comes in that needs to await the promise.
- Use `setTimeout(..., 0)` or `process.nextTick`, on the assumption that one event loop tick is enough to resolve this race condition (uncertain if it will be, but happy to test that out).

Debug Information
We're using `graphql-ws` in tandem with `@urql/core` to power one of our clients, and we're also using it to power our server (version 5.14.0).

We call `makeServer` with custom `onConnect` and `onSubscribe` handlers. We pass `server.opened` a custom socket, but it's primarily just a thin wrapper around a `ws` websocket. Looks something like this:

After that, we just load test our server and it's pretty easy to get the aforementioned issue.
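The idea of making the acknowledged state a promise guarded by a timeout could be sketched roughly as follows. This is only an illustration under assumptions: `AckGate`, `handleSubscribe`, and the timeout values are hypothetical names invented here, and none of it is part of the graphql-ws API.

```typescript
// Hypothetical gate that lets message handlers wait for the ack
// instead of reading a boolean flag that may not be set yet.
class AckGate {
  private resolveAck!: () => void;
  private readonly acked: Promise<void>;

  constructor() {
    // The executor runs synchronously, so resolveAck is assigned here.
    this.acked = new Promise<void>((resolve) => {
      this.resolveAck = resolve;
    });
  }

  // Called once the connection ack has gone out.
  acknowledge(): void {
    this.resolveAck();
  }

  // Resolves to true once acknowledged, or false if timeoutMs elapses first.
  async wait(timeoutMs: number): Promise<boolean> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), timeoutMs);
    });
    try {
      return await Promise.race([this.acked.then(() => true), timedOut]);
    } finally {
      clearTimeout(timer);
    }
  }
}

// Hypothetical subscribe handler: waits briefly for the ack, and only
// reports "unauthorized" (the 4401 case) if it never arrives.
async function handleSubscribe(
  gate: AckGate,
  timeoutMs = 1000,
): Promise<"subscribed" | "unauthorized"> {
  return (await gate.wait(timeoutMs)) ? "subscribed" : "unauthorized";
}
```

The timeout is what keeps a client that never completes the handshake from parking subscribes indefinitely; the alternative mentioned above, throwing when a second message has to wait, is not shown here.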