
Avoid unsafely running effects when handling WS requests #2852

Merged: 6 commits merged into zio:main on May 21, 2024

Conversation

@kyri-petrou (Collaborator) commented on May 19, 2024

When handling websocket connections, we currently run the ZIO effect unsafely from code that is already executing on ZIO's executor. This can be quite bad, as it can lead to a deadlock if the effect contains an async computation and there are no ZScheduler workers available to execute it. This has already introduced some flakiness into Caliban's test suite (see here).

This PR fixes it by changing the upgradeToWebSocket method to return an effect instead, which is then composed with the main effect that is running the request.

EDIT: After some further investigation, it appears that the flakiness experienced in Caliban's CI was not related to running the effect unsafely (although I kept that change, because I think it's better to do it this way). The issue appears to have been related to the dispatch method in WebSocketAppHandler: since NettyRuntime#runUninterruptible forks the effect, in some rare cases messages would be offered to the queue in the wrong order.
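A minimal sketch of the shape of the upgradeToWebSocket change described above, with hypothetical names and simplified signatures (UpgradeSketch, upgradeUnsafely, handleRequest are illustrative, not the actual zio-http internals): the upgrade step becomes an effect that is sequenced with the request-handling effect, instead of being run unsafely from code already on ZIO's executor.

import zio._

object UpgradeSketch extends ZIOAppDefault {

  // Before (hazardous, shown only for contrast): unsafely running an effect while
  // already on a ZScheduler worker can deadlock if the effect suspends and no
  // other workers are free to resume it.
  def upgradeUnsafely(runtime: Runtime[Any], upgrade: Task[Unit]): Unit =
    Unsafe.unsafe { implicit unsafe =>
      runtime.unsafe.run(upgrade).getOrThrowFiberFailure()
    }

  // After: return the effect and let the caller compose it into the main effect.
  def upgradeToWebSocket(upgrade: Task[Unit]): Task[Unit] =
    upgrade

  // The request handler simply sequences the upgrade effect with its own work.
  val handleRequest: Task[Unit] =
    Console.printLine("handling request") *>
      upgradeToWebSocket(Console.printLine("upgraded to websocket").unit)

  def run = handleRequest
}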

Comment on lines +71 to +72
case Some(promise) => promise.unsafe.done(Exit.fail(cause))
case None => ()
kyri-petrou (Collaborator, Author) commented:

I also noticed this bug: the promise wasn't being fulfilled with the exception, because the previous code returned an effect that was never run.
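A hypothetical illustration of this bug class (the names below are made up, not the actual handler code): Promise#fail only returns an effect that describes completing the promise, so if that return value is discarded inside a plain callback, the promise is never fulfilled. The fix completes it synchronously via the unsafe API, mirroring promise.unsafe.done(Exit.fail(cause)) in the diff above.

import zio._

object PromiseCompletionSketch {

  // Buggy shape: the UIO[Boolean] returned by `fail` is discarded, so nothing
  // runs and the promise stays open.
  def failPromiseBuggy(promise: Promise[Throwable, Unit], cause: Throwable): Unit = {
    promise.fail(cause)
    ()
  }

  // Fixed shape: complete the promise immediately using the unsafe API.
  def failPromiseFixed(promise: Promise[Throwable, Unit], cause: Throwable): Unit =
    Unsafe.unsafe { implicit unsafe =>
      promise.unsafe.done(Exit.fail(cause))
    }
}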

@codecov-commenter commented:

Codecov Report

Attention: Patch coverage is 67.74194%, with 10 lines in your changes missing coverage. Please review.

Project coverage is 64.57%. Comparing base (21d5324) to head (f3b41ba).
Report is 9 commits behind head on main.

Files                                                    Patch %   Missing lines
...a/zio/http/netty/server/ServerInboundHandler.scala    72.41%    8 Missing ⚠️
...ain/scala/zio/http/netty/WebSocketAppHandler.scala     0.00%    2 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2852      +/-   ##
==========================================
+ Coverage   64.51%   64.57%   +0.05%     
==========================================
  Files         156      156              
  Lines        9085     9103      +18     
  Branches     1586     1590       +4     
==========================================
+ Hits         5861     5878      +17     
- Misses       3224     3225       +1     


Comment on lines +50 to +57
// IMPORTANT: Offering to the queue must be run synchronously to avoid messages being added in the wrong order
// Since the queue is unbounded, this will not block the event loop
// TODO: We need to come up with a design that doesn't involve running an effect to offer to the queue
zExec.unsafeRunSync(queue.offer(event.map(frameFromNetty)))
onComplete match {
case Some(promise) if close => promise.unsafe.done(Exit.succeed(ChannelState.Invalid))
case _ => ()
}
kyri-petrou (Collaborator, Author) commented:

This is the most important change: we need to ensure that messages are offered to the queue synchronously so that they aren't added in the wrong order.
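A small runnable toy (hypothetical setup, not the actual WebSocketAppHandler) showing why the offer must be synchronous: forking each offer, as NettyRuntime#runUninterruptible did, gives no ordering guarantee between the fibers, so two frames arriving back-to-back can be enqueued out of order. Because the queue is unbounded, offer never suspends, so running it synchronously does not block the calling event-loop thread.

import zio._

object OfferOrderingSketch extends ZIOAppDefault {

  def run =
    for {
      queue   <- Queue.unbounded[Int]
      runtime <- ZIO.runtime[Any]
      _       <- ZIO.succeed {
                   Unsafe.unsafe { implicit unsafe =>
                     // Simulates two frames arriving back-to-back on the event loop.
                     // Forked offers: the order of 1 and 2 in the queue is not guaranteed.
                     runtime.unsafe.fork(queue.offer(1))
                     runtime.unsafe.fork(queue.offer(2))
                     // Synchronous offers would preserve arrival order:
                     //   runtime.unsafe.run(queue.offer(1))
                     //   runtime.unsafe.run(queue.offer(2))
                   }
                 }
      items   <- queue.takeN(2)
      _       <- Console.printLine(s"dequeued: $items") // occasionally Chunk(2, 1) with forked offers
    } yield ()
}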

@987Nabil merged commit ad123a4 into zio:main on May 21, 2024
31 of 33 checks passed