-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeout in HTTP::Server
spec with preview_mt
#13211
Comments
The problems are easily reproduced on local, though I never get the 15s timeout. During some run I either get an infinite hang ( Indeed, this is a race condition induced by MT: we're listening and closing a server from different fibers that happen to be run on different threads, and are no longer synchronized, so a spec suddenly closes a server while it's being started, ... |
To reproduce: $ bin/crystal build spec/std/http/server/server_spec.cr -Dpreview_mt Then repeatedly run |
This is hard to fix properly. So far only adding a few |
The core of the issue is a synchronization issue between fibers that test edge cases in HTTP::Server, for example what's happening after closing the fiber (from another fiber) while we have no guarantee that the server is actually listening. I couldn't find a proper way to fix the issue. The only solution that fixed the reproducibility of the issue was adding a one millisecond sleep after starting the server. closes crystal-lang#13211
The core of the issue is a synchronization issue between fibers that test edge cases in HTTP::Server, for example what's happening after closing the fiber (from another fiber) while we have no guarantee that the server is actually listening. I couldn't find a proper way to fix the issue. The only solution that fixed the reproducibility of the issue was adding a one millisecond sleep after starting the server. closes crystal-lang#13211
Some
HTTP::Server
specs occasionally time out in CI after reaching the 15 second timout.Example:
HTTP::Server #bind fails after listen
(Duration (p95): 11ms, 98% success rate)HTTP::Server #listen fails after listen
(Duration (p95): 7ms, 99% success rate)HTTP::Server doesn't raise on accept after close #2692
(Duration (p95): 8s, 99% success rate)Stats from https://app.circleci.com/insights/github/crystal-lang/crystal/workflows/nightly_release/tests?branch=master
We could consider increasing the timeout similar to #13043. But that would just be ignoring the symptoms. It really shouldn't take that long to execute these specs. The 95-percentile of the duration being several orders of magnitude below the timeout confirms that.
Maybe there is some kind of deadlock or perhaps all available threads are saturated for such a long time. It's flaky. We need to investigate what's actually going on and figure out what can be done to improve this.
The text was updated successfully, but these errors were encountered: