When left running for a while, flintlockd's grpc server becomes inaccessible: rpc error: code = Unavailable desc = failed to receive server preface within timeout
#503
Comments
i think the “we don’t close client connections in capmvm” thing is the issue. on the frozen host i had 1012 established connections to the flintlockd server. i tried a |
interesting... "grpc run as systemd service with open file limits": grpc/grpc-go#1261
and from @richardcase's parallel experiment:
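To check the open-file-limit angle on a given host, the following is a minimal sketch (Linux only) that reports the limit a process runs under and how many descriptors it currently holds. The helper names `openFileLimit` and `openFDCount` are made up for illustration and are not part of flintlock. The thread does not state which limit flintlockd was running with; if it were the common default soft limit of 1024, the 1012 established connections reported above would be close to it.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE values of this process.
// Hypothetical helper, Linux only.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return rl.Cur, rl.Max, nil
}

// openFDCount counts the descriptors currently open in this process by
// listing /proc/self/fd. The count includes the descriptor used to read
// the directory itself.
func openFDCount() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		panic(err)
	}
	n, err := openFDCount()
	if err != nil {
		panic(err)
	}
	fmt.Printf("open fds: %d, soft limit: %d, hard limit: %d\n", n, soft, hard)
}
```

Every accepted connection costs the server one descriptor, so a server whose descriptor count sits near its soft limit can no longer accept new connections.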
fix is to actually close client connections in capmvm, being PR'd now
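The pattern behind that fix can be sketched with the standard library alone. This is not the capmvm code: plain TCP connections stand in for gRPC client connections, and `startServer`, `callOnce` and `waitClosed` are hypothetical names. With grpc-go the analogous step is calling `Close` on the `*grpc.ClientConn` once the call is done.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// startServer listens on a random local port. It sends on the returned
// channel each time a client connection reaches EOF, i.e. the client closed it.
func startServer() (string, <-chan struct{}, func(), error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, nil, err
	}
	closed := make(chan struct{}, 1024)
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			go func(c net.Conn) {
				defer c.Close()
				io.Copy(io.Discard, c) // returns once the client closes its end
				closed <- struct{}{}
			}(conn)
		}
	}()
	return ln.Addr().String(), closed, func() { ln.Close() }, nil
}

// callOnce stands in for one client call: dial, use the connection, always close.
func callOnce(addr string) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	// Without this Close, each call leaves one ESTABLISHED socket on the server.
	defer conn.Close()
	_, err = conn.Write([]byte("ping"))
	return err
}

// waitClosed waits until n connections were seen closed or 5 seconds pass,
// and returns how many closes were observed.
func waitClosed(closed <-chan struct{}, n int) int {
	deadline := time.After(5 * time.Second)
	seen := 0
	for seen < n {
		select {
		case <-closed:
			seen++
		case <-deadline:
			return seen
		}
	}
	return seen
}

func main() {
	addr, closed, stop, err := startServer()
	if err != nil {
		panic(err)
	}
	defer stop()
	const calls = 50
	for i := 0; i < calls; i++ {
		if err := callOnce(addr); err != nil {
			panic(err)
		}
	}
	fmt.Printf("server saw %d of %d connections closed\n", waitClosed(closed, calls), calls)
}
```

Removing the `defer conn.Close()` line would leave every connection open on the server until the client process exits, which matches the kind of build-up described in the comments above.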
@richardcase in a previous team we put in a |
@Callisto13 - yes for sure. The screenshots above come from the pprof endpoint i've added to flintlock. Are you thinking that we have our own handler that we serve from |
either or both. other metrics could be like total mvm count etc
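A debug handler along the lines discussed here could look roughly like the sketch below, which serves the standard pprof handlers and an `expvar` counter from one mux. It is an illustration only and does not show how flintlock wires up its pprof endpoint. The metric name `mvm_total` and the helpers `newDebugMux`, `serveDebug` and `metricValue` are assumptions.

```go
package main

import (
	"encoding/json"
	"expvar"
	"fmt"
	"net"
	"net/http"
	"net/http/pprof"
)

// mvmTotal is a hypothetical gauge; the name "mvm_total" is not from flintlock.
var mvmTotal = expvar.NewInt("mvm_total")

// newDebugMux serves pprof under /debug/pprof/ and expvar under /debug/vars.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	mux.Handle("/debug/vars", expvar.Handler())
	return mux
}

// serveDebug starts the debug server on addr and returns its base URL
// together with a function that stops it.
func serveDebug(addr string) (string, func(), error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", nil, err
	}
	srv := &http.Server{Handler: newDebugMux()}
	go srv.Serve(ln)
	return "http://" + ln.Addr().String(), func() { srv.Close() }, nil
}

// metricValue reads /debug/vars and returns the raw JSON value of one metric.
func metricValue(baseURL, name string) (string, error) {
	resp, err := http.Get(baseURL + "/debug/vars")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	vars := map[string]json.RawMessage{}
	if err := json.NewDecoder(resp.Body).Decode(&vars); err != nil {
		return "", err
	}
	raw, ok := vars[name]
	if !ok {
		return "", fmt.Errorf("metric %q not found", name)
	}
	return string(raw), nil
}

func main() {
	base, stop, err := serveDebug("127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer stop()
	mvmTotal.Set(3) // e.g. updated whenever a microvm is created or deleted
	v, err := metricValue(base, "mvm_total")
	if err != nil {
		panic(err)
	}
	fmt.Println("mvm_total =", v)
}
```

Using a dedicated mux instead of `http.DefaultServeMux` keeps the debug routes off any other HTTP listener the process may have; a goroutine or connection count could be exposed the same way.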
What happened:
I have seen this a couple of times now.
On infrastructure which has been left running for a little while (24hrs+), further requests to flintlock's service result in this error:
rpc error: code = Unavailable desc = failed to receive server preface within timeout
Restarting the server (in my case with systemctl restart flintlockd.service on Equinix) fixes the issue, and further requests are served just fine.
What did you expect to happen:
Flintlock's long running grpc service should not seize up over time.
How to reproduce it:
I have been testing with Equinix a lot these days, so for all I know it could be a problem with that specific setup/environment.
systemctl restart flintlockd.service
Anything else you would like to add:
It should be proven out that this also happens/does not happen on another environment.
Environment:
flintlock v0.1.1-3-g982d429
containerd github.com/containerd/containerd v1.6.6 10c12954828e7c7c9b6e0ea9b0c02b01407d3ae1
Kernel: Linux host-0 5.13.0-44-generic #49~20.04.1-Ubuntu SMP Wed May 18 18:44:28 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux