Timeout in GossipReenterAfterSend #2130
Comments
Hi, we recently added a link from the documentation to this article: https://home.robusta.dev/blog/stop-using-cpu-limits
Kubernetes is prone to throttle the CPU in this kind of system, and that can result in timeouts. Could you give that a try and see if it fixes the problem in your case?
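The throttling claim can be checked directly from inside a pod, even when average CPU usage looks low. This is a minimal sketch assuming a Linux node with cgroup v2 (on cgroup v1 the file lives at a different path, e.g. `/sys/fs/cgroup/cpu/cpu.stat`); `show_throttling` is a hypothetical helper name, not part of any tool:

```shell
# show_throttling: print CPU throttling counters from a cgroup cpu.stat file.
# nr_throttled > 0 means the kernel has paused this container's threads,
# which can stall timers and actor loops even when average CPU usage is low.
show_throttling() {
  # Default path assumes cgroup v2; pass a cgroup v1 path if needed.
  stat_file="${1:-/sys/fs/cgroup/cpu.stat}"
  grep -E '^(nr_periods|nr_throttled|throttled_usec)' "$stat_file"
}
```

For a running pod, the same file can be read with something like `kubectl exec <pod> -- cat /sys/fs/cgroup/cpu.stat`.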
We may try, but according to our monitoring there is no high CPU activity at the time of the issue.
Could you also give this a try, and pass it in to the actor system config? The exception you linked above comes from the gossip loop; it seems to time out just trying to get the gossip state, indicating that the gossip actor is for some reason deadlocked. That specific exception does not look like it could be Kubernetes-related, tbh.
We already use that.
Ah right.
It would be great to know if the system detects that the gossip actor is timing out on messages, e.g. search for "$gossip" in the logs.
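As a minimal sketch of that log search (the pod name is a placeholder, and `find_gossip` is a name made up here):

```shell
# find_gossip: filter log lines that mention the "$gossip" actor.
# grep -F treats the pattern literally, so the dollar sign is not
# interpreted as a regex anchor or a shell variable.
find_gossip() {
  grep -F '$gossip'
}

# Typical use against a running pod (name is a placeholder):
#   kubectl logs my-pod --since=1h | find_gossip
```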
Or any of these:
Or this one:
I've found only this one, and only once:
@AqlaSolutions I believe #2133 may fix the underlying issue that can cause a lot of these timeouts.
Great news!
Hi, sometimes we see a spam of error messages like `Timeout in GossipReenterAfterSend` in the log (2-3 per second). It may continue for hours, and there are no other messages in the log preceding it. After an hour of this we also see further messages, and they continue for hours, possibly until a restart.
We can't reproduce it locally, but it regularly happens in Kubernetes on staging and prod servers. Is there a way to debug this? Any help?
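To quantify the spam rate when it happens in the cluster, a small sketch; `count_timeouts` is a hypothetical helper name, and the pod name is a placeholder:

```shell
# count_timeouts: count how many log lines contain the timeout message.
count_timeouts() {
  grep -c 'Timeout in GossipReenterAfterSend'
}

# Typical use (pod name is a placeholder):
#   kubectl logs my-pod --since=10m | count_timeouts
```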