We are deploying and managing a Trino cluster on Kubernetes, consisting of one coordinator and 14 worker nodes.
Occasionally, Trino needs to execute heavy queries that utilize nearly 100% of the CPU allocated to the pods.
While we haven't encountered any issues so far, we are concerned about the following scenario:
If a worker node reaches 100% of its CPU limit for an extended period, the liveness probe might fail. In such cases, the worker node could be forcibly restarted, even if it is actively processing splits.
From my understanding, the liveness probe relies on /v1/info.
Could this issue actually occur in practice? If so, how can we prevent it?
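One common mitigation is to give the liveness probe more slack so that a temporarily unresponsive worker is not restarted while it is still processing splits. A minimal sketch, assuming the standard Kubernetes HTTP probe API and Trino's default HTTP port 8080 (the exact pod spec, port, and probe values will depend on your deployment and Helm chart):

```yaml
# Hypothetical pod-spec fragment. Field names follow the standard
# Kubernetes probe API; the path and port assume Trino's defaults.
livenessProbe:
  httpGet:
    path: /v1/info
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 10    # tolerate slow responses under CPU pressure
  failureThreshold: 6   # roughly a minute of consecutive failures before restart
```

With `periodSeconds: 10` and `failureThreshold: 6`, the kubelet restarts the pod only after about 60 seconds of consecutive probe failures, which is usually enough headroom for a CPU-saturated JVM to still answer `/v1/info`. Setting the pod's CPU `limit` slightly above its `request` can also leave enough scheduler headroom for the probe handler thread to run.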