In this commit we created a default start/stop timeout for systemd units of 10 seconds. While testing on VMware, we found that `kubelet` requires more time to start. We have also observed `journald` crash looping because it needs more than 10 seconds to restart and recover a corrupted journal.
We should just remove the system-wide timeout and only enforce it where it is really necessary.
How confident are we that moving to 90s (the default) is better for most of our services? Services taking longer than 10s are rare in Bottlerocket. To do the correct thing without that default, it seems we'd be adding much lower timeouts for many services. It seems simpler and safer to me to keep a lower default that's reasonable for most things and raise it in exceptional cases.
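For illustration, the "low default, raise per unit" approach could look roughly like the following pair of systemd drop-ins. This is only a sketch: the file paths and names are hypothetical (Bottlerocket may ship these settings elsewhere), and the 60s value for `kubelet` is an assumed placeholder, not a measured number. `DefaultTimeoutStartSec=`, `DefaultTimeoutStopSec=`, and `TimeoutStartSec=` are standard systemd settings.

```ini
# Hypothetical path: /etc/systemd/system.conf.d/10-default-timeouts.conf
# Keeps the short system-wide default for start and stop.
[Manager]
DefaultTimeoutStartSec=10s
DefaultTimeoutStopSec=10s

# Hypothetical path: /etc/systemd/system/kubelet.service.d/10-timeout.conf
# Per-unit override for a service known to start slowly.
# The 60s value is an assumption for illustration only.
[Service]
TimeoutStartSec=60s
```

The two sections belong in separate files; they are shown together here only for brevity. A per-unit `TimeoutStartSec=` takes precedence over the manager-wide default, so only the exceptional services need a drop-in.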
We are partly addressing this with the default reservations for kubelet, but under heavy enough load pretty much any service could take a long time to start. We also want to avoid a death spiral where services are repeatedly started and killed because they're just over the limit, which adds additional resource pressure.