Current situation
We have some one-shot units, like coreos-metadata, that don't get retried if they fail on their first run. They just stay around in the failed state.
Impact
For coreos-metadata this means that if the metadata service is unavailable when the machine boots, but later becomes available, the machine never recovers.
Ideal future situation
To make this type of unit more robust, we should add Restart=on-failure, along with some delay such as RestartSec=10 or maybe 1m (unfortunately, there's no exponential backoff).
Additionally, we should consider adding RemainAfterExit=yes, so that these units don't get executed more than once if they get pulled in as wanted/required by other units. Otherwise, an existing file could get lost if a later run happens while the server is unavailable.
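As a rough sketch of the proposal above, the settings could be applied through a drop-in like the following. The drop-in path and file name are assumptions for illustration, not something this issue specifies, and the RestartSec value is just one of the two candidates mentioned. It also assumes a systemd version that accepts Restart=on-failure for Type=oneshot services, which older releases reject.

```ini
# Hypothetical drop-in location (an assumption, adjust to the actual unit):
#   /etc/systemd/system/coreos-metadata.service.d/10-retry.conf

[Service]
# Retry the one-shot unit when it exits with a failure.
Restart=on-failure
# Fixed delay between attempts; systemd has no exponential backoff here.
RestartSec=10
# Keep the unit "active" after a successful run, so that being pulled in
# again as wanted/required does not execute it a second time.
RemainAfterExit=yes
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for it to take effect, and `systemctl cat coreos-metadata.service` shows the merged result.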
When the metadata server was unavailable for some time, the service
did not retry. Also, the service could be triggered multiple times,
once each time another service pulled it in. This can cause problems
if, e.g., a rerun fails and corrupts an existing file that could have
been kept because rerunning wasn't needed.
Fixes flatcar/Flatcar#311