This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

Add Restart=on-failure for maintained systemd units to make them more robust #1298

Closed
invidian opened this issue Jan 5, 2021 · 0 comments · Fixed by #1362
Labels: kind/enhancement (New feature or request), size/m (Issues which likely require up to a couple of work days)

Comments

invidian commented Jan 5, 2021

Right now, if, for example, the Packet metadata service is not reachable, coreos-metadata will never converge, so if a node reboots during this time, it will never come back.

Even when the metadata service comes back, the node will still be stuck until either the node is rebooted or the service is manually restarted.

We should be able to automate that to make it more robust by adding Restart=on-failure.

Following https://unix.stackexchange.com/a/272650, we should probably add Restart=on-failure and RestartSec=5s (or some other value) to all units with Type=oneshot, so that they eventually converge instead of giving up after the first failure.
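
A minimal sketch of how the two settings could be applied to one such unit as a systemd drop-in. The drop-in path and file name are hypothetical, the unit name `coreos-metadata.service` is assumed from the service mentioned above, and `5s` is only the placeholder value proposed here. Whether `Restart=` is accepted together with `Type=oneshot` depends on the systemd version, so this would need to be verified on the target OS.

```ini
# /etc/systemd/system/coreos-metadata.service.d/10-restart.conf  (hypothetical path)
[Service]
# Start the unit again when its process exits with a non-zero status,
# is terminated by a signal, or times out.
Restart=on-failure
# Pause between attempts so an unreachable metadata service is not polled in a tight loop.
RestartSec=5s
```

After adding a drop-in like this, `systemctl daemon-reload` is needed for systemd to pick it up.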

@pothos also suggested adding RemainAfterExit=yes:

RemainAfterExit=yes is also missing, which means that currently the service is executed multiple times, once each time it is pulled in as wanted/required. This could mean that an existing file gets lost when the server is unavailable later (but that also depends on how afterburn writes this file out, so it may not be a problem right now).
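
As a rough illustration of this suggestion, the setting could be added through a drop-in like the one below. The path and file name are hypothetical, and the sketch assumes the target unit (here `coreos-metadata.service`) is already declared as `Type=oneshot`.

```ini
# /etc/systemd/system/coreos-metadata.service.d/20-remain-after-exit.conf  (hypothetical path)
[Service]
# Keep the unit in the "active" state after its process exits successfully,
# so that other units wanting or requiring it later do not trigger another run.
RemainAfterExit=yes
```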

@invidian invidian added kind/enhancement New feature or request size/m Issues which likely require up to a couple of work days labels Jan 5, 2021
@surajssd surajssd added the proposed/next-sprint Issues proposed for next sprint label Jan 20, 2021
@iaguis iaguis removed the proposed/next-sprint Issues proposed for next sprint label Jan 25, 2021
@surajssd surajssd self-assigned this Feb 4, 2021