fleet does a fine job of reporting unit state in the simple cases, but adding any complexity to the unit lifecycle causes unit state to be mis-published into etcd.
For example, starting and destroying a single unit results in all states being published properly. Now imagine setting an ExecStop option that takes a long time to finish (e.g. /usr/bin/sleep 10s). Start the unit, unload it, and immediately start it again. If the unit is scheduled to a different machine the second time it is started, an active state will be published, but 10s later the inactive state reported by the original machine will overwrite it.
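For reference, a minimal unit that would reproduce the slow-stop scenario. The unit name and ExecStart here are placeholders; only the slow ExecStop matters:

# slow-stop.service (hypothetical)
[Unit]
Description=Unit with a deliberately slow ExecStop

[Service]
ExecStart=/usr/bin/sleep infinity
ExecStop=/usr/bin/sleep 10s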
We need to stop treating units as if only one agent can possibly report state for them at a given time. It is incredibly important to know whether a unit is still running in some capacity on a node when it shouldn't be.
We can take systemd's lead here: systemctl can visualize units across a bunch of systemd-nspawn containers like so:
$ systemctl --recursive list-units fleet.service
UNIT                 LOAD   ACTIVE SUB     DESCRIPTION
fleet.service        loaded active running fleet
smoke0:fleet.service loaded active running fleet.service
smoke1:fleet.service loaded active running fleet.service
smoke2:fleet.service loaded active running fleet.service
smoke3:fleet.service loaded active running fleet.service
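Concretely, fleet could do the same by publishing unit state under a per-machine key instead of a single per-unit key. Below is a minimal sketch in Go of what such a key layout might look like; the prefix, path shape, and helper function are assumptions for illustration, not fleet's actual schema:

package main

import (
	"fmt"
	"path"
)

// Hypothetical key layout: one etcd state key per (unit, machine) pair,
// rather than a single key per unit. With this layout, a lingering
// "inactive" report from the old machine can never clobber the "active"
// state published by the machine the unit was rescheduled to; readers
// simply see both entries until the stale one is removed or expires.
const keyPrefix = "/_coreos.com/fleet" // assumed prefix, for illustration

func unitStateKey(unitName, machineID string) string {
	return path.Join(keyPrefix, "states", unitName, machineID)
}

func main() {
	// The unit was rescheduled from smoke0 to smoke1; each agent publishes
	// under its own key, so both states are visible side by side.
	fmt.Println(unitStateKey("fleet.service", "smoke0")) // old machine, still stopping
	fmt.Println(unitStateKey("fleet.service", "smoke1")) // new machine, active
}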
This is a clone of #638.