This repository has been archived by the owner on Jan 30, 2020. It is now read-only.
As touched on in #720 and #866: the agent is not truly recovering on start-up before it starts reconciling:
core-01 ~ # systemctl kill -s SIGKILL fleet
core-01 ~ # Oct 21 23:03:30 core-01 systemd[1]: fleet.service: main process exited, code=killed, status=9/KILL
Oct 21 23:03:30 core-01 systemd[1]: Unit fleet.service entered failed state.
core-01 ~ # systemctl status foo
● foo.service
Loaded: loaded (/run/fleet/units/foo.service; linked-runtime)
Active: active (running) since Tue 2014-10-21 22:54:43 UTC; 8min ago
Main PID: 1864 (sleep)
CGroup: /system.slice/foo.service
└─1864 /bin/sleep 999999999
Oct 21 22:58:17 core-01 systemd[1]: Started foo.service.
core-01 ~ # Oct 21 23:03:40 core-01 systemd[1]: fleet.service holdoff time over, scheduling restart.
Oct 21 23:03:40 core-01 systemd[1]: Stopping fleet daemon...
Oct 21 23:03:40 core-01 systemd[1]: Starting fleet daemon...
Oct 21 23:03:40 core-01 systemd[1]: Started fleet daemon.
Oct 21 23:03:40 core-01 fleetd[1978]: INFO fleet.go:58: Starting fleet version 0.8.3+git
Oct 21 23:03:40 core-01 fleetd[1978]: INFO fleet.go:162: No provided or default config file found - proceeding without
Oct 21 23:03:40 core-01 fleetd[1978]: INFO server.go:153: Establishing etcd connectivity
Oct 21 23:03:40 core-01 fleetd[1978]: INFO server.go:164: Starting server components
Oct 21 23:03:40 core-01 fleetd[1978]: INFO manager.go:262: Writing systemd unit foo.service (41b)
Oct 21 23:03:40 core-01 fleetd[1978]: INFO manager.go:198: Instructing systemd to reload units
Oct 21 23:03:40 core-01 fleetd[1978]: INFO reconcile.go:309: AgentReconciler completed task: type=LoadUnit job=foo.service reason="unit scheduled here but not loaded"
Oct 21 23:03:40 core-01 fleetd[1978]: INFO manager.go:134: Triggered systemd unit foo.service start: job=8409
Oct 21 23:03:40 core-01 fleetd[1978]: INFO reconcile.go:309: AgentReconciler completed task: type=StartUnit job=foo.service reason="unit currently loaded but desired state is launched"
core-01 ~ # systemctl status foo.service
● foo.service
Loaded: loaded (/run/fleet/units/foo.service; linked-runtime)
Active: active (running) since Tue 2014-10-21 22:54:43 UTC; 10min ago
Main PID: 1864 (sleep)
CGroup: /system.slice/foo.service
└─1864 /bin/sleep 999999999
Oct 21 22:58:17 core-01 systemd[1]: Started foo.service.
Oct 21 23:03:40 core-01 systemd[1]: Started foo.service.
The LoadUnit/StartUnit operations just happen to be idempotent, so foo.service is not interrupted and we get the illusion of continuity. But really, we should not be invoking LoadUnit/StartUnit and writing the unit to disk again.
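To make the intended behavior concrete, here is a minimal Go sketch of a reconciliation step that consults state recovered from systemd before emitting any work. The `unitState` and `task` types and the `planTasks` function are hypothetical names invented for illustration; they are not fleet's actual types or API.

```go
package main

import "fmt"

// unitState is a hypothetical snapshot of what systemd reports for a unit
// at the moment the agent starts up.
type unitState struct {
	Loaded bool // the unit file is already known to systemd
	Active bool // the unit is currently running
}

// task is a hypothetical stand-in for a reconciler task.
type task struct {
	Type string
	Unit string
}

// planTasks compares the units scheduled to this agent with the recovered
// state and returns only the work that is actually needed.
func planTasks(desired []string, recovered map[string]unitState) []task {
	var tasks []task
	for _, name := range desired {
		st := recovered[name] // zero value means "not loaded, not active"
		if !st.Loaded {
			tasks = append(tasks, task{Type: "LoadUnit", Unit: name})
		}
		if !st.Active {
			tasks = append(tasks, task{Type: "StartUnit", Unit: name})
		}
	}
	return tasks
}

func main() {
	// foo.service survived the agent restart, so it needs no tasks.
	recovered := map[string]unitState{
		"foo.service": {Loaded: true, Active: true},
	}
	fmt.Println(len(planTasks([]string{"foo.service"}, recovered)))
	// bar.service is unknown to systemd, so it needs both tasks.
	fmt.Println(planTasks([]string{"bar.service"}, recovered))
}
```

With recovery done first, a unit that is already loaded and running, like foo.service in the log above, would produce no LoadUnit/StartUnit tasks and no rewrite of the unit file.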
We should start tracking the unit state of these units at the point of recovery. Right now, we only add them to the set of subscribed units because we happen to call LoadUnit on them again.
Ideally, we could leverage the unit state that the UnitStateGenerator/UnitStatePublisher collect and cache, rather than fetching it anew in each reconciliation.
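One possible shape for these two points is a shared cache that is seeded at recovery and then read by the reconciler. The Go sketch below is only an assumption about how that could look: `unitStateCache` and its `Seed`, `Update`, and `Get` methods are hypothetical and do not reflect how UnitStateGenerator/UnitStatePublisher are actually implemented.

```go
package main

import (
	"fmt"
	"sync"
)

// unitStateCache is a hypothetical cache shared by the component that
// collects unit state and the reconciler that consumes it.
type unitStateCache struct {
	mu     sync.RWMutex
	states map[string]string // unit name -> active state, e.g. "active"
}

func newUnitStateCache() *unitStateCache {
	return &unitStateCache{states: make(map[string]string)}
}

// Seed records the states found at recovery, so units that were already
// running are tracked from start-up rather than after a repeated LoadUnit.
func (c *unitStateCache) Seed(states map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for name, state := range states {
		c.states[name] = state
	}
}

// Update is what a state collector would call on each later observation.
func (c *unitStateCache) Update(name, state string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.states[name] = state
}

// Get lets the reconciler read cached state instead of fetching it anew.
func (c *unitStateCache) Get(name string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	state, ok := c.states[name]
	return state, ok
}

func main() {
	cache := newUnitStateCache()
	cache.Seed(map[string]string{"foo.service": "active"})
	state, ok := cache.Get("foo.service")
	fmt.Println(state, ok)
}
```

The read/write lock is a design choice for the case where one goroutine publishes state while another reconciles. A real implementation would also need to decide how stale a cached entry may be before the reconciler stops trusting it.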