This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

UnitState.UnitHash can be nil #720

Closed
bcwaldon opened this issue Jul 28, 2014 · 3 comments · Fixed by #987

@bcwaldon

Agents attempt to bootstrap themselves on startup based on units that are already loaded locally. When this happens, the agent will not have a cached UnitHash for the resulting UnitState, so UnitState.UnitHash is nil.

@bcwaldon bcwaldon added the bug label Jul 28, 2014
@jonboulle

This kinda relates to #730 - can we canonicalise the representation of a unit so we can safely regenerate these hashes within the agent?
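One possible reading of "canonicalise" here, as a rough sketch only: normalise the unit text before hashing so that cosmetic differences do not change the hash. The `canonicalise` and `canonicalHash` helpers below are hypothetical and not part of fleet, the use of SHA-1 is an assumption, and the normalisation rules (strip comments, blank lines, and surrounding whitespace) are illustrative and do not account for systemd line continuations.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"strings"
)

// canonicalise is a hypothetical normaliser: it unifies line endings,
// trims whitespace around each line, and drops blank and comment lines.
func canonicalise(raw string) string {
	var out []string
	for _, line := range strings.Split(strings.ReplaceAll(raw, "\r\n", "\n"), "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue
		}
		out = append(out, line)
	}
	return strings.Join(out, "\n") + "\n"
}

// canonicalHash hashes the canonical form (SHA-1 is an assumption here).
func canonicalHash(raw string) string {
	return fmt.Sprintf("%x", sha1.Sum([]byte(canonicalise(raw))))
}

func main() {
	a := "[Service]\nExecStart=/bin/true\n"
	b := "# comment\r\n[Service]\r\n\r\nExecStart=/bin/true   \r\n"
	// Both variants reduce to the same canonical text, so the hashes match.
	fmt.Println(canonicalHash(a) == canonicalHash(b))
}
```

With something along these lines, an agent could regenerate a hash from whatever text it finds locally, provided every component that writes or hashes a unit applies the same canonical form.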

@bcwaldon bcwaldon added this to the v0.8.1 milestone Sep 3, 2014
@bcwaldon bcwaldon modified the milestones: v0.8.2, v0.8.1 Sep 12, 2014
@bcwaldon bcwaldon modified the milestones: v0.9.0, v0.8.4 Oct 20, 2014
@jonboulle

> Agents attempt to bootstrap themselves on startup based on units that are already loaded locally.

@bcwaldon as far as I can tell this is not (no longer?) actually true. Consider the following with 0.8.2:

core-01 ~ # fleetctl list-units
UNIT            MACHINE                 ACTIVE  SUB
foo.service     69aee328.../10.0.2.15   active  running
core-01 ~ # systemctl kill -s SIGKILL fleet

...

Oct 20 23:47:36 core-01 systemd[1]: fleet.service: main process exited, code=killed, status=9/KILL
Oct 20 23:47:36 core-01 systemd[1]: Unit fleet.service entered failed state.
Oct 20 23:47:47 core-01 systemd[1]: fleet.service holdoff time over, scheduling restart.
Oct 20 23:47:47 core-01 systemd[1]: Stopping fleet daemon...
Oct 20 23:47:47 core-01 systemd[1]: Starting fleet daemon...
Oct 20 23:47:47 core-01 systemd[1]: Started fleet daemon.
Oct 20 23:47:47 core-01 fleetd[2322]: INFO fleet.go:144: No provided or default config file found - proceeding without
Oct 20 23:47:47 core-01 fleetd[2322]: INFO server.go:137: Establishing etcd connectivity
Oct 20 23:47:47 core-01 fleetd[2322]: INFO server.go:148: Starting server components
Oct 20 23:47:47 core-01 fleetd[2322]: INFO engine.go:149: Engine leadership acquired
Oct 20 23:47:47 core-01 fleetd[2322]: INFO manager.go:218: Writing systemd unit foo.service (39b)
Oct 20 23:47:47 core-01 fleetd[2322]: INFO manager.go:142: Instructing systemd to reload units
Oct 20 23:47:47 core-01 fleetd[2322]: INFO reconcile.go:274: AgentReconciler completed task: type=LoadUnit job=foo.service reason="unit scheduled here but not loaded"
Oct 20 23:47:47 core-01 fleetd[2322]: INFO manager.go:78: Triggered systemd unit foo.service start: job=11150
Oct 20 23:47:47 core-01 fleetd[2322]: INFO reconcile.go:274: AgentReconciler completed task: type=StartUnit job=foo.service reason="unit currently loaded but desired state is launched"

Note that the AgentReconciler is creating both LoadUnit and StartUnit tasks, i.e. the restarted agent did not consider foo.service to be loaded already ("unit scheduled here but not loaded"). It just so happens that the StartUnit call succeeds, so foo.service keeps running and gives the illusion of continuity with the previous fleet agent (i.e. that it has "recovered"), but really it hasn't.

@bcwaldon

Yes, the description of the ticket is off a bit. Currently, on the first reconciliation, every unit file found in /var/run/fleet/units seeds the first "current state" calculation with a nil hash and an inactive target state. A simple preload step on Agent startup that seeds the Agent's cache of unit hashes and target states (inactive vs loaded) would solve the problem here.

Side note - reading the unit files verbatim from /var/run/fleet/units obviously won't give us the actual loaded unit files. We need to ask systemd over dbus what the effective unit files are since there could be drop-ins in effect. Unfortunately, fleet does not have the power to rectify this situation as it would be dangerous to allow it to remove files outside of its workspace that it did not place initially.
