This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Race condition in unloading units #1216

Closed
patrickbcullen opened this issue May 7, 2015 · 1 comment

Comments

@patrickbcullen

I have noticed a race condition in production where a job that is supposed to be unloaded never stops.

I think I found the bug here:

a.registry.ClearUnitHeartbeat(unitName)

I have an example where the systemd unit file is gone but the unit is still running. This happened because the kill command in the unit file did not succeed. Since this code does not check for errors, it removes all the systemd unit files anyway. With the unit files gone, I cannot stop the unit through fleet, because fleet cannot issue the systemd stop for a unit whose file it has already deleted.

@bcwaldon
Contributor

bcwaldon commented Jul 9, 2015

@patrickbcullen Can you provide an explicit set of repro steps for this?

@jonboulle added kind/bug and removed bug labels Sep 24, 2015
dongsupark pushed a commit to dongsupark/fleet that referenced this issue Jul 20, 2016
In Agent.unloadUnit(), if systemdUnitManager.TriggerStop() returns
any error, do not unload systemd units. Otherwise the unit could get
into a state where the unit cannot be stopped via fleet, because the
unit file was already removed.

Fixes coreos#1216