This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Periodic failures due to "No such file or directory" #900

Closed
bcwaldon opened this issue Sep 15, 2014 · 3 comments · Fixed by #1134

@bcwaldon
Contributor

Playing around with deis, I noticed that sometimes when you deis scale up, some of the containers wouldn't launch. Digging through the logs, I found this:

Sep 15 18:19:17 deis-1 fleetd[579]: INFO manager.go:222: Writing systemd unit madras-stickpin_v3.cmd.10-announce.service (1235b)
Sep 15 18:19:17 deis-1 fleetd[579]: INFO reconcile.go:274: AgentReconciler completed task: type=LoadUnit job=madras-stickpin_v3.cmd.10-announce.service reason="unit scheduled here but not loaded"
Sep 15 18:19:17 deis-1 fleetd[579]: ERROR manager.go:113: Failed to start systemd unit madras-stickpin_v3.cmd.10-announce.service: Unit madras-stickpin_v3.cmd.10.service failed to load: No such file or directory.
Sep 15 18:19:17 deis-1 fleetd[579]: INFO reconcile.go:274: AgentReconciler completed task: type=StartUnit job=madras-stickpin_v3.cmd.10-announce.service reason="unit currently loaded but desired state is launched"

Manually calling systemctl start madras-stickpin_v3.cmd.10-announce.service worked fine, so there appears to be a race condition between writing the unit file to disk and starting it in systemd.
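One way to rule out the write side of this suspected race is to make sure a unit file is completely on disk before anything asks systemd to start it. The sketch below is a hypothetical Python illustration, not fleet's actual code (fleet is written in Go); write_unit_atomically is an invented helper name. It writes to a temporary file in the same directory, fsyncs it, and renames it into place, so a reader sees either no file or the whole file. It does not by itself make systemd re-read its unit files.

```python
import os
import tempfile


def write_unit_atomically(unit_dir: str, name: str, contents: str) -> str:
    """Hypothetical helper: write unit `name` so it appears all at once.

    The temp file lives in the same directory because os.replace is only
    atomic within a single filesystem.
    """
    final_path = os.path.join(unit_dir, name)
    fd, tmp_path = tempfile.mkstemp(dir=unit_dir, prefix="." + name + ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(contents)
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # ask the OS to flush the data to storage
        os.chmod(tmp_path, 0o644)         # mkstemp creates files as 0600
        os.replace(tmp_path, final_path)  # atomic rename over the target
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = write_unit_atomically(d, "example.service",
                                     "[Service]\nExecStart=/bin/true\n")
        with open(path) as f:
            print(f.read(), end="")
```

The temporary-file-plus-rename pattern is chosen so that a concurrent reader never observes a half-written unit; making the rename itself survive a crash would additionally require fsyncing the directory, which this sketch omits.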

@bcwaldon
Contributor Author

This may be related to fleet constantly daemon-reloading systemd. We should retest this on v0.9.0, since we've stopped reloading so often.

@yaronr

yaronr commented Feb 14, 2015

@bcwaldon ETA?

@guruvan

guruvan commented Feb 16, 2015

Had several units load on new 557.2 machines and couldn't get them out of the "dead" state: /run/fleet had the files in it, but systemctl restart <unit> failed with "No such file or directory" errors. At the suggestion of @robszumski I ran systemctl daemon-reload and then reloaded the units, and that got them right up and running. (I haven't had a chance to dig through the logs yet; there's a lot of skydns output to wade through to find the right error.)
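The manual workaround described above (one systemctl daemon-reload, then restarting the stuck units) can be scripted. This is a hypothetical Python sketch: recover_units and the injectable run callable are illustrative names, not part of fleet or systemd. It reloads once for the whole batch rather than once per unit, in line with the earlier suspicion that frequent daemon-reloads may be involved, and returns the units whose restart still failed.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner executes one argv list and returns the process exit code.
Runner = Callable[[Sequence[str]], int]


def run_systemctl(argv: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(argv)).returncode


def recover_units(units: Sequence[str], run: Runner = run_systemctl) -> List[str]:
    """Reload systemd's unit files once, then restart each unit.

    Returns the units whose restart still failed.
    """
    if run(["systemctl", "daemon-reload"]) != 0:
        raise RuntimeError("systemctl daemon-reload failed")
    failed = []
    for unit in units:
        if run(["systemctl", "restart", unit]) != 0:
            failed.append(unit)
    return failed


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    def dry_run(argv: Sequence[str]) -> int:
        print(" ".join(argv))
        return 0

    recover_units(["example-a.service", "example-b.service"], run=dry_run)
```

Passing the runner in as a parameter keeps the ordering (reload first, then restarts) checkable without touching a real systemd; on a host, the default runner needs root privileges.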

@bcwaldon bcwaldon added this to the v0.10.0 milestone Feb 21, 2015
@bcwaldon bcwaldon modified the milestones: v0.9.1, v0.10.0 Mar 2, 2015