ExecStop / ExecStopPost not executed when fleetctl destroy is used #1000
@flantel fleet should allow systemd to run whatever Exec* commands are defined. I'll look into this |
So with this unit:
I see the following from
@flantel Can you share more details of your units? Maybe your ExecStop is taking a "long time" and systemctl is killing it forcefully? |
@bcwaldon Thanks for looking at this. Your test unit works as you demonstrate. However, the following unit, which loads a container, does not. [Unit] With the above, running fleetctl destroy test.service results in the following:
Thanks. |
@flantel Ah, I think I figured it out. You can reproduce this by emulating fleetd directly with systemctl:
If the unit is no longer loaded in systemd when the In master (soon to be v0.9.0), fleetd does not call daemon-reload unless it absolutely needs to. I attempted the same |
On 14 November 2014 18:27, Brian Waldon notifications@github.com wrote:
Regards, -Barry Flanagan
|
Closing as v0.9.0 has been released. Reopen if this is still an issue. |
It looks like we're actually running into this still on fleetctl 0.9.2 journalctl logs
The unit file looks like this (hiding our service discovery info)
In our case we have an internal service that isn't responding to the SIGTERM sent by We're fixing the SIGTERM issue in our system since that's just sloppy but it feels like fleet should be able to guarantee (reasonably) that if it says a service is gone there isn't some process or container just lingering on the system indefinitely. Running |
I'm also seeing the same issue on fleet v0.9.2. This is how my unit file looks:
fleetctl destroy only removes the test1 frontend and leaves test2. |
Definitely worth re-opening: I have two nginx units, one front and one mid. For the front, both stop and rm are called. For the mid, on the other hand, only stop is called, and it did a reload. Seen on v0.9.2 |
+1 for reopening We are using fleet |
+1 for reopening |
+1 |
@bcwaldon seems that post owner doesn't show recent activity. Could you please reopen to avoid creating a new one? |
Any news on this? Definitely still seeing the problem in 0.10.2 |
+1, We see the same problem with fleet version 0.11.5 |
+1 also reproduced with version 0.11.5 |
Can you guys post your fleet unit files? We have to reproduce the issue. |
Sure. Here is mine (fleet version 0.11.5):
|
Steps to reproduce. Unit file
Run Then run Stop logs:
Destroy logs:
|
@tixxdz |
We can reproduce similar behavior using these bash commands:
submit and schedule unit:
[Unit]
Description=MyApp
After=docker.service
Requires=docker.service
[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name busybox1 busybox /bin/sh -c "while true; do echo Hello World2; sleep 1; done"
ExecStop=/usr/bin/docker stop busybox1
ExecStopPost=/usr/bin/echo Hello from stop
sudo cp myapp.service /etc/systemd/system && sudo systemctl daemon-reload && sudo systemctl start myapp
stop unit, destroy it, reload systemd (in parallel):
sudo systemctl stop myapp.service & sudo rm /etc/systemd/system/myapp.service & sudo systemctl daemon-reload
When systemd reloads stopping units - it doesn't take into consideration defined
Fleet side fix should contain code which will wait for the unit "stopped" status, then destroy it. As for systemd, looks like this bug relates to this issue: systemd/systemd#518 |
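The fleet-side fix described above (wait for the unit to reach a stopped state, then destroy it) can be approximated by hand with plain systemctl. The following is only a sketch under assumptions, not fleet's actual implementation: the function names `wait_until_stopped` and `stop_then_destroy` and the overridable `SYSTEMCTL` variable are hypothetical helpers introduced for illustration, and the script is assumed to run as root.

```shell
#!/bin/sh
# Sketch: stop a unit, wait until systemd reports it is no longer running,
# and only then remove the unit file and reload systemd.
# SYSTEMCTL is a hypothetical override so the logic can be tried with a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# wait_until_stopped UNIT [TRIES] [DELAY]
# Polls `systemctl is-active UNIT` until it prints "inactive" or "failed".
# Returns 1 if the unit has not stopped after TRIES polls.
wait_until_stopped() {
    unit="$1"; tries="${2:-30}"; delay="${3:-1}"
    while [ "$tries" -gt 0 ]; do
        state="$($SYSTEMCTL is-active "$unit" 2>/dev/null)"
        case "$state" in
            inactive|failed) return 0 ;;
        esac
        tries=$((tries - 1))
        sleep "$delay"
    done
    return 1
}

# stop_then_destroy UNIT UNIT_FILE [TRIES] [DELAY]
# Runs the three steps strictly in sequence instead of in parallel, so the
# ExecStop/ExecStopPost commands have finished before the reload happens.
stop_then_destroy() {
    $SYSTEMCTL stop "$1" || return 1
    wait_until_stopped "$1" "${3:-30}" "${4:-1}" || return 1
    rm -f "$2" && $SYSTEMCTL daemon-reload
}
```

With the unit from this comment, the call would be `stop_then_destroy myapp.service /etc/systemd/system/myapp.service`. The point of the design is only the ordering: the unit file is removed and systemd reloaded after the stop has completed, never concurrently with it.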
Updates: We are working on this right now. Basically, this is a bug in both fleet and systemd. In systemd this is a well-known issue: when doing a "daemon-reload", some "Exec*" commands may just be skipped, since units are replaced in memory and the previous commands are not there any more. Please see below. Summary:
On the fleet side, when doing a destroy we should do our best to wait for at least a stop, but currently I'm new to fleet so my understanding and from reading this documentation https://coreos.com/fleet/docs/latest/using-the-client.html So either we wait for units to stop and then do the destroy, or we could add a "--block" or "--no-block" switch to the destroy command. With "--block" we would be backward compatible, but I don't like it. With "--no-block" we would have to change the behaviour and make destroy block by default unless you set the "--no-block" switch, which is the same as in systemd when you do "systemctl --no-block daemon-reload", and of course you use it at your own risk. But we have to add some debug messages and make fleet and systemd a bit smarter. However, even with this change there will still be bugs here and there. The main issue is when X (an unrelated unit, user or whatever) triggers a "daemon-reload" in systemd: if you happen to have changed unit Y and didn't want to do a "daemon-reload" because you know that some "Exec*" directives will be skipped... then you are also out of luck! Yes, X just messed up Y...
The related issue is here: systemd/systemd#518 So it's not only related to fleet destroy; it may happen at any time. We can improve fleet as noted above, but we have to add at least some debug logs and make systemd smarter. We could check in systemd which section changed, or save the line numbers and continue... We are trying to figure out the best solution. At the very least, doing systemctl status changed.service should tell you not only that you need a "daemon-reload" but perhaps also that you need to stop the unit before that, and systemd should log every time a daemon-reload is triggered and some "Exec*" commands were lost and not executed. Anyway, just to say that we are working on this. Thanks! |
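Until systemd itself handles this, one way to reduce the chance that a "daemon-reload" lands while a unit is still stopping is to check for stopping units first. The sketch below is an illustration under assumptions, not something fleet or systemd ships: `stopping_units` and `guarded_daemon_reload` are hypothetical helper names, `SYSTEMCTL` is a hypothetical override for testing, and it assumes a systemd version whose `list-units` supports `--state=`, `--no-legend` and `--plain`.

```shell
#!/bin/sh
# Sketch: refuse to run daemon-reload while any unit is still stopping,
# since a reload at that moment may cause remaining Exec* commands to be skipped.
# SYSTEMCTL is a hypothetical override so the logic can be tried with a stub.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# stopping_units: print the names of units currently in the
# "deactivating" active state, one per line.
stopping_units() {
    $SYSTEMCTL list-units --all --state=deactivating --no-legend --plain 2>/dev/null |
        awk 'NF { print $1 }'
}

# guarded_daemon_reload: reload only if no unit is in the middle of stopping.
guarded_daemon_reload() {
    busy="$(stopping_units)"
    if [ -n "$busy" ]; then
        echo "refusing daemon-reload; units still stopping:" >&2
        echo "$busy" >&2
        return 1
    fi
    $SYSTEMCTL daemon-reload
}
```

This check is itself racy: a unit can start stopping between the check and the reload, and it does nothing about reloads triggered by other tools. It narrows the window described in this comment rather than closing it.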
This is a pretty fundamental issue in systemd that is very difficult for us to try mitigate in fleet itself. For posterity, the suggested workaround for now is to run |
I just noticed ExecStopPost is not run if I stop a fleet unit and immediately destroy it afterwards |
That last observation might be related to #1025 I guess?... |
This is basically blocked by systemd/systemd#518 |
Hi
Using fleetctl version 0.8.3 and CoreOS stable, if I stop a unit with 'fleetctl destroy unit.service' the ExecStop and ExecStopPost are not executed. If I use 'fleetctl stop ...' they are.
Is this expected behaviour? I would have expected (and I think this was the case with earlier versions) that destroy would execute ExecStop and ExecStopPost.
Thanks.
-Barry Flanagan