
post-run hook doesn't re-run on exit 1 #2364

Closed
jamessewell opened this issue May 15, 2017 · 7 comments · Fixed by #6705
Labels
Focus:Supervisor ProcessManagement (Related to how the Supervisor manages service processes), Focus:Supervisor (Related to the Habitat Supervisor (core/hab-sup) component), Type: Bug (Issues that describe broken functionality)

Comments

@jamessewell
Contributor

jamessewell commented May 15, 2017

Running hab from hab-0.22.1-20170509234454-x86_64-linux on CentOS 7.2, deploying into Docker.

It looks like when an unsuccessful exit code is returned by the post-run hook script, the script is not run again the way we see with init, run, etc.

The use case for this would be:

  1. Start the application in run
  2. Perform some one-off config in post-run, which requires the application to be up

The two options I can see here are to block in run until my service becomes available for config, or to have an exit 1 from the post-run script be treated as requiring another post-run pass.

If this is the intended functionality, ideas are welcome!

@srenatus
Contributor

I, too, had expected it to be re-run, and was surprised it didn't. So that's another teeny-tiny data point 😉

@jamessewell
Contributor Author

jamessewell commented Aug 14, 2017 via email

@rsertelon
Contributor

@jamessewell any news on that PR? Is this still relevant today? Thanks!

@themightychris
Contributor

Relevant case: habitat-sh/core-plans#1674 (comment)

@jsirex
Contributor

jsirex commented Jan 23, 2019

This is really a must-have.

Use case:

  1. I'm starting a cluster.
  2. Habitat discovery hasn't happened yet (no members; they are still starting).
  3. The post-run hook gets executed, but to no avail: the cluster isn't ready.
  4. Discovery happens.
  5. The service is restarted.
  6. We are now ready for the post-run hook, but it will not be executed again.

Also, it's not clear how long to wait in step 3: 1 second? 15 seconds? 8 minutes? Will the service actually restart while the post-run hook is running? Or does the hook get killed? Or will the service wait? Too many questions.

Exiting immediately with a non-zero code is better.

@christophermaier
Contributor

Taking some questions posed by @davidMcneil and putting them here (along with some thoughts) for posterity and broader discussion. As usual, all of this is open to discussion with anyone who has thoughts to contribute!

Should post-run only run after a successful health check?

I'd say "no" at this point, since it's conceivable (even likely?) that what is done in post-run could be required for a successful health check.

How many times should post-run be tried?

We don't currently have a way of restricting restarts of services, which makes me want something like Erlang's supervisor restart intensity configuration. Implementing something like that in a holistic way seems like a feature on its own, and adding ad-hoc partial throttling right now seems like it would complicate things unnecessarily.
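
For illustration only, here is a minimal sketch (in Rust, the Supervisor's language) of the kind of sliding-window restart-intensity check that Erlang supervisors offer. All names and limits here are hypothetical, not anything in the Habitat codebase:

```rust
// Hypothetical sketch: track restarts in a sliding time window and report
// whether the configured intensity has been exceeded.
use std::collections::VecDeque;
use std::time::{Duration, Instant};

struct RestartIntensity {
    max_restarts: usize,
    window: Duration,
    history: VecDeque<Instant>,
}

impl RestartIntensity {
    fn new(max_restarts: usize, window: Duration) -> Self {
        Self { max_restarts, window, history: VecDeque::new() }
    }

    /// Record a restart and return whether it is still within the limit.
    fn record_restart(&mut self, now: Instant) -> bool {
        // Drop restarts that have fallen out of the sliding window.
        while let Some(&oldest) = self.history.front() {
            if now.duration_since(oldest) > self.window {
                self.history.pop_front();
            } else {
                break;
            }
        }
        self.history.push_back(now);
        self.history.len() <= self.max_restarts
    }
}

fn main() {
    // Illustrative limits: at most 3 restarts per minute.
    let mut intensity = RestartIntensity::new(3, Duration::from_secs(60));
    for _ in 0..4 {
        println!("allowed: {}", intensity.record_restart(Instant::now()));
    }
}
```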

Should we use status codes to indicate if it should be rerun?

0: succeeded, do not retry
1: failed, retry
2: failed, do not retry

That sounds like a great idea, with the observation that the "failed do not retry" case seems linked to the "how many times should we retry" question above.
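
As a rough sketch of what that exit-code convention could look like on the Supervisor side (names are hypothetical and not part of the Habitat codebase):

```rust
// Hypothetical sketch: map a post-run hook's exit status to a retry
// decision, following the 0 / 1 / 2 convention proposed above.
#[derive(Debug, PartialEq)]
enum HookOutcome {
    Succeeded,     // exit 0: done, do not retry
    FailedRetry,   // exit 1: failed, schedule another run
    FailedNoRetry, // exit 2 (or anything else): failed, give up
}

fn classify_exit_code(code: i32) -> HookOutcome {
    match code {
        0 => HookOutcome::Succeeded,
        1 => HookOutcome::FailedRetry,
        _ => HookOutcome::FailedNoRetry,
    }
}

fn main() {
    assert_eq!(classify_exit_code(0), HookOutcome::Succeeded);
    assert_eq!(classify_exit_code(1), HookOutcome::FailedRetry);
    assert_eq!(classify_exit_code(2), HookOutcome::FailedNoRetry);
}
```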

Should there be a delay between tries or a backoff strategy?

Could be a good idea. It seems like we could ultimately model post-run execution the same way we model the health check (i.e., run repeatedly, with a potentially configurable duration between runs, until we get the outcome we want). That provides a nice place to localize delay and backoff logic. If implemented properly, it would be easy to drop this in later, pending further product investigation (i.e., do we want to allow these parameters to be customized on a per-service basis, or are they just hard-coded aspects of how the Supervisor operates?). @jsirex's comment above suggests that per-service customization could indeed be useful.
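
A minimal sketch of such a retry loop, assuming a hypothetical hook path and a hard-coded delay; this is not the actual Supervisor implementation:

```rust
// Hypothetical sketch: run a post-run hook repeatedly, with a fixed delay
// between attempts, until it reports success or asks not to be retried.
use std::process::Command;
use std::thread::sleep;
use std::time::Duration;

fn run_post_run_until_settled(hook_path: &str, delay: Duration) {
    loop {
        let status = Command::new(hook_path)
            .status()
            .expect("failed to spawn post-run hook");

        match status.code() {
            Some(0) => break,        // succeeded: stop retrying
            Some(1) => sleep(delay), // failed: retry after the delay
            _ => break,              // failed (or killed by a signal): do not retry
        }
    }
}

fn main() {
    // The hook path and delay are illustrative; the delay is exactly the
    // parameter that could be made configurable per service.
    run_post_run_until_settled("hooks/post-run", Duration::from_secs(15));
}
```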

Should failure to run a lifecycle hook influence the result of a health check?

Possibly, though I think an argument can be made that if a given hook fails to do something important for the health of a service, then the health check hook should be explicitly checking for that.

Given that, I'm inclined to leave out any explicit connection between other hooks and health checking for right now. If we get user feedback to the contrary, we can certainly revisit it.

Should similar rules apply to other lifecycle hooks?

Probably so, though it depends on the specifics of each hook. It would be interesting to see whether we could eventually model all hooks as customized implementations of some general "hook prototype" that has customization points for the behaviors you've mentioned.
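
One possible shape for such a "hook prototype", sketched as a Rust trait; all names and defaults are hypothetical and do not exist in core/hab-sup:

```rust
// Hypothetical sketch: a lifecycle-hook trait with customization points
// for the retry behaviors discussed above.
use std::time::Duration;

trait LifecycleHook {
    /// Name of the hook, e.g. "post-run" or "init".
    fn name(&self) -> &str;

    /// Whether a given exit code should trigger another attempt.
    fn should_retry(&self, exit_code: i32) -> bool;

    /// How long to wait before the next attempt (a place for backoff logic).
    fn retry_delay(&self, attempt: u32) -> Duration;
}

struct PostRun;

impl LifecycleHook for PostRun {
    fn name(&self) -> &str {
        "post-run"
    }

    fn should_retry(&self, exit_code: i32) -> bool {
        exit_code == 1
    }

    fn retry_delay(&self, attempt: u32) -> Duration {
        // Simple linear backoff, capped at one minute (illustrative only).
        Duration::from_secs((attempt as u64 * 5).min(60))
    }
}

fn main() {
    let hook = PostRun;
    println!("{} retries on exit 1: {}", hook.name(), hook.should_retry(1));
    println!("delay after 3 attempts: {:?}", hook.retry_delay(3));
}
```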

@christophermaier
Contributor

Also, @jsirex's comments above:

Will the service actually restart while the post-run hook is running? Or does the hook get killed? Or will the service wait? Too many questions.

suggest that some kind of overall state machine for a service will be important in order to properly track these kinds of transitions.
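
For discussion purposes, a very rough sketch of what such a state machine might look like; the states and the single transition shown are purely illustrative, not taken from the Habitat codebase:

```rust
// Hypothetical sketch: a coarse per-service state machine that would make
// the interaction between restarts and an in-flight post-run hook explicit.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum ServiceState {
    Down,
    Starting,       // init / run hooks in progress
    Up,             // main process running
    PostRunPending, // post-run scheduled or retrying
    Restarting,     // restart requested
}

/// Returns the next state, plus whether a post-run pass should be re-queued
/// once the service is back up.
fn on_restart_requested(current: ServiceState) -> (ServiceState, bool) {
    match current {
        // A restart while post-run is pending gives us one place to decide:
        // kill the hook, wait for it, or re-queue it after the restart.
        ServiceState::PostRunPending => (ServiceState::Restarting, true),
        _ => (ServiceState::Restarting, false),
    }
}

fn main() {
    let (next, requeue_post_run) = on_restart_requested(ServiceState::PostRunPending);
    println!("next state: {:?}, re-queue post-run: {}", next, requeue_post_run);
}
```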

@christophermaier added the Focus:Supervisor and Type: Bug labels and removed the A-supervisor label on Jul 24, 2020