Update to allow postrun to only run after a successful HealthCheck #5331
Three changes:
- postrun is no longer run on init; it now runs exactly once, after the HealthCheck first reports OK
- postrun is re-run after each service Start
- self.health_check is now actually assigned (it wasn't before)
This does slightly change the behaviour of post-run: it will now re-run every time the service starts. This part can be backed out if needed.
Signed-off-by: James Sewell <james.sewell@gmail.com>
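The gating described in the first two changes can be modelled as a small state machine. This is a hypothetical Python sketch, not the Supervisor's code. `HealthStatus` and `PostRunGate` are invented names, and the three statuses mirror the UNKNOWN/FAILING/OK states mentioned later in this thread. The sketch assumes the Supervisor calls `on_service_start()` on every start and `on_health_check()` after every check.

```python
from enum import Enum


class HealthStatus(Enum):
    """Hypothetical stand-in for the UNKNOWN / FAILING / OK health-check states."""
    UNKNOWN = "unknown"
    FAILING = "failing"
    OK = "ok"


class PostRunGate:
    """Hypothetical helper deciding when post-run should fire.

    post-run fires once per service start, on the first OK health check.
    """

    def __init__(self) -> None:
        self.fired = False

    def on_service_start(self) -> None:
        # Every (re)start re-arms the gate so post-run can run again.
        self.fired = False

    def on_health_check(self, status: HealthStatus) -> bool:
        # True only for the first OK result since the last start.
        if status is HealthStatus.OK and not self.fired:
            self.fired = True
            return True
        return False


gate = PostRunGate()
gate.on_service_start()
statuses = [HealthStatus.UNKNOWN, HealthStatus.FAILING, HealthStatus.OK, HealthStatus.OK]
print([gate.on_health_check(s) for s in statuses])  # [False, False, True, False]

gate.on_service_start()  # e.g. the service is restarted
print(gate.on_health_check(HealthStatus.OK))  # True
```

The reset in `on_service_start()` is what makes post-run re-run after each service start. Without it, a single gate per process would fire post-run only once ever.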
@jamessewell I just got up and running with this, looks good! So far I've verified that:
Some issues I see though:
Neither of these issues is a show-stopper for me, though; I'd be happy with this PR getting merged as-is and consider it an improvement in
Hi Chris,
I think it's actually running a first health check quickly, which fails; then there's a 30-second wait until the next one.
If you monitor the health check state, it goes:
- UNKNOWN
- FAILING
- OK
I'll have a bit more of a poke, but the proposed changes below are much better than this solution!
Cheers,
James Sewell
On Sat, 14 Jul 2018 at 4:39 am, Chris Alfano wrote:
@jamessewell I just got up and running with this, looks good!
So far I've verified that:
- post-run doesn't execute until after the first successful health_check
- if health_check fails, post-run doesn't run
- if health_check fails initially, and succeeds later, post-run does run
Some issues I see though:
1. It takes quite a while for the first health_check to run
   - With post-run deferred until then, this means it takes quite a while for the service to finish starting up, while the supervisor reports it as up
   - Perhaps an initial health_check could be run immediately after run, like post-run used to be?
2. post-run does not get re-run when a config change causes it to be recompiled and the service reloaded
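James's explanation of the slow first check is easy to quantify. Assume, hypothetically, that checks fire immediately after start and then every 30 seconds (the gap observed above). A service that needs even a couple of seconds to become ready then fails the immediate check and waits for the next one. Any deferred post-run work is therefore delayed by up to a full interval. The function name and the fixed-schedule model are illustrative assumptions.

```python
import math


def first_ok_check_time(ready_after: float, interval: float = 30.0) -> float:
    """Seconds after start at which the first passing health check runs.

    Hypothetical model: checks run at t = 0, interval, 2 * interval, ...,
    and a check passes once t >= ready_after (the service's startup time).
    """
    if ready_after <= 0:
        return 0.0
    # Smallest multiple of interval that is >= ready_after.
    return math.ceil(ready_after / interval) * interval


for ready_after in (0, 2, 31):
    print(ready_after, first_ok_check_time(ready_after))
# 0 0.0
# 2 30.0
# 31 60.0
```

Under this model, an immediate first check only helps if the service is already ready at t = 0. Otherwise the interval dominates the delay.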
@jamessewell, are you still planning additional changes to this PR, or should it be considered final for review at this point?
I'm going to close this for now, so that our PR reminders don't think we're ignoring it. Whenever you're ready, feel free to reopen it, @jamessewell.
I'm not really sure what to do about this one; I was hoping Chris would chime back in.
It works, but the other proposed (larger) solution is better.
On Wed, 1 Aug 2018 at 11:35 pm, baumanj wrote:
Closed #5331.
@christophermaier, can you weigh in? Sorry if I was premature, @jamessewell, I was reading
as an indication I should hold off on review until you had done more.
Sorry @jamessewell ... I'll get around to reviewing this soon.
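I was just wondering, would it make sense to leave the current post-run behavior alone and introduce this as a new hook, post-up/post-available/post-online/post-healthy? This would have the benefit of definitely not breaking any existing services, and allows post-run code to play a role in getting a service into the healthy state, which might be important.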
That makes sense, although I do wonder what the point of the old post-run would be, apart from backwards compatibility.
On Fri, 3 Aug 2018 at 12:39 am, Chris Alfano wrote:
I was just wondering, would it make sense to leave the current post-run behavior alone and introduce this as a new hook, post-up/post-available/post-online/post-healthy?
This would have the benefit of definitely not breaking any existing services, and allows post-run code to play a role in getting a service into the healthy state, which might be important.
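Chris's alternative keeps post-run's existing contract and adds a separate hook for the transition to healthy. The following is a hypothetical Python model of that split. The event names (`init`, `restart`, `health_ok`, `health_failing`) and the `post-healthy` hook name are placeholders; Chris listed post-up, post-available, post-online, and post-healthy as candidates. Per the PR description, the model assumes post-run currently runs on init. It also borrows this PR's re-arm-on-start behaviour for the new hook.

```python
from typing import Iterable, List


def hooks_to_run(events: Iterable[str]) -> List[str]:
    """Hypothetical model of the proposed post-run / post-healthy split.

    Event and hook names are placeholders. post-run keeps firing once, on
    init; post-healthy fires on the first OK check after each start.
    """
    fired: List[str] = []
    armed = False  # may post-healthy still fire for the current start?
    for event in events:
        if event == "init":
            fired.append("post-run")  # existing contract: runs on init
            armed = True
        elif event == "restart":
            armed = True  # re-arm the new hook, as this PR does for post-run
        elif event == "health_ok" and armed:
            fired.append("post-healthy")
            armed = False
        # Any other event (e.g. "health_failing") fires no hook.
    return fired


events = ["init", "health_failing", "health_ok", "health_ok", "restart", "health_ok"]
print(hooks_to_run(events))  # ['post-run', 'post-healthy', 'post-healthy']
```

Because post-run still fires before the service is healthy, it can keep doing work that helps the service reach the healthy state. This is the benefit Chris points out, and existing packages see no change.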
@jamessewell Sorry it's taken a while to get to this! I don't want to change the existing contract of
Also, given the relatively long time until a health-check initially fires, this could cause services to be in a potentially incomplete state for a long time. This wouldn't be a problem after #5327, and possibly #5326, are implemented, though, since services wouldn't be available to the rest of the network until they're healthy (and, presumably, after their
There's a lot of work currently planned around all the lifecycle hooks (#5318) (and I'm currently starting work on them), and I think this PR points out some additional real issues. As is, though, I think the potential for introducing additional instability and breakage is high, so I'm going to close this for now.
I appreciate the work and effort you've put in thus far, and apologize for taking as long as I have to give you some feedback on this.
@christophermaier deferring services being available to the rest of the network doesn't solve the use case here. All the related PRs you linked to are great and related, but they are far broader in scope than the use case at hand here:
I agree it's probably best not to change
@jamessewell I think a very simple derivation of this PR would stop this gnarly from spreading and let packages that need this start passing CI again:
Posted in Slack, but copied here for visibility / posterity: