Incorporate health into gossip #5326
Comments
This is a great idea - I think if Habitat wants health checks to feel like less of a bolt-on, it needs something like this. Could implementing this help lay the groundwork for allowing health checks to be used to trigger a Leader/Follower topology failover as well (#3249)?
@jamessewell Yes, for sure this would be fundamental for #3249.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.
Still needed.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. We value your input and contribution. Please leave a comment if this issue still affects you.
Health checks currently do not participate in the Habitat network; they're just run on a timer on the side of the main service loop, and report information only via the HTTP gateway. Other services in the Habitat network that depend on an unhealthy service have no way of knowing whether that service is actually healthy.
We'll need to think about how to best do this; too eagerly broadcasting that a service is unhealthy could have cascading effects, particularly if the failing health check is only a transient issue; we should have some kind of threshold like "the last X health checks failed; it's officially Unhealthy". This is basically the Service-level analog of the SWIM suspicion mechanism.
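As a rough sketch of that threshold idea (illustrative names only, not actual Supervisor code), a service's health would only flip to Unhealthy, and thus be gossiped, after N consecutive failed checks, so a single transient failure never cascades:

```rust
// Hedged sketch: `ServiceHealth` and `HealthTracker` are hypothetical names.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ServiceHealth {
    Healthy,
    Suspect,   // recent failure(s), but below the threshold; not yet gossiped
    Unhealthy, // threshold reached; safe to broadcast to dependents
}

struct HealthTracker {
    consecutive_failures: u32,
    unhealthy_after: u32, // e.g. the proposed --unhealthy-after=3
}

impl HealthTracker {
    fn new(unhealthy_after: u32) -> Self {
        HealthTracker { consecutive_failures: 0, unhealthy_after }
    }

    /// Record one health check run and return the resulting service state.
    fn record(&mut self, check_passed: bool) -> ServiceHealth {
        if check_passed {
            self.consecutive_failures = 0;
            ServiceHealth::Healthy
        } else {
            self.consecutive_failures += 1;
            if self.consecutive_failures >= self.unhealthy_after {
                ServiceHealth::Unhealthy
            } else {
                ServiceHealth::Suspect
            }
        }
    }
}
```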
We'll also need to think of how to best expose this in templating data so dependent services can take advantage of it. As stated in #5325, we currently conflate presence of a Supervisor with the presence/health of the services running on that Supervisor. In our templating data, we currently only present service group members that are either "alive" or "suspect" (these are, of course, Supervisor-level states, and not Service-level states); we can probably just flip this over to using presence (#5325) and / or health (this issue) and preserve the desired semantics (and actually be correct about it 😄 )
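To illustrate what the templating-data change might look like, a dependent service's view of a service group could be filtered on presence plus gossiped health rather than the Supervisor-level alive/suspect states. The types and field names below are hypothetical, not the Supervisor's real census structures:

```rust
// Illustrative only; not the actual census types.
#[derive(PartialEq)]
enum Health {
    Ok,
    Warning,
    Critical,
    Unknown,
}

struct CensusMember {
    member_id: String,
    present: bool,  // service presence, per #5325
    health: Health, // gossiped service health, per this issue
}

/// Members a dependent service would see in its templating data:
/// present and not failing their health checks.
fn renderable_members(members: &[CensusMember]) -> Vec<&CensusMember> {
    members
        .iter()
        .filter(|m| m.present && m.health != Health::Critical)
        .collect()
}
```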
We may also want to expose some service-level runtime configuration options to control the frequency and threshold of checking:
hab svc load foo/bar --check-every=20s --unhealthy-after=3
, or similar. Right now, health checks run on a hard-coded 30-second period. We could add additional metadata to packages, but that may be too constraining; we'd likely still want to be able to override it at runtime.
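For illustration only, those options might map onto a per-service config whose defaults preserve today's hard-coded 30-second period; the names follow the example command above and are not an existing Supervisor API:

```rust
use std::time::Duration;

// Sketch of the proposed per-service health check settings.
struct HealthCheckConfig {
    check_every: Duration, // --check-every; defaults to the current 30s period
    unhealthy_after: u32,  // --unhealthy-after; consecutive failures before Unhealthy
}

impl Default for HealthCheckConfig {
    fn default() -> Self {
        HealthCheckConfig {
            check_every: Duration::from_secs(30),
            unhealthy_after: 1, // today's effective behavior: one failure flips the state
        }
    }
}
```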