feat(healthcheck) adding visibility to target health status #3232
@Tieske I think you'll be interested in discussing this as well, especially the
Does this response distinguish between targets that have been manually "marked healthy" or "marked unhealthy" vs. those that have been ascertained to be healthy or unhealthy by health checks? (Not suggesting that it should, at least not yet, need to think about it more - just wondering)
@coopr no, it doesn't -- as listed above, there is a single
If possible, I would rather have a string (i.e.
Does anything special need to be in the response to indicate that it is specific to the Kong node that is reporting it? I ask because all other parts of the Admin API will respond with cluster-wide details, while in this case, the details are node-specific. Perhaps healthcheck visibility should be added to https://getkong.org/docs/0.12.x/admin-api/#information-routes instead (or in addition)?
kong/api/routes/upstreams.lua
active_n = active_n + 1
active[active_n] = entry
GET = function(self, dao_factory)
  self.params.active = nil
Is this a result of a copy/paste or does unsetting it have any influence over one of the below methods? None of them seems to at a glance.
kong/api/routes/upstreams.lua
  entry.health = health_info[entry.target] or "not in balancer"
else
  -- Healthchecks disabled for this upstream
  entry.health = ngx.null
Should we distinguish here between cases where the above get_upstream_health returned an error and cases where the healthchecks were disabled? I feel like logging an error in Kong might not be enough; the user should probably be informed that "something went wrong" while retrieving the health of those particular targets.
I think @coopr's point is particularly interesting - this is one case of a node-specific data endpoint (but others already exist, like
24be770 to c43fc32
@coopr @thibaultcha I thought about that as well. Now that #3234 is merged and we have a node id available, I added the node id as a wrapping key to the response as a whole, indicating that the response pertains to that node:
{
  "a2ad0188-c249-4f3d-b62b-f3605324031c": {
    "data": [
      {
        "created_at": 1519078028938,
        "health": "healthy",
        "id": "37135a85-9752-4fbb-86ac-513ea8f8335c",
        "target": "127.0.0.1:2112",
        "upstream_id": "5136b1f1-1f06-4c6f-860c-cdc629044fbb",
        "weight": 100
      }
    ],
    "total": 1
  }
}
I'm a bit torn between that (which makes it explicit that the data is "scoped" by the node) or just adding a
I'm still leaning towards incorporating this information into the existing node-specific endpoints that @thibaultcha mentioned #3232 (comment)
I like it better too but I am not sure if now is the time to make such paradigm shifts in the Admin API yet... I think it makes sense eventually, but that such a change is beyond the scope of this PR (or current items at work)? My favorite (for now) would be:
{
  "node_id": "a2ad0188-c249-4f3d-b62b-f3605324031c",
  "data": [
    {
      "created_at": 1519078028938,
      "health": "healthy",
      "id": "37135a85-9752-4fbb-86ac-513ea8f8335c",
      "target": "127.0.0.1:2112",
      "upstream_id": "5136b1f1-1f06-4c6f-860c-cdc629044fbb",
      "weight": 100
    }
  ],
  "total": 1
}
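To make the flat shape concrete, here is a minimal Python sketch of how a client might read it. The sample body is the one proposed in the comment above; the helper name `health_by_target` is hypothetical and not part of Kong or of this PR, and the sketch assumes the response keeps the `node_id`, `data`, and `health` keys exactly as shown.

```python
import json

# Sample body copied from the proposal above (flat shape, top-level "node_id").
SAMPLE = """
{
  "node_id": "a2ad0188-c249-4f3d-b62b-f3605324031c",
  "data": [
    {
      "created_at": 1519078028938,
      "health": "healthy",
      "id": "37135a85-9752-4fbb-86ac-513ea8f8335c",
      "target": "127.0.0.1:2112",
      "upstream_id": "5136b1f1-1f06-4c6f-860c-cdc629044fbb",
      "weight": 100
    }
  ],
  "total": 1
}
"""


def health_by_target(body):
    """Hypothetical helper: return (node_id, {target: health}) for one response body."""
    doc = json.loads(body)
    # The report is node-specific, so keep the node id next to the health map.
    return doc["node_id"], {entry["target"]: entry["health"] for entry in doc["data"]}


node_id, health = health_by_target(SAMPLE)
print(node_id, health)
```

Because each Kong node reports only its own view, a client that polls several nodes would presumably keep one such map per `node_id` rather than merging them.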
The logic of having the As an alternative to @thibaultcha's proposal we could index the as for endpoint;
As long as we have an idea what the design looks like, now is as good a time to start as any...
c43fc32 to 214c3bf
Implemented the second option listed here (the one in @thibaultcha's example), with Also, changed the
a89608f to 9ffe79a
Update: changed the behavior so that failure to resolve DNS (which causes the target to not be included in the balancer) returns
I'd like to have some opinions on what we should call the four states. Things to consider:
tl;dr:
@Tieske Not really - we are designing this on the go which we want to avoid for such a massive change, but more importantly we have our work and priorities already set for the next few weeks, with 2 releases that need to be completed, documented, tested and shipped - there probably isn't time to start changing this now if we want to stick to our target dates.
@hishamhm My thinking reading this:
It does indeed look more appropriate for parsing... Maybe "not in balancer" isn't a state, but other states (such as
It'd be nice to also keep track of the error that triggered the
Yeah, an extra description field crossed my mind as well, but that error happens inside If we were to add a detailed diagnostics argument, it would be nice to also display the reason why the healthchecker switched state as well (HTTP status, timeout, etc.) but the library doesn't store that at this point, so file this under non-trivial too. :)
9ffe79a to 767a5f6
767a5f6 to a681fb2
@hishamhm ditto: can we avoid splitting a
kong/api/routes/upstreams.lua
  -- meaning that if DNS is down and it eventually resumes working, the
  -- library will issue the callback and the target will change state.
  entry.health = health_info[entry.target] or "DNS_ERROR"
else
style: it'd be nice to keep the line break above this just as usual - other branches in this file follow it
kong/api/routes/upstreams.lua
local health_info
health_info, err = balancer.get_upstream_health(upstream_id)
if err then
  ngx.log(ngx.ERR, "[healthchecks] failed getting upstream health: ", err)
[healthchecks] is the namespace we chose for the actual health-checker, but I am thinking maybe this one should be different since it is an error in the Admin API and not in the proxy itself (less worrying for users, especially since no stacktrace is printed by ngx.log).
If we use an "Admin API" namespace we could also add it to the above error log (or no namespace at all would be fine too, imho).
a681fb2 to 34ca558
This adds `/upstreams/:upstream_id/health`, an endpoint to report health of the targets of an upstream.

It returns the same contents of `/upstreams/:upstream_id/targets`, with the addition of a `"health"` field, which may return one of four values:

* `"DNS_ERROR"` - target failed to be inserted into the load balancer (because of DNS resolution error), target is therefore **not** in use by the load balancer
* `"HEALTHCHECKS_OFF"` - healthchecks are disabled, target is in use by the load balancer
* `"HEALTHY"` - healthchecks are enabled, target is considered healthy by healthchecks, target is in use by the load balancer
* `"UNHEALTHY"` - healthchecks are enabled, target is considered unhealthy by healthchecks, target is **not** in use by the load balancer

Note that if DNS for the target did not resolve, it will display as `"DNS_ERROR"` regardless of healthchecks being enabled or disabled.
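As an illustration of how the four values could be consumed, the following Python sketch maps each `"health"` value to whether the target is in use by the load balancer, as stated in the commit message above. The names `IN_BALANCER` and `targets_in_use` are hypothetical, and every sample entry except `127.0.0.1:2112` is made up for the example.

```python
# Assumption: values and meanings are taken from the commit message above.
# True means the target is in use by the load balancer.
IN_BALANCER = {
    "DNS_ERROR": False,         # never inserted into the balancer
    "HEALTHCHECKS_OFF": True,   # no healthchecks, target is used as-is
    "HEALTHY": True,
    "UNHEALTHY": False,
}


def targets_in_use(entries):
    """Hypothetical helper: list the targets the balancer is using.

    Raises KeyError on a value outside the four documented ones, so an
    unexpected state is noticed instead of being silently treated as healthy.
    """
    return [e["target"] for e in entries if IN_BALANCER[e["health"]]]


# Illustrative entries shaped like the "data" array of the health endpoint.
entries = [
    {"target": "127.0.0.1:2112", "health": "HEALTHY"},
    {"target": "127.0.0.1:2113", "health": "UNHEALTHY"},
    {"target": "example.invalid:80", "health": "DNS_ERROR"},
    {"target": "127.0.0.1:2114", "health": "HEALTHCHECKS_OFF"},
]
print(targets_in_use(entries))
```

Keeping the mapping in one table makes it explicit that two distinct states (`"DNS_ERROR"` and `"UNHEALTHY"`) both mean "not in use", which is the distinction the string values were introduced to preserve.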
34ca558 to bc8b8c1
Note: This is a work-in-progress.
This adds /upstreams/:upstream_id/health, an endpoint to report health of the targets of an upstream.
It returns the same contents of /upstreams/:upstream_id/targets, with the addition of a "health" field, which may return one of four values:
* "not in balancer" - target failed to be inserted into the load balancer (usually because of DNS resolution error), target is therefore not in use by the load balancer
* "healthchecks disabled" - healthchecks are disabled, target is in use by the load balancer
* "healthy" - healthchecks are enabled, target is considered healthy by healthchecks, target is in use by the load balancer
* "unhealthy" - healthchecks are enabled, target is considered unhealthy by healthchecks, target is not in use by the load balancer
Note that if DNS for the target did not resolve, it will display as "not in balancer" regardless of healthchecks being enabled or disabled.