# Proposal for instance readiness KEP #1692

Merged 1 commit on Sep 29, 2020.
File changed: keps/0034-instance-health.md (71 additions, 1 deletion)

## Summary

KUDO helps people implement their operators and its focus is day 2 operations. Part of day 2 is monitoring your workload's readiness after deployment. To help with that, KUDO will expose readiness computed as a heuristic based on the readiness of the underlying resources. In the first iteration, readiness will be a simple heuristic computed from Pods, StatefulSets, Deployments, ReplicaSets and DaemonSets (let's call them *readiness phase 1 resources*).

## Motivation

### Non-Goals

- Drift detection (detecting that a resource was deleted or changed manually)
- Including other types of resources than *readiness phase 1 resources*
- Determining if the underlying application is functional
- Determining if the underlying application is reachable

## Proposal

Readiness of an `Instance` will be communicated via a newly introduced `Status.Conditions` field. By convention, this field is an array of items that [conform to the schema](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1367) recommended by k8s api machinery. The `Type` of the newly added condition will be `Ready`. The condition can have these [three values](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1288-L1298), which will have the following meaning:
- True - the last observation of readiness on this instance is that it’s ready
- False - the last observation of readiness on this instance is that it’s NOT ready
- Unknown - a deploy/upgrade/update plan is running, or the last execution of one of these plans ended with FATAL_ERROR
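
As an illustration, here is a minimal sketch of how the condition could be carried on the Instance status, assuming the `metav1.Condition` type linked above is used directly; the `InstanceStatus` layout and the reason constants are illustrative, not the final KUDO API:

```
// Sketch only: the Ready condition attached to the Instance status, using the
// apimachinery Condition type the KEP links to. Field layout and the reason
// constants are illustrative, not the final KUDO API.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Type of the newly introduced condition.
const ConditionTypeReady = "Ready"

// Illustrative reasons matching the examples below.
const (
	ReasonPlanRunning      = "PlanRunning"
	ReasonResourceNotReady = "ResourceNotReady"
)

type InstanceStatus struct {
	// ... existing status fields (plan status, etc.) ...

	// Conditions holds the Ready condition; its status is True, False or Unknown.
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```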

Keeping an Unknown state while a plan is running is important because resources can be applied sequentially: while pre-install steps are running, we would otherwise report the instance as ready even though no Deployments have been applied yet, since that only happens in later steps.

Example of Instance status being ready:
```
apiVersion: kudo.dev/v1beta1
kind: Instance
status:
  conditions:
    - type: Ready
      status: "True"
      lastTransitionTime: 2018-01-01T00:00:00Z
```

Example of Instance status being NOT ready:

> **Member:** Should we have an example of Unknown?

```
apiVersion: kudo.dev/v1beta1
kind: Instance
status:
  conditions:
    - type: Ready
      status: "False"
      lastTransitionTime: 2018-01-01T00:00:00Z
      reason: ResourceNotReady
      message: Deployment 'xxx' is not ready, Deployment 'yyy' is not ready
```

> **Member:** Implementation detail warning :) Is it your expectation that we will create a "message" by joining resource events (which is where we can see that a pod can't be scheduled)? That is what I have in my mental model... I don't know how you would do it another way. If that is the case, is there a way to identify the resource in the message, or a format for this message?
>
> I also assume we will use the constraints defined in the schema provided, which limits Reason to 1024 characters and Message to 32768, which we may need to be mindful of and have a solution for.

> **Contributor Author:** Are you asking whether the resource name will be machine parse-able out of that? 🤔 But yeah, the idea is that we'll join some information we can get from the resources that are not ready. It won't be machine parse-able, or at least that was not my intention - the message has the goal of being human readable, while the reason should be something a machine can rely on. If that's a hard requirement, we can try to make it machine parse-able.
>
> Yeah, good point, we should think about those constraints during implementation. It won't be a problem for the reason as that should be an enum.

Example of Instance status being Unknown while a plan is running:
```
apiVersion: kudo.dev/v1beta1
kind: Instance
status:
  conditions:
    - type: Ready
      status: "Unknown"
      lastTransitionTime: 2018-01-01T00:00:00Z
      reason: PlanRunning
      message: Plan 'deploy' is currently running
```

Ready is expected to be an oscillating condition; it indicates that the resources owned by the instance were believed to be ready at the time readiness was last verified. The expectation is that the value won't be stale for more than one minute (within one minute of a state change, the status should reflect it).

The Unknown state will be set whenever a deploy/upgrade/update plan is running - this is because a plan run should be an atomic operation going from one stable state to another, evaluating the health of all resources involved in the plan as part of the execution. It would be redundant to also check readiness in addition to the health checks performed during plan execution.

The reason for using the Conditions field rather than introducing a separate field is mainly that it is becoming the established go-to pattern in the Kubernetes world. Conditions also come with a standard set of metadata fields that are useful for our case (namely Reason and Message). Having a `Ready` condition is also [encouraged by sig-architecture](https://github.com/kubernetes/community/pull/4521/files). (Excerpt from that document: *Although they are not a consistent standard, the `Ready` and `Succeeded` condition types may be used by API designers for long-running and bounded-execution objects, respectively.*)

### Implementation Details/Notes/Constraints

Setting the `Ready` condition will be the responsibility of the existing `instance_controller`. It will watch all the types in *readiness phase 1 resources* (it already watches most of them, so there is no big additional performance penalty) and trigger a reconcile of the owning Instance.
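
For illustration, a sketch of that watch setup using controller-runtime; the reconciler name, the KUDO import path and the exact builder wiring are assumptions, not the existing KUDO code:

```
// Sketch: instance_controller watching the readiness phase 1 resource types so
// that a change to any owned resource triggers a reconcile of the owning Instance.
package instance

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	kudov1beta1 "github.com/kudobuilder/kudo/pkg/apis/kudo/v1beta1" // assumed import path
)

type InstanceReconciler struct{} // stand-in for the existing instance_controller reconciler

func (r *InstanceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// readiness evaluation of the owned resources would happen here
	return ctrl.Result{}, nil
}

func (r *InstanceReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&kudov1beta1.Instance{}).
		// readiness phase 1 resources
		Owns(&appsv1.Deployment{}).
		Owns(&appsv1.StatefulSet{}).
		Owns(&appsv1.ReplicaSet{}).
		Owns(&appsv1.DaemonSet{}).
		Owns(&corev1.Pod{}).
		Complete(r)
}
```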

The controller will have to pull all the *readiness phase 1 resources* (this means N additional requests to the API server, where N is the number of resource types), filtering for the labels `heritage: kudo` and `kudo.dev/instance=<instance-name>`.
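
A sketch of that label-filtered lookup using the controller-runtime client (the helper name is hypothetical; one such call is needed per resource type):

```
// Sketch: listing one readiness phase 1 resource type, filtered by the KUDO
// ownership labels mentioned above.
package instance

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

func listOwnedDeployments(ctx context.Context, c client.Client, namespace, instance string) (*appsv1.DeploymentList, error) {
	deployments := &appsv1.DeploymentList{}
	err := c.List(ctx, deployments,
		client.InNamespace(namespace),
		client.MatchingLabels{
			"heritage":          "kudo",
			"kudo.dev/instance": instance,
		})
	return deployments, err
}
```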

> **Contributor (zen-dog):** Do we also need to filter out resources of other plans? E.g. if there are additional pods running (monitoring or 3rd-party tools) that are not ready? Currently, we set the kudo.dev/plan annotation, but not as a label.

> **Member:** +1 to zen-dog's question (which aligns a little with the earlier mention of job pods).
>
> Additionally, if there is a plan execution that launches a temporary resource (a service for a short duration) or a pod that finishes, is that resource involved in the status? Is there a way to opt out of being involved? Should we design an opt-out?

> **Contributor Author:** Yeah, good point. I think I like the idea of opt-out, as it's hard to really capture these cases without it since one can use plans in any way possible. Would that in your mind solve the problem you pointed out @zen-dog?

> **Contributor (zen-dog), Sep 25, 2020:** Opt-out (or maybe rather opt-in?) should work. We would only consider resources with 'kudo.dev/plan' in (deploy, update) but extend the set with more plans if an operator developer specified it. But this is still "edgy" as resources can be reused between plans, and other plans like e.g. upgrade can have one-time migration Jobs, etc.

> **Member:** I'm not sure it's enough to simply pull all resources and filter for labels. The issue is that the deploy plan may have multiple steps that deploy resources, but at any given time the plan can land in FATAL_ERROR and cancel the execution of subsequent steps.
>
> If we were to pull only deployed resources with a filter, we could end up with a half-executed deploy plan in FATAL_ERROR but the "Ready" condition reporting "True", because all deployed resources are healthy - but the operator itself isn't.
>
> I think this touches a lot of the problems we'll see with drift detection. I'm not sure if opt-out or opt-in will be the better solution here. Maybe only apply to the "deploy" plan by default, and allow other plans to opt in?

> **Contributor Author:** You actually brought up a very good edge case: what if the plan fails 🤔 I think if the last plan is in FATAL_ERROR, would it not make sense for Ready to be "False"? 🤔
>
> Or should it rather be Unknown then? I think ideally it should be "degraded" (https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L1288-L1298) but that does not exist yet :)


From all these resources, the controller will compute readiness based on `Condition: Ready`, `Condition: Available`, or other fields, following the convention of each particular type.

> **Contributor Author:** Not sure what to do for the Service type. Maybe we should drop that from the listed resources?

> **Member:** Yeah, probably a good idea for phase 1. We can always add it later, but I think it'll require custom code and more than Condition checking.

> **Contributor Author:** I've removed Services for now; I'll add them again once we have a clear path on how to assess readiness there.


For operators installing their app via e.g. a `Deployment`, there's no need at this point to also check the status of the underlying Pods, because the Deployment `Status` already mirrors the readiness of the underlying Pods. The same is true for all higher-level types like StatefulSet etc.
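
To make the heuristic concrete, here is a sketch of possible per-type checks for two of the phase 1 types; the exact rules (Deployment via its Available condition, StatefulSet via readyReplicas) are assumptions about the convention, not settled implementation:

```
// Sketch: per-type readiness heuristics. The Deployment status already mirrors
// the readiness of its Pods, so checking its Available condition is sufficient.
package instance

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

func deploymentReady(d *appsv1.Deployment) bool {
	for _, c := range d.Status.Conditions {
		if c.Type == appsv1.DeploymentAvailable {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}

func statefulSetReady(s *appsv1.StatefulSet) bool {
	desired := int32(1) // StatefulSet default when spec.replicas is unset
	if s.Spec.Replicas != nil {
		desired = *s.Spec.Replicas
	}
	return s.Status.ReadyReplicas == desired
}
```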

### Risks and Mitigations

On a big cluster with a huge number of Pods and Deployments, it's possible that the controller might run into scaling issues because of the number of items it needs to process.

This is mitigated by the fact that inside the event handlers we filter only for events that belong to KUDO, which should limit the scope to very few events (the expectation is that the majority of Deployments do not come from KUDO).
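
As a sketch of that filtering, assuming a recent controller-runtime (the predicate name is hypothetical):

```
// Sketch: discard events for resources that were not created by KUDO before
// they reach the work queue, based on the heritage label.
package instance

import (
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

var kudoOwnedOnly = predicate.NewPredicateFuncs(func(obj client.Object) bool {
	return obj.GetLabels()["heritage"] == "kudo"
})
```

Such a predicate could be attached to the watches via `WithEventFilter` on the builder shown earlier.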

Also, this feature does not make the situation any worse than it is right now, because the KUDO controller already watches Pods and Deployments, so we're not introducing new overhead. That said, the controller will have to perform much more work at times when it was just "idling" before - previously it only did real work while a plan was running, otherwise the reconcile ended right after it started. This could pose a problem on bigger clusters with many KUDO operators and could be mitigated by running the KUDO controller with multiple workers (right now 1 worker is enough on most installations we're aware of).
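
For completeness, a sketch of how the worker count could be raised, assuming controller-runtime options (the variable name and value are illustrative):

```
// Sketch: raising the number of concurrent reconcile workers. The options are
// passed to the builder from the watch-setup sketch above via WithOptions.
package instance

import (
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

var instanceControllerOptions = controller.Options{
	// One worker is enough for most installations today; raise this on large
	// clusters with many KUDO operators.
	MaxConcurrentReconciles: 4,
}
```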