KEP-2339: beta details #3593

Merged on Oct 6, 2022 (7 commits)

Member

enj commented Oct 6, 2022

/assign @deads2k

enj and others added 4 commits October 6, 2022 08:06
Signed-off-by: Monis Khan <mok@microsoft.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Signed-off-by: Monis Khan <mok@microsoft.com>
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. labels Oct 6, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Oct 6, 2022
Signed-off-by: Monis Khan <mok@microsoft.com>
## FAQ
##### e2e tests

No E2E tests are required for this enhancement as the functionality can be completely
Contributor

an e2e test for CRDs looks practical to develop. For built-in types, an e2e test does not look practical.

Member Author

Is there a reason to not simply write an integration test for CRDs as well? We already have code for the ETCD storage path test that can create CRDs and CRs.

Contributor

I'm happy to have both. I'd like to see an e2e demonstrate the functionality on a running cluster.

Member

The point of an e2e would be to eventually make it a conformance test, I think.


###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

Yes, like any other alpha API, it can be disabled with runtime config.
Contributor

We're going to beta. When disabled, the cluster-admin should delete the stale resources.

Member

Naive question: why is that rule specifically pertaining to when an API is promoted to Beta?

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

The following metrics could be useful, but are likely not practical due to cardinality issues or complexity of the implementation:
- The latency for a single apiserver to update its encoding version after start-up
Contributor

this encoding update is indirectly exposed via kubernetes_healthcheck{name="poststarthook/built-in-resources-storage-version-updater"}, right?

Member

In some ways yes, although I can imagine scenarios where the time between when the health check passes and when the actual encoding version is updated can be non-negligible.

Member Author

I thought that metric was a simple counter and did not include latency?

Contributor

> I thought that metric was a simple counter and did not include latency?

Given a scrape interval, you can determine how long and how often it is false. The idea for the metrics/SLI is a poll on the order of 5s.

Contributor

I still don't think we're missing this, but if you want to leave it listed under missing metrics, that's OK.

Signed-off-by: Monis Khan <mok@microsoft.com>
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 6, 2022
// Spec is omitted because there is no spec field.
// Spec StorageVersionSpec
// Spec is an empty spec. It is here to comply with Kubernetes API style.
Spec StorageVersionSpec `json:"spec" protobuf:"bytes,2,opt,name=spec"`
Member

I don't see why we would need this, but we settle these things in API review, not KEPs

Member Author

@lavalamp this is copy pasted from the current code in k/k

Member

haha ok
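
For readers wondering what the empty `Spec` field in the quoted snippet means on the wire, here is a minimal sketch. It assumes a trimmed, hypothetical `storageVersion` struct rather than the real API type. Only the `json:"spec"` tag comes from the quoted code; the `Name` field and the `apps.deployments` value are placeholders. It shows that an empty struct with that tag still serializes as an explicit empty object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// StorageVersionSpec is intentionally empty, as in the quoted snippet.
type StorageVersionSpec struct{}

// storageVersion is a trimmed, hypothetical stand-in for the real type.
// Only the Spec field and its json tag mirror the quoted code.
type storageVersion struct {
	Name string             `json:"name"`
	Spec StorageVersionSpec `json:"spec"`
}

// encode marshals the object to JSON and returns it as a string.
func encode(sv storageVersion) (string, error) {
	b, err := json.Marshal(sv)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := encode(storageVersion{Name: "apps.deployments"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints {"name":"apps.deployments","spec":{}}
}
```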

```go
// Storage version of a specific resource.
```
Member

I think there is one of these per {group, resource}? Would be good to state that.
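
To make the "one per {group, resource}" reading concrete, the following sketch keeps a single record per group/resource pair and stores, under that record, the encoding version each API server reports. All names here (`groupResource`, `registry`, `report`, the server IDs) are hypothetical and chosen for illustration. This is not the real StorageVersion implementation.

```go
package main

import "fmt"

// groupResource identifies a resource independent of its API version.
type groupResource struct {
	Group    string
	Resource string
}

// registry holds one record per group/resource. Each record maps a
// hypothetical API server ID to the encoding version that server reported.
type registry map[groupResource]map[string]string

// report records the encoding version one API server uses for a resource.
func (r registry) report(gr groupResource, apiServerID, encodingVersion string) {
	if r[gr] == nil {
		r[gr] = map[string]string{}
	}
	r[gr][apiServerID] = encodingVersion
}

func main() {
	r := registry{}
	deployments := groupResource{Group: "apps", Resource: "deployments"}

	// Two servers reporting on the same resource share a single record.
	r.report(deployments, "apiserver-a", "apps/v1")
	r.report(deployments, "apiserver-b", "apps/v1")
	r.report(groupResource{Group: "batch", Resource: "cronjobs"}, "apiserver-a", "batch/v1")

	fmt.Println(len(r), len(r[deployments])) // prints "2 2"
}
```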

### Risks and Mitigations

Writes to most Kubernetes resources must be prevented until the API server has
had a chance to emit its encoding versions for all resources. This requires us to
Member

Ideally a readiness check will return false during this time so a loadbalancer can know not to send traffic?
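
The readiness idea in this comment could look roughly like the sketch below: a handler that answers 503 until a flag records that the encoding versions have been published, and 200 afterwards. The `gate` type, `markPublished`, `probe`, and the `/readyz` path used in the request are assumptions for the example. This does not describe how kube-apiserver actually wires its readiness checks.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// gate reports not-ready until markPublished has been called.
type gate struct {
	published int32 // 0 = still publishing, 1 = done; accessed atomically
}

// markPublished would be called once the hypothetical start-up step that
// emits encoding versions for all resources has finished.
func (g *gate) markPublished() {
	atomic.StoreInt32(&g.published, 1)
}

// ServeHTTP returns 503 while publishing is in progress so that a load
// balancer can keep traffic away, and 200 once it is complete.
func (g *gate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	if atomic.LoadInt32(&g.published) == 0 {
		http.Error(w, "encoding versions not yet published", http.StatusServiceUnavailable)
		return
	}
	w.WriteHeader(http.StatusOK)
	fmt.Fprintln(w, "ok")
}

// probe issues an in-memory request against the handler and returns the status code.
func probe(h http.Handler) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/readyz", nil))
	return rec.Code
}

func main() {
	g := &gate{}
	fmt.Println(probe(g)) // prints 503
	g.markPublished()
	fmt.Println(probe(g)) // prints 200
}
```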

Member

lavalamp commented Oct 6, 2022

OK, some nits, but I don't see anything blocking.

Signed-off-by: Monis Khan <mok@microsoft.com>
Member

lavalamp commented Oct 6, 2022

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 6, 2022
Contributor

deads2k commented Oct 6, 2022

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added the tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. label Oct 6, 2022
Contributor

deads2k commented Oct 6, 2022

PRR looks good and the PR looks good

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, enj, lavalamp


@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 6, 2022
@k8s-ci-robot k8s-ci-robot merged commit b6bef75 into kubernetes:master Oct 6, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.26 milestone Oct 6, 2022
ahmedtd pushed a commit to ahmedtd/enhancements that referenced this pull request Feb 2, 2023
* KEP-2339: update to latest KEP template

Signed-off-by: Monis Khan <mok@microsoft.com>

* KEP-2339: add PRR for StorageVersion API

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* KEP-2339: update test plan section with links to existing tests

Signed-off-by: Andrew Sy Kim <andrewsy@google.com>

* KEP-2339: add details around current code

Signed-off-by: Monis Khan <mok@microsoft.com>

* KEP-2339: add details around default behavior

Signed-off-by: Monis Khan <mok@microsoft.com>

* KEP-2339: address comments

Signed-off-by: Monis Khan <mok@microsoft.com>

* KEP-2339: address comments

Signed-off-by: Monis Khan <mok@microsoft.com>

Signed-off-by: Monis Khan <mok@microsoft.com>
Signed-off-by: Andrew Sy Kim <andrewsy@google.com>
Co-authored-by: Andrew Sy Kim <andrewsy@google.com>
5 participants