KEP-3352: Aggregated Discovery KEP to Alpha #3364

Jefftree · 2022-06-08T17:17:49Z

One-line PR description: Add KEP for publishing an aggregated discovery document. This will reduce the number of requests clients make when fetching all API resources from the server.

Issue link: Aggregated Discovery #3352

Other comments:

keps/prod-readiness/sig-api-machinery/3352.yaml

keps/sig-api-machinery/3352-aggregated-discovery/README.md

deads2k · 2022-06-14T18:58:41Z

@Jefftree forgot a push maybe? I see some comments resolved, but not an update on the PR.

deads2k · 2022-06-14T20:30:40Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+
+### Aggregation
+
+For the aggregation layer on the server, a new controller will be


This will likely need to be an input to readiness, and you'll need to filter the discovery results by applicable group/version.

Yeah it's real bad if the first few clients get an empty doc because this controller hasn't run yet, but it's also bad to delay the cluster startup.

If the api server signals it is ready even when discovery aggregation is not completed, this will not only affect clients but also cause flakiness in some e2e tests that depend on discovery?

If the api server signals it is ready even when discovery aggregation is not completed, this will not only affect clients but also cause flakiness in some e2e tests that depend on discovery?

More importantly, that causes bad behavior of the namespace lifecycle controller.

ah yeah that's another thing to consider, how does this affect that controller.

will clients be able to know that e.g. a specific GV's backing apiserver hasn't been responsive for a while?

if an apiserver restarts in that situation, how does it build the cache?

Added that this will be an input to readiness.

will clients be able to know that e.g. a specific GV's backing apiserver hasn't been responsive for a while?

Yes, added a lastContacted time for all group versions

deads2k · 2022-06-14T20:30:52Z

Is this still WIP or do you want it?

lavalamp

All my comments are minor except for the one about what we do on apiserver startup.

keps/sig-api-machinery/3352-aggregated-discovery/README.md

lavalamp · 2022-06-14T21:35:18Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+
+### Aggregation
+
+For the aggregation layer on the server, a new controller will be


Yeah it's real bad if the first few clients get an empty doc because this controller hasn't run yet, but it's also bad to delay the cluster startup.

keps/sig-api-machinery/3352-aggregated-discovery/README.md

ardaguclu · 2022-06-16T06:56:16Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+currently be found under `apis/<group>/<version>` and `/api/v1` for
+the legacy group version. This discovery endpoint will support
+publishing an ETag so clients who already have the latest version of
+the aggregated discovery can avoid redownloading the document.


Does that mean that clients will no longer cache documents as files?

or caching just one file with TTL

Yes, we will skip the serverresources.json caching mechanism that is provided by client-go. Instead, we will rely on the httpcache library to automatically cache and use ETags. kubectl uses the diskcache provided by httpcache, so we should still be writing the cached document to a file.
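For readers unfamiliar with the ETag flow described in this thread, the following is a minimal sketch of a conditional GET, using only the Go standard library and an in-process test server. It does not use httpcache or client-go; `discoveryHandler`, `cachedDoc`, `fetch`, and `demo` are hypothetical names, the document body and ETag value are made up, and the ETag comparison is a simplified exact match rather than full HTTP validator matching.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// discoveryHandler serves a fixed document with a fixed ETag and answers a
// matching If-None-Match with 304 Not Modified. Hypothetical stand-in for the
// aggregated discovery endpoint; exact string match only.
func discoveryHandler(doc, etag string) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		if r.Header.Get("If-None-Match") == etag {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		io.WriteString(w, doc)
	})
}

// cachedDoc is what a client keeps between requests.
type cachedDoc struct {
	ETag string
	Body string
}

// fetch issues a conditional GET. It returns the document to use (either the
// cached copy or a fresh one) and whether a body was actually downloaded.
func fetch(c *http.Client, url string, cache cachedDoc) (cachedDoc, bool, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return cache, false, err
	}
	if cache.ETag != "" {
		req.Header.Set("If-None-Match", cache.ETag)
	}
	resp, err := c.Do(req)
	if err != nil {
		return cache, false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotModified {
		return cache, false, nil // cached copy is still current
	}
	if resp.StatusCode != http.StatusOK {
		return cache, false, fmt.Errorf("unexpected status %d", resp.StatusCode)
	}
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return cache, false, err
	}
	return cachedDoc{ETag: resp.Header.Get("ETag"), Body: string(body)}, true, nil
}

// demo fetches twice against an in-process server and reports whether each
// request downloaded a body.
func demo() (first, second bool, err error) {
	srv := httptest.NewServer(discoveryHandler(`{"groups":[]}`, `"abc123"`))
	defer srv.Close()
	var cache cachedDoc
	cache, first, err = fetch(srv.Client(), srv.URL, cache)
	if err != nil {
		return
	}
	_, second, err = fetch(srv.Client(), srv.URL, cache)
	return
}

func main() {
	first, second, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println("first request downloaded body:", first)
	fmt.Println("second request downloaded body:", second)
}
```

In this sketch the second request carries the ETag from the first, so the server can reply without resending the document.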

keps/sig-api-machinery/3352-aggregated-discovery/README.md

deads2k · 2022-06-16T17:00:05Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+
+This is approximately what the new API will look like (conflicting names will be renamed)
+
+```go


looks good in concept.

deads2k · 2022-06-17T18:17:00Z

There will be some details to work out either here or during implementation of the controller and APIs. @lavalamp is here next week, so I'll leave where that ends up.

PRR lgtm too.

I volunteer to review the implementation for this one. Please ping me on slack as PRs get opened up.

/approve
/assign @lavalamp

lavalamp · 2022-06-23T20:23:27Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+type APIGroupList struct {
+	TypeMeta `json:",inline"`
+	// groups is a list of APIGroup.
+	Groups []APIGroup `json:"groups" protobuf:"bytes,1,rep,name=groups"`


is there a reason to have the group level? Why not just have all group versions?

We need the separation so we can indicate the priority of group-versions within a group.

That field is commented out below, is that intentional?

Yes, per David's comment, we'll remove the preferredVersion field and sort the Versions list based on priority. The preferredVersion should be at the top of the list.

Oh, if you mean that the GroupVersion field is commented out, this was based on #3364 (comment)

Well, you can still keep it sorted even if the list was flat, but we can work this out in the code.
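The ordering discussed above (no preferredVersion field; versions sorted by priority with the preferred one first) can be sketched roughly as follows. This is an illustration only: `groupEntry`, `versionEntry`, and the numeric `Priority` input are hypothetical and are not the KEP's API types.

```go
package main

import (
	"fmt"
	"sort"
)

// versionEntry is a hypothetical server-side record. Priority is an assumed
// input used only for ordering; higher means more preferred.
type versionEntry struct {
	Version  string
	Priority int
}

// groupEntry is a hypothetical group holding its versions.
type groupEntry struct {
	Name     string
	Versions []versionEntry
}

// sortByPriority orders versions from most to least preferred, keeping the
// original relative order of entries with equal priority.
func sortByPriority(g *groupEntry) {
	sort.SliceStable(g.Versions, func(i, j int) bool {
		return g.Versions[i].Priority > g.Versions[j].Priority
	})
}

// preferredVersion reads the preferred version off the top of the sorted
// list, which replaces a dedicated preferredVersion field in this sketch.
func preferredVersion(g groupEntry) (string, bool) {
	if len(g.Versions) == 0 {
		return "", false
	}
	return g.Versions[0].Version, true
}

func main() {
	g := groupEntry{
		Name: "example.com",
		Versions: []versionEntry{
			{Version: "v1beta1", Priority: 10},
			{Version: "v1", Priority: 20},
		},
	}
	sortByPriority(&g)
	v, _ := preferredVersion(g)
	fmt.Println("preferred version for", g.Name, "is", v)
}
```

The same sort would work on a flat list of group-versions, which is the alternative raised at the end of this thread.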

lavalamp · 2022-06-23T22:08:24Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+
+ 	// LastContacted is the last time that the apiserver has successfully reached the
+ 	// corresponding group version's discovery document. This will be nil if the group-version
+ 	// has not been aggregated yet. To maintain consistency across scenarios with multiple


If it is nil, does that imply that APIResources above is an empty list?

Yes, added a note
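As a rough illustration of how a client might interpret this field, the sketch below classifies a group-version from its LastContacted value. The `groupVersionEntry` type, the `classify` helper, the three state names, and the `maxAge` threshold are all assumptions made for this example; the thread only establishes that a nil LastContacted means the group-version has not been aggregated yet and that its resource list is then empty.

```go
package main

import (
	"fmt"
	"time"
)

// groupVersionEntry is a hypothetical, trimmed-down stand-in for one
// group-version in the aggregated document.
type groupVersionEntry struct {
	GroupVersion  string
	Resources     []string
	LastContacted *time.Time // nil until the group-version has been aggregated
}

// classify returns "pending" when the group-version has never been
// aggregated, "stale" when the last successful contact is older than maxAge,
// and "fresh" otherwise. maxAge is a client-chosen threshold (an assumption).
func classify(gv groupVersionEntry, now time.Time, maxAge time.Duration) string {
	switch {
	case gv.LastContacted == nil:
		return "pending"
	case now.Sub(*gv.LastContacted) > maxAge:
		return "stale"
	default:
		return "fresh"
	}
}

// exampleStates classifies three made-up entries against a fixed clock.
func exampleStates() [3]string {
	now := time.Date(2022, time.June, 23, 22, 0, 0, 0, time.UTC)
	recent := now.Add(-30 * time.Second)
	old := now.Add(-2 * time.Hour)
	maxAge := 10 * time.Minute
	return [3]string{
		classify(groupVersionEntry{GroupVersion: "a.example.com/v1"}, now, maxAge),
		classify(groupVersionEntry{GroupVersion: "b.example.com/v1", LastContacted: &old}, now, maxAge),
		classify(groupVersionEntry{GroupVersion: "c.example.com/v1", LastContacted: &recent}, now, maxAge),
	}
}

func main() {
	fmt.Println(exampleStates())
}
```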

keps/sig-api-machinery/3352-aggregated-discovery/README.md

lavalamp · 2022-06-23T22:24:24Z

/lgtm
/approve

thanks!

k8s-ci-robot · 2022-06-23T22:24:45Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, Jefftree, lavalamp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [deads2k]
~~keps/sig-api-machinery/OWNERS~~ [deads2k,lavalamp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

smarterclayton · 2022-09-12T20:53:23Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+api servers may take longer to respond and we do not want to delay
+cluster startup, the health check will only block on the local api
+servers (built-ins and CRDs) to have their discovery ready. For api
+servers that have not been aggregated, their group-versions will be


Is this a typo? "have not been aggregated"? I expected to see a description of what client visible behavior we'd see for the discovery doc on initialization:

Built-ins - always visible?

CRDs - only visible once CRDs are loaded, so might be visible?

Aggregated APIs - could take materially longer to be visible?

I'd like to see some examples added to the KEP on the outcome for clients to clarify what they see during startup and how they know discovery is incomplete. Will an incomplete discovery doc return a non-200 error response? Will clients be expected to retry if the document is not complete?

smarterclayton · 2022-09-14T22:26:01Z

Asked in the #octant channel, poked the OpenShift UI folks for what their feedback on watch was (Sam said poll with etag would be enough).

I looked at a couple of places that do discovery:

https://github.com/kubernetes-client/python/blob/master/kubernetes/base/dynamic/discovery.py is python client - Ansible uses this for kube declarative management
https://github.com/kubernetes-client/java/blob/master/util/src/main/java/io/kubernetes/client/Discovery.java java, although i can't tell whether this is actually dynamic discovery or not (my java has atrophied)

I suspect that the aggregated api endpoint would be very valuable for removing some code, but latency and the need not to have a complex cache would be the big win. Will poke at others to see if we have any examples of dynamic discovery clients in the ecosystem that would benefit from better discovery as well who would potentially be impacted if we hit unrepresentable things.

spadgett · 2022-09-15T17:31:15Z

For OpenShift UI, we have the following wants:

1. Improve performance of initial API discovery. Right now, it's slow due to the number of requests, and discovery can block the UI loading. Browsers limit how many parallel requests we can make.
2. Efficiently detect new types so that the UI can react (for instance, when an operator is installed and adds CRDs).

The proposal definitely addresses (1). I think polling with ETags is sufficient for (2). Happy to see this 👍

kikisdeliveryservice

Kep.yaml needs to be updated to reflect 1.26.

keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml

smarterclayton · 2022-09-21T15:28:34Z

keps/sig-api-machinery/3352-aggregated-discovery/README.md

+needs to be refreshed after 6 hours, even if it hasn’t expired.
+
+This not only impacts kubectl, but all clients of kubernetes. We can
+do better.


Some of the nuances we discussed in comments deserve to be in the KEP:

- Which clients are we targeting primarily: clients that need to frequently request or maintain a discovery cache, like kubectl or web interfaces.
  - We want all clients to benefit from this, but our alpha is targeted at solving these problems for those key consumers.
- We have some use cases that may require additional work to correctly support; we should identify those before entry to beta / reach consensus on their utility:
  - Namespace controller needs to ensure that discovery documents are refreshed after the namespace goes into the terminating phase (needs to guarantee that it sees the list of resources that could have been created within the namespace before the namespace is deleted). The controller needs to be able to guarantee discovery doc list happens-after that, which is best solved by letting the namespace controller "wait" for the next refresh. What we can't allow is for the namespace controller to not observe that resource.
  - The use of server-side caching may break some client-side batch loops done with kubectl that need happens-before semantics; we should consider that use case in kubectl and ensure we have a mitigation.
- Polling being sufficient over watch for most users.
- The API types in meta/v1 should be meta/v1alpha1 to start, not meta/v1.

k8s-ci-robot added the do-not-merge/work-in-progress, cncf-cla: yes, and size/XL labels Jun 8, 2022

k8s-ci-robot requested review from fedebongio and lavalamp June 8, 2022 17:18

k8s-ci-robot added the kind/kep and sig/api-machinery labels Jun 8, 2022

Jefftree force-pushed the aggregated-discovery branch 2 times, most recently from fd391c2 to 553bc5d Compare June 8, 2022 18:09

deads2k reviewed Jun 10, 2022

Priyankasaggu11929 mentioned this pull request Jun 11, 2022

Aggregated Discovery #3352

Open

11 tasks

negz reviewed Jun 14, 2022

Jefftree force-pushed the aggregated-discovery branch from fbb455e to ffccd6a Compare June 14, 2022 19:06

deads2k reviewed Jun 14, 2022

Jefftree changed the title ~~[WIP] KEP-3352: Aggregated Discovery KEP to Alpha~~ KEP-3352: Aggregated Discovery KEP to Alpha Jun 14, 2022

k8s-ci-robot removed the do-not-merge/work-in-progress label Jun 14, 2022

lavalamp reviewed Jun 14, 2022

Jefftree force-pushed the aggregated-discovery branch 2 times, most recently from 9e947ba to 62b02c2 Compare June 15, 2022 18:19

ardaguclu reviewed Jun 16, 2022

deads2k reviewed Jun 16, 2022

k8s-ci-robot assigned lavalamp Jun 17, 2022

k8s-ci-robot added the approved label Jun 17, 2022

Jefftree force-pushed the aggregated-discovery branch from 8d9b7a5 to 6f45e97 Compare June 21, 2022 20:15

negz mentioned this pull request Jun 22, 2022

Add ability to disable certain CRDs on installation. crossplane/crossplane#2869

Closed

2 tasks

lavalamp reviewed Jun 23, 2022

Jefftree force-pushed the aggregated-discovery branch 2 times, most recently from 6459070 to 669cdc2 Compare June 23, 2022 21:25

lavalamp reviewed Jun 23, 2022

Aggregated Discovery KEP

fcec4e1

Jefftree force-pushed the aggregated-discovery branch from 669cdc2 to fcec4e1 Compare June 23, 2022 22:19

k8s-ci-robot added the lgtm label Jun 23, 2022

k8s-ci-robot merged commit cd305f9 into kubernetes:master Jun 23, 2022

k8s-ci-robot added this to the v1.25 milestone Jun 23, 2022

apelisse mentioned this pull request Jun 27, 2022

kubectl discovery dramatically slower on MacOS than Linux kubernetes/kubernetes#110753

Closed

deads2k mentioned this pull request Jul 29, 2022

Aggregated Discovery Endpoint kubernetes/kubernetes#111409

Closed

Jefftree mentioned this pull request Aug 31, 2022

Aggregated discovery types kubernetes/kubernetes#111978

Merged

smarterclayton reviewed Sep 12, 2022

kikisdeliveryservice reviewed Sep 16, 2022

smarterclayton reviewed Sep 21, 2022
