
✨ SSA: improve request caching #8243

Merged

Conversation


@sbueringer sbueringer commented Mar 7, 2023

Signed-off-by: Stefan Büringer buringerst@vmware.com

What this PR does / why we need it:
This PR further improves SSA request caching:

  1. Caching was not used when writing KubeadmConfig / InfraMachine in KCP.
  2. Caching was not always effective: in some cases we continuously tried to write fields like creationTimestamp, so the resourceVersion changed on every request and the request was never cached.
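The second point rests on a simple invariant: if an identical apply request was already sent and the object's resourceVersion has not moved since, re-sending it would be a no-op and can be skipped. The sketch below is an illustrative re-implementation of that idea, not the actual Cluster API cache code; all type and method names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// requestCache remembers, per object and request body, the resourceVersion
// observed after the last successful apply. Hypothetical sketch only.
type requestCache struct {
	entries map[string]string // "objectKey/bodyHash" -> resourceVersion
}

func newRequestCache() *requestCache {
	return &requestCache{entries: map[string]string{}}
}

func bodyHash(body string) string {
	sum := sha256.Sum256([]byte(body))
	return hex.EncodeToString(sum[:])
}

// ShouldApply returns false when the identical request was already applied
// and the object's resourceVersion has not changed since, i.e. the apply
// can be skipped.
func (c *requestCache) ShouldApply(objKey, body, currentRV string) bool {
	rv, ok := c.entries[objKey+"/"+bodyHash(body)]
	return !ok || rv != currentRV
}

// Record stores the resourceVersion returned by a successful apply.
func (c *requestCache) Record(objKey, body, newRV string) {
	c.entries[objKey+"/"+bodyHash(body)] = newRV
}

func main() {
	c := newRequestCache()
	fmt.Println(c.ShouldApply("ns/machine-1", `{"spec":{}}`, "100")) // true: never applied
	c.Record("ns/machine-1", `{"spec":{}}`, "100")
	fmt.Println(c.ShouldApply("ns/machine-1", `{"spec":{}}`, "100")) // false: cached
	// A write that keeps mutating the object (e.g. setting creationTimestamp)
	// bumps the resourceVersion every time, so the cache never takes effect.
	fmt.Println(c.ShouldApply("ns/machine-1", `{"spec":{}}`, "101")) // true again
}
```

This also shows why issue (2) above matters: as long as every apply changes the object, `currentRV` never matches the recorded value and caching is defeated.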

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #8146

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 7, 2023
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 7, 2023

// Only keep the paths that we have opinions on.
FilterObject(u, &FilterObjectInput{
AllowedPaths: []contract.Path{
@sbueringer sbueringer (Member Author) commented Mar 7, 2023
This is the same as the default allowed paths in SSA, plus metadata.finalizers. This should cover all relevant fields.
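To illustrate what filtering by allowed paths does, here is a minimal re-implementation on an unstructured object. This is a sketch only: the real Cluster API `FilterObject` helper is more general (arbitrary depth, ignore paths, etc.), and the allow-list below merely mirrors the comment above (default SSA paths plus metadata.finalizers) as an assumption.

```go
package main

import "fmt"

// filterAllowedPaths keeps only the listed one- or two-segment paths in an
// unstructured object. Illustrative sketch, not the real implementation.
func filterAllowedPaths(obj map[string]any, allowed [][]string) map[string]any {
	out := map[string]any{}
	for _, p := range allowed {
		switch len(p) {
		case 1:
			if v, ok := obj[p[0]]; ok {
				out[p[0]] = v
			}
		case 2:
			nested, ok := obj[p[0]].(map[string]any)
			if !ok {
				continue
			}
			v, ok := nested[p[1]]
			if !ok {
				continue
			}
			m, ok := out[p[0]].(map[string]any)
			if !ok {
				m = map[string]any{}
				out[p[0]] = m
			}
			m[p[1]] = v
		}
	}
	return out
}

func main() {
	obj := map[string]any{
		"apiVersion": "cluster.x-k8s.io/v1beta1",
		"kind":       "Machine",
		"metadata": map[string]any{
			"name":              "m1",
			"finalizers":        []any{"machine.cluster.x-k8s.io"},
			"creationTimestamp": "2023-03-07T00:00:00Z", // dropped: no opinion on it
		},
		"spec":   map[string]any{"clusterName": "c1"},
		"status": map[string]any{"bootstrapReady": true}, // dropped
	}
	// Assumed allow-list: default SSA paths + metadata.finalizers.
	allowed := [][]string{
		{"apiVersion"}, {"kind"},
		{"metadata", "name"}, {"metadata", "namespace"},
		{"metadata", "labels"}, {"metadata", "annotations"},
		{"metadata", "finalizers"},
		{"spec"},
	}
	filtered := filterAllowedPaths(obj, allowed)
	fmt.Println(filtered["status"] == nil)                                  // true
	fmt.Println(filtered["metadata"].(map[string]any)["creationTimestamp"]) // <nil>
}
```

Dropping fields like creationTimestamp before applying is exactly what keeps requests cacheable: only fields the controller has opinions on are sent, so repeated applies stay identical.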

@sbueringer
Member Author

/test pull-cluster-api-e2e-full-main

@ykakarap ykakarap (Contributor) left a comment

/lgtm

Comment on lines +122 to +131
// Note: We have to patch the status here to explicitly set these two status fields.
// If we don't, the Machine defaulting webhook will try to set the two fields to false.
// For an unknown reason this happens with the 2nd update call (3.) below and not before.
// This means that this call would unexpectedly not cache the object, because the
// resourceVersion changes when the fields are set.
// It's unclear why those status fields are not already set during create (1.) or the first update (2.)
// (the webhook returns patches for the two fields in those requests as well).
// To investigate this behavior further it would be necessary to debug the kube-apiserver.
// Fortunately, in practice this is not an issue, as the fields will be set sooner or later
// and the requests are cached from then on.
Contributor

oh! This is interesting. 🤔

Member Author

Super strange... I tried what I could with client-side & webhook debugging, but it didn't seem worth it to debug through the kube-apiserver to figure out why exactly it doesn't always apply the patches returned by the webhook.

createObject := initialObject.DeepCopy()
g.Expect(Patch(ctx, env.GetClient(), fieldManager, createObject)).To(Succeed())
// Note: We have to patch the status here to explicitly set these two status fields.
// If we don't do it the Machine defaulting webhook will try to set the two fields to false.
Contributor

Out of curiosity: this is because these two fields are of type bool and do not have "omitempty", right?

@sbueringer sbueringer (Member Author) commented Mar 8, 2023

Yes. Because they don't have omitempty, our mutating/defaulting webhook sees a diff after marshaling the old/new object (i.e. the old object doesn't have the bools, the new one has them).

Basically, on the initial create the status is not written at all (because I'm just using the normal client, not the status client). The webhook then always sees the diff when it marshals the old/new objects and calculates the patches it should return; it sends back patches to modify the status every time it is called.

For some reason, on the 2nd update call the apiserver actually applies those patches (and thus bumps the resourceVersion).

Note: Even without omitempty, the fields are not necessarily set in the apiserver (the apiserver doesn't care about our Go structs, it only cares about the fields being present in the actual YAML/JSON). Usually those status fields end up set as a side effect of our mutating webhooks.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 8, 2023
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 8fdba64dc2fd463e545d8899e2a824611d30773d

@sbueringer
Member Author

Test results > #8146 (comment)

@fabriziopandini
Member

Overall, across both PRs: before: 32934 => after: 4225 (~87% fewer calls / ~8x improvement)

Those are great numbers! Kudos for this work
/lgtm
/approve

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: fabriziopandini

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit 88c5261 into kubernetes-sigs:main Mar 9, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.4 milestone Mar 9, 2023
@sbueringer sbueringer deleted the pr-ssa-improve-caching branch March 10, 2023 02:43
Successfully merging this pull request may close these issues.

ClusterClass: cache SSA dry-run requests