Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ExtensionConfig reconcile performance #8495

Closed
killianmuldoon opened this issue Apr 6, 2023 · 7 comments · Fixed by #9338
Closed

Improve ExtensionConfig reconcile performance #8495

killianmuldoon opened this issue Apr 6, 2023 · 7 comments · Fixed by #9338
Assignees
Labels
area/runtime-sdk Issues or PRs related to Runtime SDK triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Copy link
Contributor

Leading on from #7985 we should ensure watch on ExtensionConfig correctly filters by extensions actually referenced in the ClusterClass. This might include adding an index of index of ExtensionConfigs -> ClusterClass to quickly find all ClusterClasses using a particular ExtensionConfig.

/area runtime-sdk

@k8s-ci-robot k8s-ci-robot added area/runtime-sdk Issues or PRs related to Runtime SDK needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 6, 2023
@fabriziopandini
Copy link
Member

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 11, 2023
@sbueringer
Copy link
Member

sbueringer commented Aug 4, 2023

Current state:

If we receive a create/update event on an ExtensionConfig we trigger a reconcile of all ClusterClass that have a DiscoverVariablesExtension configured. The only optimization in place is that if the ExtensionConfig has a namespaceSelector set we only do it for matching ClusterClasses.

Example:

  • 10 ClusterClasses with each referencing 2 ExtensionConfigs (with overall 20 ExtensionConfigs)
  • That means every resync period we get 20 reconciles for every ClusterClass (ignoring deduplication if multiple ClusterClass items are added to the queue at the same time)

I think this can lead to pretty high load on Runtime Extensions

@sbueringer
Copy link
Member

Oh wait I think it's totally broken.

Looks like we don't map ExtensionConfig to ClusterClass, we map ExtensionConfig to ExtensionConfig (i.e. we have to return ClusterClass name/namespace not ExtensionConfig name)

https://github.com/kubernetes-sigs/cluster-api/blob/3e16abc3df71314bbaee770846beba45c0eb170d/internal/controllers/clusterclass/clusterclass_controller.go#L412C1-L412C1

(@killianmuldoon can you double check)

@killianmuldoon
Copy link
Contributor Author

Yeah you're right. The current behaviour does trigger a reconcile - but it always returns an error after looking up the ClusterClass. Will open a bugfix now.

@fabriziopandini
Copy link
Member

fabriziopandini commented Aug 18, 2023

Does the reconcile queue take care of duplicate reconcile requests for the same cc?

@sbueringer
Copy link
Member

sbueringer commented Aug 18, 2023

Hard to predict. I think it depends on when/how exactly the resync for ExtensionConfigs come in and how quickly the ClusterClass is reconciled.

As soon as a ClusterClass reconcile is started the next event would mark the ClusterClass as dirty in the queue and lead to another reconcile. I think it's simpler to fix the issue then trying to estimate the exact impact (as it might vary based on performance etc.)

@killianmuldoon
Copy link
Contributor Author

/assign

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/runtime-sdk Issues or PRs related to Runtime SDK triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants