
🌱 ClusterCacheTracker: use non-blocking per-cluster locking #7537

Conversation

@sbueringer (Member) commented Nov 11, 2022:

Co-authored-by: Florian Gutmann <fgutmann@amazon.com>

What this PR does / why we need it:
This PR builds on top of #6380. All credit to @fgutmann for the initial implementation.

As of today we have a global lock, which means that as soon as clusterAccessor creation for one cluster doesn't complete quickly, all workers of our reconcilers are blocked. This is most problematic for the Machine reconciler, as Machines usually have the highest cardinality. If the apiserver of a cluster is unreachable or slow, it can take up to 10 seconds until the global lock is released again. During this time nobody else can create or retrieve a client for any workload cluster.
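
For illustration, here is a minimal sketch of the global-lock pattern described above (simplified, with hypothetical helper names like `newClusterAccessor`; not the actual ClusterCacheTracker code):

```go
package remote

import (
	"context"
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// clusterAccessor is a simplified stand-in for the real per-cluster state
// (client, cache, watches).
type clusterAccessor struct {
	client client.Client
}

// newClusterAccessor stands in for the expensive part: it dials the workload
// cluster's apiserver, which can take ~10s when the apiserver is unreachable.
func newClusterAccessor(ctx context.Context, cluster client.ObjectKey) (*clusterAccessor, error) {
	// ... build a rest config, client, and cache for the workload cluster ...
	return &clusterAccessor{}, nil
}

type tracker struct {
	lock             sync.Mutex // one global lock, shared by ALL clusters
	clusterAccessors map[client.ObjectKey]*clusterAccessor
}

func (t *tracker) GetClient(ctx context.Context, cluster client.ObjectKey) (client.Client, error) {
	// Every caller serializes here, including callers interested in a
	// completely different, perfectly healthy cluster.
	t.lock.Lock()
	defer t.lock.Unlock()

	a, ok := t.clusterAccessors[cluster]
	if !ok {
		var err error
		// Potentially ~10s while holding the global lock.
		a, err = newClusterAccessor(ctx, cluster)
		if err != nil {
			return nil, err
		}
		t.clusterAccessors[cluster] = a
	}
	return a.client, nil
}
```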

Let's assume the Machine reconciler runs with 10 concurrent workers, Cluster1 has 250 Machines and is not reachable, and Cluster2 has 10 Machines and is reachable.

This leads to the following situation for the Machine reconciler:

| Machine reconciler queue | Machine reconciler workers | Machine reconciler workers in GetClient |
| --- | --- | --- |
| 240 Machines (Cluster1)<br>10 Machines (Cluster2) | 9 Machines (Cluster1) | 1 Machine (Cluster1) |

The first step is to split the global lock into one lock per cluster, which leads to the following:

| Machine reconciler queue | Machine reconciler workers | Machine reconciler workers in GetClient |
| --- | --- | --- |
| 240 Machines (Cluster1)<br>9 Machines (Cluster2) | 8 Machines (Cluster1) | 1 Machine (Cluster1)<br>1 Machine (Cluster2) |

While this improves the situation, we now have the problem that 8 workers are actively waiting on the Cluster1 lock and thus blocked. This makes it very hard for the 10 Machines of Cluster2 to ever get reconciled. This of course gets a lot worse the more Clusters and Machines we have, and when multiple Clusters are unreachable.

This is why we now don't let workers actively wait: if a worker can't acquire the lock for a Cluster, an error is returned and the Machine is moved back into the queue with exponential backoff (see the locking sketch after the table below).

| Machine reconciler queue | Machine reconciler workers | Machine reconciler workers in GetClient |
| --- | --- | --- |
| 248 Machines (Cluster1)<br>9 Machines (Cluster2) | | 1 Machine (Cluster1)<br>1 Machine (Cluster2) |

This keeps the worker goroutines free and increases the chance that all Machines get reconciled.
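
Mechanically, the non-blocking per-cluster locking can be as simple as a set of locked keys guarded by a short-lived mutex. A simplified sketch of the idea (the actual implementation lives in controllers/remote/keyedmutex.go and may differ in detail):

```go
package remote

import (
	"sync"

	"sigs.k8s.io/controller-runtime/pkg/client"
)

// keyedMutex provides one lock per cluster. The inner mutex only guards the
// map and is held for a few instructions, never while an apiserver is dialed.
type keyedMutex struct {
	locksMtx sync.Mutex
	locks    map[client.ObjectKey]struct{}
}

func newKeyedMutex() *keyedMutex {
	return &keyedMutex{locks: map[client.ObjectKey]struct{}{}}
}

// TryLock acquires the lock for the given cluster and returns true if it is
// free. It returns false immediately, without blocking, if another worker
// already holds the lock.
func (k *keyedMutex) TryLock(cluster client.ObjectKey) bool {
	k.locksMtx.Lock()
	defer k.locksMtx.Unlock()

	if _, held := k.locks[cluster]; held {
		return false
	}
	k.locks[cluster] = struct{}{}
	return true
}

// Unlock releases the lock for the given cluster.
func (k *keyedMutex) Unlock(cluster client.ObjectKey) {
	k.locksMtx.Lock()
	defer k.locksMtx.Unlock()
	delete(k.locks, cluster)
}
```

A worker that gets `false` from `TryLock` returns an error instead of waiting; the controller's workqueue then requeues the Machine with exponential backoff rather than parking a goroutine on a contended lock.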

Of course there are limits, for example if far more clusters are unreachable than we have workers. But that case would require a larger refactoring, and with the new implementation plus exponential backoff we should be fine.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 11, 2022
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 11, 2022
@sbueringer sbueringer force-pushed the pr-cc-per-cluster-at-most-once-lock branch from 37a60e5 to 59de0a9 on November 11, 2022 18:03
@sbueringer (Member, Author):

/assign @fgutmann
/assign @fabriziopandini
/assign @vincepri

@k8s-ci-robot (Contributor):

@sbueringer: GitHub didn't allow me to assign the following users: fgutmann.

Note that only kubernetes-sigs members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @fgutmann
/assign @fabriziopandini
/assign @vincepri

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fgutmann (Contributor) left a comment:

Thank you for picking this up, Stefan!

The approach overall looks good to me. One drawback of it is that there will be some unnecessary backoffs during normal operation. Generally, I'm not too worried about that, as I think it won't affect the overall system performance much.

One aspect I don't like about the current error handling, though, is that lock-contention errors are treated as regular reconcile errors and will show up in error metrics, making it hard to properly alarm on them. Shall we update the callers to go into backoff without returning an error for this specific case?
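
Something along these lines, sketched with a hypothetical sentinel error (`errClusterLocked`) and a fixed requeue interval, neither of which is defined by this PR:

```go
package remote

import (
	"context"
	"errors"
	"time"

	ctrl "sigs.k8s.io/controller-runtime"
)

// errClusterLocked is a hypothetical sentinel error that GetClient could
// return when another worker already holds the per-cluster lock.
var errClusterLocked = errors.New("cluster accessor is locked by another worker")

// reconcileSketch shows the suggested handling: on lock contention, requeue
// with a nil error so the event does not count towards the controller's
// error metrics.
func reconcileSketch(ctx context.Context, getClient func(context.Context) error) (ctrl.Result, error) {
	if err := getClient(ctx); err != nil {
		if errors.Is(err, errClusterLocked) {
			// Not a real failure: the accessor is being created elsewhere.
			return ctrl.Result{RequeueAfter: 20 * time.Second}, nil
		}
		return ctrl.Result{}, err
	}
	// ... normal reconciliation using the workload cluster client ...
	return ctrl.Result{}, nil
}
```

The trade-off is that a fixed `RequeueAfter` gives up the exponential backoff the workqueue applies when an error is returned, so the callers would need to pick a sensible interval.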

(Nine inline review threads on controllers/remote/cluster_cache_tracker.go and controllers/remote/keyedmutex.go, since resolved.)
@sbueringer (Member, Author):
@fgutmann @vincepri Thank you very much for the reviews. PTAL :)

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2022
@sbueringer sbueringer force-pushed the pr-cc-per-cluster-at-most-once-lock branch from 4036110 to 1318edc on November 14, 2022 15:12
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 14, 2022
@sbueringer sbueringer force-pushed the pr-cc-per-cluster-at-most-once-lock branch from 1318edc to 86ce450 on November 14, 2022 15:28
@fabriziopandini (Member):
PR lgtm to me, but I'll wait for @fgutmann's and @vincepri's last pass before applying any label.

@fgutmann (Contributor):
/lgtm

@k8s-ci-robot (Contributor):
@fgutmann: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vincepri (Member):
/approve

@k8s-ci-robot (Contributor):
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 14, 2022
@sbueringer sbueringer force-pushed the pr-cc-per-cluster-at-most-once-lock branch from 5fc0986 to 1348144 on November 14, 2022 17:44
@sbueringer (Member, Author):
Thx for the reviews!

Squashed!

@sbueringer (Member, Author):
I'll open a cherry-pick PR for release-1.2 once this PR is merged.

@vincepri (Member):
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 14, 2022
@k8s-ci-robot k8s-ci-robot merged commit 75d0b22 into kubernetes-sigs:main Nov 14, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.3 milestone Nov 14, 2022
@sbueringer sbueringer deleted the pr-cc-per-cluster-at-most-once-lock branch November 14, 2022 18:17