# Multi-Cluster Support
Author: @sttts
Initial implementation: @vincepri

Last Updated on: 03/26/2024

## Table of Contents

- [Multi-Cluster Support](#multi-cluster-support)
  - [Table of Contents](#table-of-contents)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Examples](#examples)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [Multi-Cluster-Compatible Reconcilers](#multi-cluster-compatible-reconcilers)
  - [User Stories](#user-stories)
    - [Controller Author with no interest in multi-cluster wanting the old behaviour](#controller-author-with-no-interest-in-multi-cluster-wanting-the-old-behaviour)
    - [Multi-Cluster Integrator wanting to support cluster managers like Cluster-API or kind](#multi-cluster-integrator-wanting-to-support-cluster-managers-like-cluster-api-or-kind)
    - [Multi-Cluster Integrator wanting to support apiservers with logical cluster (like kcp)](#multi-cluster-integrator-wanting-to-support-apiservers-with-logical-cluster-like-kcp)
    - [Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups](#controller-author-without-self-interest-in-multi-cluster-but-open-for-adoption-in-multi-cluster-setups)
    - [Controller Author who wants to support certain multi-cluster setups](#controller-author-who-wants-to-support-certain-multi-cluster-setups)
  - [Risks and Mitigations](#risks-and-mitigations)
  - [Alternatives](#alternatives)
  - [Implementation History](#implementation-history)

## Summary

Controller-runtime today allows writing controllers against one cluster only.
Multi-cluster use-cases require the creation of multiple managers and/or cluster
objects. This proposal is about adding native support for multi-cluster use-cases
to controller-runtime.

## Motivation

This change is important because:
- multi-cluster use-cases are becoming more and more common; compare projects
  like Karmada, Crossplane or kcp. They all need to write (controller-runtime)
  controllers that operate on multiple clusters.
- writing controllers for such systems in a **portable** way is hard today.
  Consequently, there is no multi-cluster controller ecosystem, but there could
  and should be one.
- kcp maintains a [controller-runtime fork with multi-cluster support](https://github.com/kcp-dev/controller-runtime)
  because adding support on top leads to an inefficient controller design and,
  even more importantly, to divergence in the ecosystem.

### Goals

- Provide a way to natively write controllers that
  1. (UNIFORM MULTI-CLUSTER CONTROLLER) operate on multiple clusters in a uniform way,
     i.e. reconciling the same resources on multiple clusters, **optionally**
     - sourcing information from one central hub cluster
     - sourcing information cross-cluster.

     Example: distributed `ReplicaSet` controller, reconciling `ReplicaSets` on multiple clusters.
  2. (AGGREGATING MULTI-CLUSTER CONTROLLER) operate on one central hub cluster, aggregating information from multiple clusters.

     Example: distributed `Deployment` controller, aggregating `ReplicaSets` back into the `Deployment` object.
- Allow clusters to dynamically join and leave the set of clusters a controller operates on.
- Allow event sources to be cross-cluster:
  1. Multi-cluster events that trigger reconciliation in the one central hub cluster.
  2. Central hub cluster events that trigger reconciliation on multiple clusters.
- Allow (informer) indexes that span multiple clusters.
- Allow logical clusters where a set of clusters is actually backed by one physical informer store.
- Allow 3rd-parties to plug their multi-cluster adapter (in source code) into
  an existing multi-cluster-compatible code-base.
- Minimize the amount of changes needed to make a controller-runtime controller
  multi-cluster-compatible, in a way that 3rd-party projects have no reason to
  object to these kinds of changes.

Here we call a controller multi-cluster-compatible if its reconcilers get
reconcile requests in cluster `X` and do all reconciliation in cluster `X`. This
is less than being multi-cluster-aware, where reconcilers implement cross-cluster
logic.

### Examples

- Run a controller-runtime controller against a kubeconfig with arbitrarily many contexts, all being reconciled.
- Run a controller-runtime controller against cluster managers like kind, Cluster-API, Open-Cluster-Manager or Hypershift.
- Run a controller-runtime controller against a kcp shard with a wildcard watch.

### Non-Goals/Future Work

- Ship integrations for different multi-cluster setups. These should become
  out-of-tree subprojects that can evolve individually and be vendored by controller authors.
- Make controller-runtime controllers "binary pluggable".
- Manage one manager per cluster.
- Manage one controller per cluster with dedicated workqueues.

## Proposal

The `ctrl.Manager` _SHOULD_ be extended to get an optional `cluster.Provider` via
`ctrl.Options`, implementing

```golang
// pkg/cluster
type Provider interface {
    Get(ctx context.Context, clusterName string, opts ...Option) (Cluster, error)
    List(ctx context.Context) ([]string, error)
    Watch(ctx context.Context) (Watcher, error)
}
```
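For illustration, a minimal provider over a fixed, static set of clusters could satisfy this interface as sketched below. This is only a sketch under the assumption that the proposed `Provider`, `Option` and `Watcher` types land in `pkg/cluster` as shown above; since the `Watcher` contract is not spelled out in this document, the sketch leaves `Watch` unimplemented. The `staticProvider` type and its constructor are purely illustrative.

```golang
package provider

import (
    "context"
    "errors"
    "fmt"

    "sigs.k8s.io/controller-runtime/pkg/cluster"
)

// staticProvider is an illustrative cluster.Provider over a fixed set of
// clusters, e.g. built once at startup from the contexts of a kubeconfig.
type staticProvider struct {
    clusters map[string]cluster.Cluster
}

// NewStaticProvider returns a provider serving the given, never-changing set of clusters.
func NewStaticProvider(clusters map[string]cluster.Cluster) *staticProvider {
    return &staticProvider{clusters: clusters}
}

// Get returns the cluster registered under clusterName, if any.
func (p *staticProvider) Get(ctx context.Context, clusterName string, opts ...cluster.Option) (cluster.Cluster, error) {
    if cl, ok := p.clusters[clusterName]; ok {
        return cl, nil
    }
    return nil, fmt.Errorf("cluster %q not found", clusterName)
}

// List returns the names of all known clusters.
func (p *staticProvider) List(ctx context.Context) ([]string, error) {
    names := make([]string, 0, len(p.clusters))
    for name := range p.clusters {
        names = append(names, name)
    }
    return names, nil
}

// Watch would emit join/leave events for dynamic cluster sets; a static set has none.
func (p *staticProvider) Watch(ctx context.Context) (cluster.Watcher, error) {
    return nil, errors.New("watch is not supported for a static set of clusters")
}
```

A dynamic provider, for example one backed by Cluster-API `Cluster` objects, would additionally implement `Watch` so that the manager can engage and disengage clusters as they come and go.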
The `cluster.Cluster` _SHOULD_ be extended with a unique name identifier:

```golang
// pkg/cluster:
type Cluster interface {
    Name() string
    ...
}
```

The `ctrl.Manager` will use the provider to watch clusters coming and going, and
will inform runnables implementing the `cluster.AwareRunnable` interface:

```golang
// pkg/cluster
type AwareRunnable interface {
    Engage(context.Context, Cluster) error
    Disengage(context.Context, Cluster) error
}
```

In particular, controllers implement the `AwareRunnable` interface. They react
to engaged clusters by duplicating and starting their registered `source.Source`s
and `handler.EventHandler`s for each cluster through implementation of

```golang
// pkg/source
type DeepCopyableSyncingSource interface {
    SyncingSource
    DeepCopyFor(cluster cluster.Cluster) DeepCopyableSyncingSource
}

// pkg/handler
type DeepCopyableEventHandler interface {
    EventHandler
    DeepCopyFor(c cluster.Cluster) DeepCopyableEventHandler
}
```

The standard implementing types, in particular `internal.Kind`, will adhere to
these interfaces.

The `ctrl.Manager` _SHOULD_ be extended by a `cluster.Cluster` getter:

```golang
// pkg/manager
type Manager interface {
    // ...
    GetCluster(ctx context.Context, clusterName string) (cluster.Cluster, error)
}
```

The embedded `cluster.Cluster` corresponds to `GetCluster(ctx, "")`. We call the
clusters with non-empty name "provider clusters" or "engaged clusters", while
the embedded cluster of the manager is called the "default cluster" or "hub
cluster".

The `reconcile.Request` _SHOULD_ be extended by an optional `ClusterName` field:

```golang
// pkg/reconcile
type Request struct {
    ClusterName string
    types.NamespacedName
}
```

With these changes, the behaviour of controller-runtime without a set cluster
provider will be unchanged.

### Multi-Cluster-Compatible Reconcilers

Reconcilers can be made multi-cluster-compatible by changing client and cache
accessing code from directly accessing `mgr.GetClient()` and `mgr.GetCache()` to
going through `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and
`mgr.GetCluster(ctx, req.ClusterName).GetCache()`.
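For illustration, a reconciler written against the proposed `GetCluster` and `ClusterName` extensions might look as follows. This is a sketch only: `ReplicaSetReconciler` is a hypothetical type, and `Manager.GetCluster` and `Request.ClusterName` are the APIs proposed above, not existing controller-runtime APIs.

```golang
package controllers

import (
    "context"

    appsv1 "k8s.io/api/apps/v1"
    "sigs.k8s.io/controller-runtime/pkg/client"
    "sigs.k8s.io/controller-runtime/pkg/manager"
    "sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// ReplicaSetReconciler is an illustrative multi-cluster-compatible reconciler.
type ReplicaSetReconciler struct {
    Manager manager.Manager
}

func (r *ReplicaSetReconciler) Reconcile(ctx context.Context, req reconcile.Request) (reconcile.Result, error) {
    // Resolve the cluster the request originated from. An empty ClusterName yields
    // the default ("hub") cluster, so single-cluster setups keep working unchanged.
    cl, err := r.Manager.GetCluster(ctx, req.ClusterName)
    if err != nil {
        return reconcile.Result{}, err
    }

    // All reads and writes go through the per-cluster client instead of mgr.GetClient().
    rs := &appsv1.ReplicaSet{}
    if err := cl.GetClient().Get(ctx, req.NamespacedName, rs); err != nil {
        return reconcile.Result{}, client.IgnoreNotFound(err)
    }

    // ... reconcile the ReplicaSet within the same cluster ...
    return reconcile.Result{}, nil
}
```

Apart from the `GetCluster` call, the reconciler body is unchanged compared to a single-cluster controller.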
When building a controller like

```golang
builder.NewControllerManagedBy(mgr).
    For(&appsv1.ReplicaSet{}).
    Owns(&v1.Pod{}).
    Complete(reconciler)
```

with the described change to use `GetCluster(ctx, req.ClusterName)`, the controller will
automatically act as a *uniform multi-cluster controller*: it will reconcile resources
from cluster `X` in cluster `X`.

For a manager with a `cluster.Provider`, the builder _SHOULD_ create a controller
that sources events **ONLY** from the provider clusters that got engaged with
the controller.

Controllers that should be triggered by events on the hub cluster will have to
opt in, as in this example:

```golang
builder.NewControllerManagedBy(mgr).
    For(&appsv1.Deployment{}, builder.InDefaultCluster).
    Owns(&v1.ReplicaSet{}).
    Complete(reconciler)
```

A mixed set of sources is possible, as shown in this example.

## User Stories

### Controller Author with no interest in multi-cluster wanting the old behaviour

- Do nothing. Controller-runtime behaviour is unchanged.

### Multi-Cluster Integrator wanting to support cluster managers like Cluster-API or kind

- Implement the `cluster.Provider` interface, either via polling of the cluster registry
  or by watching objects in the hub cluster.
- For every new cluster create an instance of `cluster.Cluster`.

### Multi-Cluster Integrator wanting to support apiservers with logical cluster (like kcp)

- Implement the `cluster.Provider` interface by watching the apiserver for logical cluster objects
  (the `LogicalCluster` CRD in kcp).
- Return a facade `cluster.Cluster` that scopes all operations (client, cache, indexers)
  to the logical cluster, but is backed by one physical `cluster.Cluster`.
- Add cross-cluster indexers to the physical `cluster.Cluster` object.

### Controller Author without self-interest in multi-cluster, but open for adoption in multi-cluster setups

- Replace `mgr.GetClient()` and `mgr.GetCache()` with `mgr.GetCluster(ctx, req.ClusterName).GetClient()` and `mgr.GetCluster(ctx, req.ClusterName).GetCache()`.
- Make manager and controller plumbing vendorable to allow plugging in a multi-cluster provider.

### Controller Author who wants to support certain multi-cluster setups

- Do the `GetCluster` plumbing as described above.
- Vendor 3rd-party multi-cluster providers and wire them up in `main.go`, as sketched below.
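As an illustration of that wiring, a `main.go` could attach a vendored provider to the manager roughly as sketched below. The `examplecloud` provider package and its `New()` constructor are purely hypothetical, and the exact shape of the `Options.WithExperimentalClusterProvider` method mentioned under Risks and Mitigations below is assumed here, not fixed by this proposal.

```golang
package main

import (
    appsv1 "k8s.io/api/apps/v1"
    ctrl "sigs.k8s.io/controller-runtime"

    // examplecloud stands in for a vendored, out-of-tree cluster.Provider implementation.
    examplecloud "example.com/multicluster/provider"
)

func main() {
    // Assumed shape: WithExperimentalClusterProvider attaches the provider to the
    // manager options and returns the amended options, switching the manager into
    // multi-cluster mode.
    opts := ctrl.Options{}.WithExperimentalClusterProvider(examplecloud.New())

    mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), opts)
    if err != nil {
        panic(err) // error handling elided for brevity
    }

    // With a provider attached, the builder sources events from the engaged provider
    // clusters. ReplicaSetReconciler is the illustrative reconciler sketched earlier,
    // assumed to be importable here.
    if err := ctrl.NewControllerManagedBy(mgr).
        For(&appsv1.ReplicaSet{}).
        Complete(&ReplicaSetReconciler{Manager: mgr}); err != nil {
        panic(err)
    }

    if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
        panic(err)
    }
}
```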
## Risks and Mitigations

- The standard behaviour of controller-runtime is unchanged for single-cluster controllers.
- The activation of the multi-cluster mode is through attaching the `cluster.Provider` to the manager.
  To make it clear that the semantics are experimental, we make the `Options.provider` field private
  and add an `Options.WithExperimentalClusterProvider` method.
- We only extend these interfaces and structs:
  - `ctrl.Manager` with `GetCluster(ctx, clusterName string) (cluster.Cluster, error)`
  - `cluster.Cluster` with `Name() string`
  - `reconcile.Request` with `ClusterName string`

  We think that the behaviour of these extensions is well understood and hence low risk.
  Everything else behind the scenes is an implementation detail that can be changed
  at any time.

## Alternatives

- Multi-cluster support could be built outside of core controller-runtime. This would
  likely lead to a design with one manager per cluster. This has a number of problems:
  - only one manager can serve webhooks or metrics
  - cluster management must be custom built
  - logical cluster support would still require a fork of controller-runtime and
    with that a divergence in the ecosystem. The reason is that logical clusters
    require a shared workqueue because they share the same apiserver. So for
    fair queueing, this needs deep integration into one manager.
  - informer facades are not supported in today's cluster/cache implementation.
- We could deep-copy the builder instead of the sources and handlers. This would
  lead to one controller and one workqueue per cluster. For the reason outlined
  in the previous alternative, this is not desirable.
- We could skip adding `ClusterName` to `reconcile.Request` and instead pass the
  cluster through in the context. On the one hand, this looks attractive as it
  would avoid having to touch reconcilers at all to make them multi-cluster-compatible.
  On the other hand, with `cluster.Cluster` embedded into `manager.Manager`, not
  every method of `cluster.Cluster` carries a context. So virtualizing the cluster
  in the manager leads to contradictions in the semantics.

  For example, it can well be that every cluster has a different REST mapping because
  the installed CRDs differ. Without a context, we cannot return the right REST mapper.

  An alternative would be to add a context to every method of `cluster.Cluster`,
  which is a much bigger and uglier change than what is proposed here.

## Implementation History

- [PR #2207 by @vincepri: WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2207) – with extensive review
- [PR #2726 by @sttts, replacing #2207: WIP: ✨ Cluster Provider and cluster-aware controllers](https://github.com/kubernetes-sigs/controller-runtime/pull/2726) –
  picking up #2207, addressing lots of comments and extending the approach to what kcp needs, with a `fleet-namespace` example that demonstrates a setup similar to kcp with real logical clusters.
- [github.com/kcp-dev/controller-runtime](https://github.com/kcp-dev/controller-runtime) – the kcp controller-runtime fork