From 1807fe74ffe32600cec33c1339ed2131a047e0a7 Mon Sep 17 00:00:00 2001
From: Jack Francis
Date: Fri, 6 Oct 2023 16:48:30 -0700
Subject: [PATCH] initial draft of modified proposal including new CRD

Signed-off-by: Jack Francis

---
 docs/book/src/reference/glossary.md           | 22 +-
 docs/proposals/20220725-managed-kubernetes.md | 18 +-
 ...0230407-flexible-managed-k8s-endpoints.md} | 235 +++++++++++++-----
 3 files changed, 196 insertions(+), 79 deletions(-)
 rename docs/proposals/{20230407-managed-k8s-capi-contract-changes.md => 20230407-flexible-managed-k8s-endpoints.md} (56%)

diff --git a/docs/book/src/reference/glossary.md b/docs/book/src/reference/glossary.md
index 7c21c9211b6c..6b58ead3d96d 100644
--- a/docs/book/src/reference/glossary.md
+++ b/docs/book/src/reference/glossary.md
@@ -26,9 +26,9 @@ A temporary cluster that is used to provision a Target Management cluster.

### Bootstrap provider

Refers to a [provider](#provider) that implements a solution for the [bootstrap](#bootstrap) process.
-Bootstrap provider's interaction with Cluster API is based on what is defined in the [Cluster API contract](#contract). 
+Bootstrap provider's interaction with Cluster API is based on what is defined in the [Cluster API contract](#contract).

-See [CABPK](#cabpk). 
+See [CABPK](#cabpk).

# C
---
@@ -132,6 +132,12 @@ See [core provider](#core-provider)

The Cluster API execution model, a set of controllers cooperating in managing the Kubernetes cluster lifecycle.

+### Cluster Infrastructure
+
+Or __Kubernetes Cluster Infrastructure__
+
+Defines the **infrastructure that supports a Kubernetes cluster**, e.g. VPC, security groups, load balancers, etc. Please note that in the context of managed Kubernetes some of those components are provided by the corresponding abstraction of a specific Cloud provider (EKS, OKE, AKS, etc.), and thus Cluster API should not manage some or all of those components.
+
### Contract

Or __Cluster API contract__
@@ -155,7 +161,7 @@ See [KCP](#kcp).

### Core provider

-Refers to a [provider](#provider) that implements Cluster API core controllers; if you 
+Refers to a [provider](#provider) that implements Cluster API core controllers; if you
consider that the first project that must be deployed in a management Cluster is Cluster API itself, it should be clear why
the Cluster API project is also referred to as the core provider.

@@ -196,7 +202,7 @@ see [Server](#server)

### Infrastructure provider

-Refers to a [provider](#provider) that implements provisioning of infrastructure/computational resources required by 
+Refers to a [provider](#provider) that implements provisioning of infrastructure/computational resources required by
the Cluster or by Machines (e.g. VMs, networking, etc.).
Infrastructure provider's interaction with Cluster API is based on what is defined in the [Cluster API contract](#contract).

@@ -205,7 +211,7 @@ When there is more than one way to obtain resources from the same infrastructure

For a complete list of providers see [Provider Implementations](providers.md).

-### Inline patch 
+### Inline patch

A [patch](#patch) defined inline in a [ClusterClass](#clusterclass). An alternative to an [external patch](#external-patch).

@@ -269,6 +275,10 @@ See also: [Server](#server)

Perform create, scale, upgrade, or destroy operations on the cluster.

+### Managed Kubernetes
+
+Managed Kubernetes refers to any Kubernetes cluster provisioning and maintenance abstraction, usually exposed as an API, that is natively available in a Cloud provider. 
For example: [EKS](https://aws.amazon.com/eks/), [OKE](https://www.oracle.com/cloud/cloud-native/container-engine-kubernetes/), [AKS](https://azure.microsoft.com/en-us/products/kubernetes-service), [GKE](https://cloud.google.com/kubernetes-engine), [IBM Cloud Kubernetes Service](https://www.ibm.com/cloud/kubernetes-service), [DOKS](https://www.digitalocean.com/products/kubernetes), and many more throughout the Kubernetes Cloud Native ecosystem. + ### Managed Topology See [Topology](#topology) @@ -306,7 +316,7 @@ A generically understood combination of a kernel and system-level userspace inte # P --- -### Patch +### Patch A set of instructions describing modifications to a Kubernetes object. Examples include JSON Patch and JSON Merge Patch. diff --git a/docs/proposals/20220725-managed-kubernetes.md b/docs/proposals/20220725-managed-kubernetes.md index 6ae49931b2cf..5191e3e22a0b 100644 --- a/docs/proposals/20220725-managed-kubernetes.md +++ b/docs/proposals/20220725-managed-kubernetes.md @@ -16,7 +16,7 @@ reviewers: creation-date: 2022-07-25 last-updated: 2023-06-15 status: implementable -see-also: ./20230407-managed-k8s-capi-contract-changes.md +see-also: ./20230407-flexible-managed-k8s-endpoints.md replaces: superseded-by: --- @@ -97,7 +97,7 @@ Some Cluster API Providers (i.e. Azure with AKS first and then AWS with EKS) hav While working on supporting ClusterClass for EKS in Cluster API Provider AWS (CAPA), it was discovered that the current implementation of EKS within CAPA, where a single resource kind (AWSManagedControlPlane) is used for both ControlPlane and Infrastructure, is incompatible with other parts of CAPI assuming the two objects are different (Reference [issue here](https://github.com/kubernetes-sigs/cluster-api/issues/6126)). -Separation of ControlPlane and Infrastructure is expected for the ClusterClass implementation to work correctly. However, after the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-managed-k8s-capi-contract-changes) have been implemented there is the option to supply only the control plane, but you still cannot supply the same resource for both. +Separation of ControlPlane and Infrastructure is expected for the ClusterClass implementation to work correctly. However, after the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) have been implemented there is the option to supply only the control plane, but you still cannot supply the same resource for both. The responsibilities between the CAPI control plane and infrastructure are blurred with a managed Kubernetes service like AKS or EKS. For example, when you create a EKS control plane in AWS it also creates infrastructure that CAPI would traditionally view as the responsibility of the cluster “infrastructure provider”. @@ -118,7 +118,7 @@ A good example here is the API server load balancer: - Enforce the Managed Kubernetes recommendations as a requirement for Cluster API providers when they implement Managed Kubernetes. - If providers that have already implemented Managed Kubernetes and would like guidance on if/how they could move to be aligned with the recommendations of this proposal then discussions should be facilitated. - Provide advice in this proposal on how to refactor the existing implementations of managed Kubernetes in CAPA & CAPZ. -- Propose a new architecture or API changes to CAPI for managed Kubernetes. 
This has been covered by the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md).
- Be a concrete design for the GKE implementation in Cluster API Provider GCP (CAPG).
- Recommend how Managed Kubernetes services would leverage CAPI internally to run their offer.

@@ -247,7 +247,7 @@ The following section discusses different API implementation options along with

#### Option 1: Two kinds with a ControlPlane and a pass-through InfraCluster

**This option will no longer be needed once the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) have been implemented, as option 2 can then be used for a simpler solution**

This option introduces 2 new resource kinds:

@@ -304,7 +304,7 @@ type GCPManagedClusterSpec struct {

#### Option 2: Just a ControlPlane kind and no InfraCluster

**This option is enabled when the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) have been implemented.**

This option introduces 1 new resource kind:

@@ -400,7 +400,7 @@ type GCPManagedClusterSpec struct {
 }
 ```

When the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) have been implemented, there is the option to return the control plane endpoint directly from the ControlPlane instead of passing it via the InfraCluster.

**Pros**

@@ -429,7 +429,7 @@ The reasons for this recommendation are as follows:

If the managed Kubernetes service does not require any base infrastructure to be set up before creating the instance of the service, then option 2 (just a ControlPlane kind and no InfraCluster) is the recommendation.

This recommendation assumes that the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) have been implemented. Until that point option 1 (Two kinds with a ControlPlane and a pass-through InfraCluster) will have to be used. 
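To make the recommendation concrete, here is a minimal sketch of the option 2 shape once the contract changes land, building on the `GCPManagedControlPlane` example above; all names and the API group are illustrative assumptions, not a definitive manifest:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-gke-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1  # API group illustrative; the provider's actual group may differ
    kind: GCPManagedControlPlane
    name: my-gke-cluster
  # Note: no infrastructureRef; the managed service provides the cluster infrastructure.
```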
### Existing Managed Kubernetes Implementations @@ -484,7 +484,7 @@ Some of the areas of change (this is not an exhaustive list): - Update the [Provider contracts documentation](../book/src/developer/providers/contracts.md) to state that the same kind should not be used to satisfy 2 different provider contracts. - Update the [Cluster Infrastructure documentation](../book/src/developer/providers/cluster-infrastructure.md) to provide guidance on how to populate the `controlPlaneEndpoint` in the scenario where the control plane creates the api server load balancer. We should include sample code. - Update the [Control Plane Controller](../book/src/developer/architecture/controllers/control-plane.md) diagram for managed k8s services case. The Control Plane reconcile needs to start when `InfrastructureReady` is true. -- Updates based on the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-managed-k8s-capi-contract-changes). +- Updates based on the changes documented in the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md). ## Other Considerations for CAPI @@ -606,4 +606,4 @@ As mentioned in the goals section, it is up to providers with existing implement - [x] 03/17/2022: Compile a Google Doc following the CAEP template ([link](https://docs.google.com/document/d/1dMN4-KppBkA51sxXPSQhYpqETp2AG_kHzByXTmznxFA/edit?usp=sharing)) - [x] 04/20/2022: Present proposal at a community meeting - [x] 07/27/2022: Move the proposal to a PR in CAPI repo -- [x] 06/15/2023: Updates as a result of the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-managed-k8s-capi-contract-changes.md) and also updates as a result of the current state of managed k8s in CAPI. +- [x] 06/15/2023: Updates as a result of the [Contract Changes to Support Managed Kubernetes CAEP](./20230407-flexible-managed-k8s-endpoints.md) and also updates as a result of the current state of managed k8s in CAPI. 
diff --git a/docs/proposals/20230407-managed-k8s-capi-contract-changes.md b/docs/proposals/20230407-flexible-managed-k8s-endpoints.md similarity index 56% rename from docs/proposals/20230407-managed-k8s-capi-contract-changes.md rename to docs/proposals/20230407-flexible-managed-k8s-endpoints.md index 6b69f69ea382..faeb3dc16a3f 100644 --- a/docs/proposals/20230407-managed-k8s-capi-contract-changes.md +++ b/docs/proposals/20230407-flexible-managed-k8s-endpoints.md @@ -2,8 +2,7 @@ **Table of Contents** *generated with [DocToc](https://github.com/thlorenz/doctoc)* -- [Contract Changes to Support Managed Kubernetes](#contract-changes-to-support-managed-kubernetes) - - [Table of Contents](#table-of-contents) +- [Flexible Managed Kubernetes Endpoints](#flexible-managed-kubernetes-endpoints) - [Glossary](#glossary) - [Summary](#summary) - [Motivation](#motivation) @@ -19,8 +18,14 @@ - [Story 5](#story-5) - [Story 6](#story-6) - [Story 7](#story-7) - - [Current State of Managed Kubernetes in CAPI](#current-state-of-managed-kubernetes-in-capi) - - [EKS in CAPA](#eks-in-capa) + - [Design](#design) + - [Core Cluster API changes](#core-cluster-api-changes) + - [Infra Providers API changes](#infra-providers-api-changes) + - [Core Cluster API Controllers changes](#core-cluster-api-controllers-changes) + - [Provider controller changes](#provider-controller-changes) + - [Guidelines for infra providers implementation](#guidelines-for-infra-providers-implementation) + - [Background work](#background-work) + - [EKS in CAPA](#eks-in-capa) - [AKS in CAPZ](#aks-in-capz) - [GKE in CAPG](#gke-in-capg) - [Learnings from original Proposal: Two kinds with a Managed Control Plane & Managed Infra Cluster adhering to the current CAPI contracts](#learnings-from-original-proposal-two-kinds-with-a-managed-control-plane--managed-infra-cluster-adhering-to-the-current-capi-contracts) @@ -34,7 +39,7 @@ --- -title: Contract Changes to Support Managed Kubernetes +title: Flexible Managed Kubernetes Endpoints authors: - "@jackfrancis" reviewers: @@ -54,43 +59,7 @@ see-also: - "/docs/proposals/20220725-managed-kubernetes.md" --- -# Contract Changes to Support Managed Kubernetes - -## Table of Contents - -A table of contents is helpful for quickly jumping to sections of a proposal and for highlighting -any additional information provided beyond the standard proposal template. -[Tools for generating](https://github.com/ekalinin/github-markdown-toc) a table of contents from markdown are available. 
- -- [Contract Changes to Support Managed Kubernetes](#contract-changes-to-support-managed-kubernetes) - - [Table of Contents](#table-of-contents) - - [Glossary](#glossary) - - [Summary](#summary) - - [Motivation](#motivation) - - [Goals](#goals) - - [Non-Goals](#non-goals) - - [Future work](#future-work) - - [Proposal](#proposal) - - [User Stories](#user-stories) - - [Story 1](#story-1) - - [Story 2](#story-2) - - [Requirements (Optional)](#requirements-optional) - - [Functional Requirements](#functional-requirements) - - [FR1](#fr1) - - [FR2](#fr2) - - [Non-Functional Requirements](#non-functional-requirements) - - [NFR1](#nfr1) - - [NFR2](#nfr2) - - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints) - - [Security Model](#security-model) - - [Risks and Mitigations](#risks-and-mitigations) - - [Alternatives](#alternatives) - - [Upgrade Strategy](#upgrade-strategy) - - [Additional Details](#additional-details) - - [Test Plan [optional]](#test-plan-optional) - - [Graduation Criteria [optional]](#graduation-criteria-optional) - - [Version Skew Strategy [optional]](#version-skew-strategy-optional) - - [Implementation History](#implementation-history) +# Flexible Managed Kubernetes Endpoints ## Glossary @@ -98,31 +67,27 @@ Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/referen The following terms will be used in this document. -- `Cluster` - - When we say `Cluster` we refer to any provider's infra-specific implementation of the Cluster API `Cluster` resource spec. When you see ``, interpret that as a placeholder for any provider implementation. Some concrete examples of provider infra cluster implementations are Azure's CAPZ provider (e.g., `AzureCluster` and `AzureManagedCluster`), AWS's CAPA provider (e.g., `AWSCluster` and `AWSManagedCluster`), and Google Cloud's CAPG provider (e.g., `GCPCluster` and `GCPManagedCluster`). Rather than referencing any one of the preceding actual implementations of infra cluster resources, we prefer to generalize to `Cluster` so that we don't suggest any provider-specific bias informing our conclusions. -- `ControlPlane` - - When we say `ControlPlane` we refer to any provider's infra-specific implementation of the a Kubernetes cluster's control plane. When you see ``, interpret that as a placeholder for any provider implementation. Some concrete examples of provider infra control plane implementations are Azure's CAPZ provider (e.g., `AzureManagedControlPlane`), AWS's CAPA provider (e.g., `AWSManagedControlPlane`), and Google Cloud's CAPG provider (e.g., `GCPManagedControlPlane`). - Managed Kubernetes - - Managed Kubernetes refers to any Kubernetes Cluster provisioning and maintenance platform that is exposed by a service API. For example: [EKS](https://aws.amazon.com/eks/), [OKE](https://www.oracle.com/cloud/cloud-native/container-engine-kubernetes/), [AKS](https://azure.microsoft.com/en-us/products/kubernetes-service), [GKE](https://cloud.google.com/kubernetes-engine), [IBM Cloud Kubernetes Service](https://www.ibm.com/cloud/kubernetes-service), [DOKS](https://www.digitalocean.com/products/kubernetes), and many more throughout the Kubernetes Cloud Native ecosystem. + - Managed Kubernetes refers to any Kubernetes Cluster provisioning and maintenance abstraction, usually exposed as an API, that is natively available in a Cloud provider. 
For example: [EKS](https://aws.amazon.com/eks/), [OKE](https://www.oracle.com/cloud/cloud-native/container-engine-kubernetes/), [AKS](https://azure.microsoft.com/en-us/products/kubernetes-service), [GKE](https://cloud.google.com/kubernetes-engine), [IBM Cloud Kubernetes Service](https://www.ibm.com/cloud/kubernetes-service), [DOKS](https://www.digitalocean.com/products/kubernetes), and many more throughout the Kubernetes Cloud Native ecosystem.
- `ControlPlane Provider`
  - When we say `ControlPlane Provider` we refer to a component that implements the management of a Kubernetes [control plane](https://kubernetes.io/docs/concepts/#kubernetes-control-plane) according to the Cluster API contract. Please note that in the context of managed Kubernetes, the `ControlPlane Provider` usually wraps the corresponding abstraction of a specific Cloud provider. Concrete examples are the `AzureManagedControlPlane` for Microsoft Azure, the `AWSManagedControlPlane` for AWS, the `GCPManagedControlPlane` for Google Cloud, etc.
- _Kubernetes Cluster Infrastructure_
  - When we refer to _Kubernetes Cluster Infrastructure_ (abbr. _Cluster Infrastructure_) we refer to the **infrastructure that supports a Kubernetes cluster**, e.g. VPC, security groups, load balancers, etc. Please note that in the context of Managed Kubernetes some of those components are provided by the corresponding abstraction of a specific Cloud provider (EKS, OKE, AKS, etc.), and thus Cluster API should not manage some or all of those components.
- `Cluster`
  - When we say `Cluster` we refer to any provider-specific resource that provides the Kubernetes Cluster Infrastructure for a specific Cloud provider. Concrete examples are the `AzureCluster` and the `AzureManagedCluster` for Microsoft Azure, the `AWSCluster` and the `AWSManagedCluster` for AWS, and the `GCPCluster` and the `GCPManagedCluster` for Google Cloud.
- e.g.
  - This just means "For example:"!

## Summary

-We propose to relax the `Cluster` resource Cluster API contract so that the `ControlPlane` resource may authoritatively express the control plane endpoint in order to better represent real workflows and reduce the complexity for provider implementers.
-
-By relaxing the `Cluster` contract with respect to the control plane endpoint we can also now provide the opportunity to make the `Cluster` resource fully optional. This additional flexibility will allow Cluster API providers to better represent various Managed Kubernetes service offerings:
+This proposal aims to address the lessons learned from running Managed Kubernetes solutions on top of Cluster API, and to make this use case simpler and more straightforward both for Cluster API users and for the maintainers of the Cluster API providers. 
-
-- Cluster Infra is entirely abstracted away from the Managed Kubernetes user
-- Cluster Infra is exposed to the Managed Kubernetes user, but managed by the Managed Kubernetes service
-- Cluster Infra is provided by the user (BYO) to support the Managed Kubernetes service

-In order to support the above, we propose that the API Server endpoint reference can also originate from the `ControlPlane` resource, and not the `Cluster` resource. These changes will introduce two new possible implementation options for providers implementing Managed Kubernetes in Cluster API:
+More specifically, we would like to introduce first-class support for two scenarios:
+
+- Permit omitting the `Cluster` entirely, thus making it simpler to use Cluster API with all the Managed Kubernetes implementations which do not require any additional Kubernetes Cluster Infrastructure (network settings, security groups, etc.) on top of what is provided out of the box by the managed Kubernetes primitive offered by a Cloud provider.
+- Allow the `ControlPlane Provider` component to take ownership of the responsibility of creating the control plane endpoint, thus making it simpler to use Cluster API with all the Managed Kubernetes implementations which provide this piece of Cluster Infrastructure out of the box.

-1. A Managed Kubernetes cluster solution whose configuration surface area is expressed exclusively in a `ControlPlane` resource (no `Cluster` resource).
-2. A Managed Kubernetes cluster solution whose configuration surface area comprises both a `Cluster` and a `ControlPlane` resource, with `ControlPlane` being solely responsible for configuring the API Server endpoint (instead of the API Server endpoint being configured via the `Cluster`).
+The above capabilities can be used alone or in combination, depending on the requirements of a specific Managed Kubernetes offering or on the specific architecture/set of Cloud components being implemented.

## Motivation

@@ -130,18 +95,19 @@ The implementation of Managed Kubernetes scenarios by Cluster API providers occu

One particular part of the existing Cluster API surface area that is inconsistent with most Managed Kubernetes user experiences is the accounting of the [Kubernetes API server](https://kubernetes.io/docs/concepts/overview/components/#kube-apiserver). In the canonical "self-managed" user story that Cluster API addresses, it is the provider implementation of Cluster API (e.g., CAPA) that is responsible for scaffolding the necessary _Kubernetes Cluster Infrastructure_ that is required in order to create the Kubernetes API server (e.g., a Load Balancer and a public IP address). This provider responsibility is declared in the `Cluster` resource, and carried out via its controllers; and then finally this reconciliation is synchronized with the parent `Cluster` Cluster API resource.

-Because there exist Managed Kubernetes scenarios that handle all _Kubernetes Cluster Infrastructure_ responsibilities themselves, Cluster API's requirement of a `Cluster` resource leads to weird implementation decisions, because in these scenarios there is no actual work for a Cluster API provider to do to scaffold _Kubernetes Cluster Infrastructure_. 
+Because there exist Managed Kubernetes scenarios that handle a subset or all _Kubernetes Cluster Infrastructure_ responsibilities themselves, Cluster API's requirement of a `Cluster` resource forces undesirable implementation decisions in those scenarios, because there is no actual work for a Cluster API provider to do to scaffold _Kubernetes Cluster Infrastructure_.
+
+Finally, for Managed Kubernetes scenarios that _do_ include additional, user-exposed infra (e.g., GKE and EKS as of this writing), we want to make it easier to account for the representation of the Managed Kubernetes API server endpoint, which is not always best owned by a `Cluster` resource.

### Goals

- Build upon [the existing Cluster API Managed Kubernetes proposal](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md). Any net new recommendations and/or proposals will be a continuation of the existing proposal, and consistent with its original conclusions.
-- Make `Cluster` resources optional.
-- Enable API Server endpoint reporting from a provider's Control Plane resource rather than from its `Cluster` resource.
+- Identify and document the API and controller changes required to omit the `Cluster` entirely, where this is applicable.
+- Identify and document the API and controller changes required to allow the `ControlPlane Provider` component to take ownership of the responsibility of creating the control plane endpoint.
- Ensure any changes to the current behavioral contract are backwards-compatible.

### Non-Goals

-- Changes to existing Cluster API CRDs.
- Introduce new "Managed Kubernetes" data types in Cluster API.
- Invalidate [the existing Cluster API Managed Kubernetes proposal and concluding recommendations](https://github.com/kubernetes-sigs/cluster-api/blob/main/docs/proposals/20220725-managed-kubernetes.md).

@@ -163,7 +129,7 @@ As a cluster operator, I want to be able to provision both "unmanaged" and "mana

#### Story 3

-As a Cluster API provider implementor, I want to be able to return the control plane endpoint via the ControlPlane custom resource, so that it fits naturally with how I create an instance of the service provider's Managed Kubernetes which creates the endpoint, and so i don't have to pass through the value via another custom resource.
+As a Cluster API provider implementor, I want to be able to return the control plane endpoint created by the `ControlPlane Provider`, so that it fits naturally with how most of the native Managed Kubernetes implementations work.

#### Story 4

@@ -181,9 +147,150 @@ As a cluster operator, I want to use Cluster API to provision and manage the lif

As a service provider I want to be able to offer Managed Kubernetes clusters by using CAPI referencing my own managed control plane implementation that satisfies Cluster API contracts.

-### Current State of Managed Kubernetes in CAPI
+### Design

Below we document the API and controller changes required to omit the `Cluster` entirely and to allow the `ControlPlane Provider` component to take ownership of the responsibility of creating the control plane endpoint.

#### Core Cluster API changes

This proposal does not introduce any breaking changes for the existing "core" API. 
More specifically:

The existing Cluster API types are already able to omit the `Cluster`:

- The `infrastructureRef` field on the Cluster object is already a pointer and thus it could be set to nil (and in fact we already create Clusters without `infrastructureRef` when using a ClusterClass).
- The `infrastructure.Ref` field on the ClusterClass object is already a pointer and thus it could be set to nil, but in this case the validation webhook must be changed to allow the user to omit it; on top of that, when validating inline patches, we must reject patches targeting the infrastructure template objects when this reference is not specified.

In order to allow the `ControlPlane Provider` component to take ownership of the responsibility of creating the control plane endpoint, we are going to introduce a new `ClusterEndpoint` CRD. Below are some examples:

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterEndpoint
spec:
  host: "name-1234567890.region.elb.amazonaws.com"
  port: 1234
  type: ExternalControlPlaneEndpoint
```

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterEndpoint
spec:
  host: 10.40.85.102
  port: 1234
  type: ExternalControlPlaneEndpoint
```

This is how the type specification would look; note that the endpoint fields sit under a `Spec`, matching the YAML examples above:

```go
// ClusterEndpoint represents a reachable Kubernetes API endpoint serving a particular cluster function.
type ClusterEndpoint struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec ClusterEndpointSpec `json:"spec,omitempty"`
}

// ClusterEndpointSpec defines the desired state of ClusterEndpoint.
type ClusterEndpointSpec struct {
	// Host is the DNS record or the IP address that the endpoint is reachable on.
	Host string `json:"host"`

	// Port is the port on which the endpoint is serving.
	Port int32 `json:"port"`

	// Type of endpoint represents the function that this endpoint serves on the cluster.
	Type string `json:"type,omitempty"`
}
```

This type will be the primary way for both the `Cluster` and the `ControlPlane Provider` to communicate the control plane endpoint to the "core" Cluster API controllers.

`Cluster` objects which currently use the `spec.controlPlaneEndpoint` field for the same purpose will continue to work, because "core" Cluster API controllers will continue to recognize when this field is set and will take care of generating the `ClusterEndpoint` automatically; however, this mechanism should be considered temporary machinery to ease the migration to the new CRD, and it will be removed in future versions of Cluster API. In addition, once the legacy behavior is removed, we will deprecate and eventually remove the `spec.controlPlaneEndpoint` field from the `Cluster` CustomResourceDefinition, and recommend that providers do the same for their `Cluster` CustomResourceDefinitions as well.

Notes:

- The `type` field has been introduced to allow the usage of this CRD to be extended to address https://github.com/kubernetes-sigs/cluster-api/issues/5295 in a future iteration.
- The current design originates from the `Cluster.spec.controlPlaneEndpoint` field, which defines the info we need for this proposal; in future iterations we might consider supporting more addresses or more ports for each ClusterEndpoint, similar to what is implemented in the core v1 Endpoints type.

#### Infra Providers API changes

This proposal does not introduce any breaking changes for the providers' API.

However, infra providers MUST deprecate the `spec.controlPlaneEndpoint` field in their `Cluster` types so that it can be removed in a future API version. 
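For illustration, here is a minimal sketch of the `ClusterEndpoint` a provider controller would create instead of setting `spec.controlPlaneEndpoint` on its `Cluster` object. The `FooCluster` resource and all names and values below are hypothetical assumptions used only for this example:

```yaml
# Hypothetical example: created by the cluster-api-provider-foo controller
# instead of setting FooCluster.spec.controlPlaneEndpoint.
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterEndpoint
metadata:
  name: my-cluster
  namespace: default
  ownerReferences:
  - apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: FooCluster  # the object from which this endpoint originated
    name: my-cluster
    uid: 1d2e83cf-6f3f-4c2a-9b1d-0123456789ab  # illustrative
spec:
  host: "my-cluster-abc123.foo-cloud.example.com"
  port: 6443
  type: ExternalControlPlaneEndpoint
```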
#### Core Cluster API Controllers changes

- All the controllers working with ClusterClass objects must take into account that the `infrastructure.Ref` field could be omitted; most notably:
  - The ClusterClass controller must ignore nil `infrastructure.Ref` fields while adding owner references to all the objects referenced by a ClusterClass.
  - The Topology controller must skip the generation of the `Cluster` objects when the `infrastructure.Ref` field in a ClusterClass is empty.

- All the controllers working with Cluster objects must take into account that the `infrastructureRef` field could be omitted; most notably:
  - The Cluster controller must skip reconciling this external reference when the `infrastructureRef` is missing; also, the `status.InfrastructureReady` field must be automatically set to true in this case.

- The Cluster controller must reconcile the new `ClusterEndpoint` CR. Please note that:
  - The value from the `ClusterEndpoint` CR must surface in the `spec.controlPlaneEndpoint` field of the `Cluster` object.
  - If both are present, the value from the `ClusterEndpoint` CR must take precedence over the value from `Cluster` objects still using the `spec.controlPlaneEndpoint` field.

- The Cluster controller must implement the temporary machinery to migrate existing Clusters to the new CRD and to deal with `Cluster` objects still using the `spec.controlPlaneEndpoint` field as a way to communicate the control plane endpoint to the "core" Cluster API controllers:
  - If `spec.controlPlaneEndpoint` is set on the `Cluster` object but there is no corresponding `ClusterEndpoint` CR, the CR must be created.

#### Provider controller changes

- All the `Cluster` controllers that are responsible for creating a control plane endpoint:
  - As soon as the `spec.controlPlaneEndpoint` field in the `Cluster` object is removed, the `Cluster` controller must instead create a `ClusterEndpoint` CR to communicate the control plane endpoint to the Cluster API core controllers.
    - NOTE: technically it is possible to start creating the `ClusterEndpoint` CR *before* the removal of the `spec.controlPlaneEndpoint` field, because the new CR takes precedence over the value read from the field, but this is up to the infra provider maintainers.
  - The `ClusterEndpoint` CR must have an owner reference to the `Cluster` object from which it originated.

- All the `ControlPlane Provider` controllers that are responsible for creating a control plane endpoint:
  - Must not wait for the `spec.controlPlaneEndpoint` field on the `Cluster` object to be set before starting to provision the control plane.
  - As soon as the Managed Kubernetes Service-provided control plane endpoint is available, the controller must create a `ClusterEndpoint` CR to communicate the control plane endpoint to the Cluster API core controllers.
  - The `ClusterEndpoint` CR must have an owner reference to the `ControlPlane` object from which it originated. 
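To make the precedence and migration rules above concrete, here is a minimal, illustrative Go sketch. The types and the helper function are simplified stand-ins for the real ones, not actual Cluster API code:

```go
package controllers

// APIEndpoint mirrors the host/port pair stored today in
// Cluster.spec.controlPlaneEndpoint.
type APIEndpoint struct {
	Host string
	Port int32
}

// ClusterEndpointSpec mirrors the spec of the proposed ClusterEndpoint CRD.
type ClusterEndpointSpec struct {
	Host string
	Port int32
	Type string
}

// resolveControlPlaneEndpoint returns the endpoint that should surface on
// Cluster.spec.controlPlaneEndpoint. A ClusterEndpoint CR always takes
// precedence over the legacy field; the legacy field is only a fallback,
// from which the temporary migration machinery would also create the
// missing ClusterEndpoint CR.
func resolveControlPlaneEndpoint(legacy APIEndpoint, endpoints []ClusterEndpointSpec) (APIEndpoint, bool) {
	for _, e := range endpoints {
		if e.Type == "ExternalControlPlaneEndpoint" {
			return APIEndpoint{Host: e.Host, Port: e.Port}, true
		}
	}
	if legacy.Host != "" {
		return legacy, true // legacy Cluster.spec.controlPlaneEndpoint fallback
	}
	return APIEndpoint{}, false
}
```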
### Guidelines for infra providers implementation

Let's consider the following scenarios for a hypothetical `cluster-api-provider-foo` infra provider:

_Scenario 1._

If the `Foo` cloud provider has a `FKS` managed Kubernetes offering that takes care of _the entire Kubernetes Cluster infrastructure_, the maintainers of the `cluster-api-provider-foo` provider:
- Must not implement an `FKSCluster` CRD and the corresponding `FKSClusterTemplate` CRD (nor the related controllers)
- Must implement a ControlPlane provider, with an `FKSControlPlane` CRD, the corresponding `FKSControlPlaneTemplate` CRD, and the related controllers
- The `FKSControlPlane` controller:
  - Must not wait for the `spec.controlPlaneEndpoint` field on the `Cluster` object to be set before starting to provision the `FKS` managed Kubernetes instance.
  - As soon as the control plane endpoint is available, must create a `ClusterEndpoint` CR to communicate the control plane endpoint to the Cluster API core controllers; the `ClusterEndpoint` CR must have an owner reference to the `FKSControlPlane` object from which it originated.
  - Must set the `status.Ready` field on the `FKSControlPlane` object when the provisioning is complete

_Scenario 2._

If the `Foo` cloud provider has a `FKS` managed Kubernetes offering that takes care of _only a subset of the Kubernetes Cluster infrastructure_, or if it is required to provision some additional pieces of infrastructure on top of what is provisioned out of the box, e.g. an SSH bastion host, the maintainers of the `cluster-api-provider-foo` provider:
- Must implement an `FKSCluster` CRD and the corresponding `FKSClusterTemplate` CRD, plus the related controllers
  - The `FKSCluster` controller:
    - Must create only the additional pieces of the _Kubernetes Cluster infrastructure_ not provisioned by the `FKS` managed Kubernetes instance (in this example an SSH bastion host)
    - Must not create a `ClusterEndpoint` CR (nor set the `spec.controlPlaneEndpoint` field in the `FKSCluster` object), because provisioning the control plane endpoint is not the responsibility of this controller.
    - Must set the `status.Ready` field on the `FKSCluster` object when the provisioning is complete
- Must implement a ControlPlane provider, with an `FKSControlPlane` CRD, the corresponding `FKSControlPlaneTemplate` CRD, and the related controllers
  - The `FKSControlPlane` controller:
    - Must wait for the `status.InfrastructureReady` field on the `Cluster` object to be set to true before starting to provision the control plane.
    - Must not wait for the `spec.controlPlaneEndpoint` field on the `Cluster` object to be set before starting to provision the control plane.
    - As soon as the control plane endpoint is available, must create a `ClusterEndpoint` CR to communicate the control plane endpoint to the Cluster API core controllers; the `ClusterEndpoint` CR must have an owner reference to the `FKSControlPlane` object from which it originated.
    - Must set the `status.Ready` field on the `FKSControlPlane` object when the provisioning is complete
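Before looking at the third scenario, here is a minimal sketch of how the `Cluster` object is wired in the two scenarios above; all `Foo`/`FKS` names are hypothetical and the API groups are illustrative. In scenario 1 the `infrastructureRef` is omitted entirely, while in scenario 2 it points to the `FKSCluster` that manages the additional infrastructure:

```yaml
# Scenario 1: no Cluster Infrastructure provider at all.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: FKSControlPlane
    name: my-cluster
  # no infrastructureRef
---
# Scenario 2: FKSCluster manages only the extra infrastructure (e.g. the SSH bastion host).
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: my-other-cluster
spec:
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: FKSControlPlane
    name: my-other-cluster
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: FKSCluster
    name: my-other-cluster
```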
_Scenario 3._

If the `Foo` cloud provider has a `FKS` managed Kubernetes offering that does not take care of the control plane endpoint, e.g. because it requires an existing `FooElasticIP` and a `FooElasticLoadBalancer` to be provisioned before creating the `FKS` managed Kubernetes cluster, the maintainers of the `cluster-api-provider-foo` provider:
- Must implement an `FKSCluster` CRD and the corresponding `FKSClusterTemplate` CRD, plus the related controllers; those controllers must create a `ClusterEndpoint` CR as soon as the control plane endpoint is available
  - The `FKSCluster` controller:
    - Must create only the additional pieces of the _Kubernetes Cluster infrastructure_ not provisioned by the `FKS` managed Kubernetes instance (in this example the `FooElasticIP` and the `FooElasticLoadBalancer`)
    - As soon as the control plane endpoint is available, must create a `ClusterEndpoint` CR; the `ClusterEndpoint` CR must have an owner reference to the `FKSCluster` object from which it originated.
    - Must set the `status.Ready` field on the `FKSCluster` object when the provisioning is complete
- Must implement a ControlPlane provider, with an `FKSControlPlane` CRD, the corresponding `FKSControlPlaneTemplate` CRD, and the related controllers
  - The `FKSControlPlane` controller:
    - Must wait for the `status.InfrastructureReady` field on the `Cluster` object to be set to true before starting to provision the `FKS` managed Kubernetes instance.
    - Must wait for the `spec.controlPlaneEndpoint` field on the `Cluster` object to be set before starting to provision the `FKS` managed Kubernetes instance.
    - Must set the `status.Ready` field on the `FKSControlPlane` object when the provisioning is complete

Please note that this scenario is equivalent to what is implemented for a non-managed Kubernetes `FooCluster`, backed by Cluster API managed `FooMachines`, with the only difference that in this case it is possible to rely on `KCP` as the ControlPlane provider, and thus the second item of the above list (implementing a ControlPlane provider) does not apply.

## Background work

This proposal builds on top of the research work of the Managed Kubernetes working group, and it is the result of the hard work of a team of passionate Cluster API contributors.

Below is a summary of the main evidence and alternatives considered during this work.

### EKS in CAPA

- [Docs](https://cluster-api-aws.sigs.k8s.io/topics/eks/index.html)
- Feature Status: GA