CAEP: Make Cluster Infra Resource Optional
jackfrancis committed Apr 10, 2023
1 parent 7ee0cd4 commit 2723963
Showing 1 changed file with 207 additions and 0 deletions.
207 changes: 207 additions & 0 deletions docs/proposals/20230407-optional-cluster-infra-resource.md
@@ -0,0 +1,207 @@
---
title: Make Cluster Infra Resource Optional
authors:
- "@jackfrancis"
reviewers:
- "@richardcase"
- "@pydctw"
- "@mtougeron"
- "@CecileRobertMichon"
- "@fabriziopandini"
- "@sbueringer"
- "@killianmuldoon"
- "@mboersma"
- "@nojnhuh"
creation-date: 2023-04-07
last-updated: 2023-04-07
status: provisional
see-also:
- "/docs/proposals/20220725-managed-kubernetes.md"
---

# Make Cluster Infra Resource Optional

## Table of Contents

A table of contents is helpful for quickly jumping to sections of a proposal and for highlighting
any additional information provided beyond the standard proposal template.
[Tools for generating](https://github.com/ekalinin/github-markdown-toc) a table of contents from markdown are available.

- [Make Cluster Infra Resource Optional](#make-cluster-infra-resource-optional)
  - [Table of Contents](#table-of-contents)
  - [Glossary](#glossary)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals](#non-goals)
    - [Future Work](#future-work)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Story 1](#story-1)
      - [Story 2](#story-2)
      - [Story 3](#story-3)
      - [Story 4](#story-4)
      - [Story 5](#story-5)
      - [Story 6](#story-6)
    - [Requirements (Optional)](#requirements-optional)
      - [Functional Requirements](#functional-requirements)
        - [FR1](#fr1)
        - [FR2](#fr2)
      - [Non-Functional Requirements](#non-functional-requirements)
        - [NFR1](#nfr1)
        - [NFR2](#nfr2)
    - [Implementation Details/Notes/Constraints](#implementation-detailsnotesconstraints)
    - [Security Model](#security-model)
    - [Risks and Mitigations](#risks-and-mitigations)
  - [Alternatives](#alternatives)
  - [Upgrade Strategy](#upgrade-strategy)
  - [Additional Details](#additional-details)
    - [Test Plan [optional]](#test-plan-optional)
    - [Graduation Criteria [optional]](#graduation-criteria-optional)
    - [Version Skew Strategy [optional]](#version-skew-strategy-optional)
  - [Implementation History](#implementation-history)

## Glossary

Refer to the [Cluster API Book Glossary](https://cluster-api.sigs.k8s.io/reference/glossary.html).

The following terms will be used in this document.

- `<Infra>Cluster`
  - When we say `<Infra>Cluster` we refer to any provider's infra-specific implementation of the Cluster API `Cluster` resource spec. When you see `<Infra>`, interpret that as a placeholder for any provider implementation. Some concrete examples of provider infra cluster implementations are Azure's CAPZ provider (e.g., `AzureCluster` and `AzureManagedCluster`), AWS's CAPA provider (e.g., `AWSCluster` and `AWSManagedCluster`), and Google Cloud's CAPG provider (e.g., `GCPCluster` and `GCPManagedCluster`). Rather than referencing any one of the preceding actual implementations of infra cluster resources, we prefer to generalize to `<Infra>Cluster` so that we don't suggest any provider-specific bias informing our conclusions.
- Managed Kubernetes
  - Managed Kubernetes refers to any Kubernetes Cluster provisioning and maintenance platform that is exposed by a service API. For example: [EKS](https://aws.amazon.com/eks/), [OKE](https://www.oracle.com/cloud/cloud-native/container-engine-kubernetes/), [AKS](https://azure.microsoft.com/en-us/products/kubernetes-service), [GKE](https://cloud.google.com/kubernetes-engine), [IBM Cloud Kubernetes Service](https://www.ibm.com/cloud/kubernetes-service), [DOKS](https://www.digitalocean.com/products/kubernetes), and many more throughout the Kubernetes Cloud Native ecosystem.
- _Kubernetes Cluster Infrastructure_
  - When we refer to _Kubernetes Cluster Infrastructure_ we aim to distinguish required environmental infrastructure (e.g., cloud virtual networks) in which a Kubernetes cluster resides as a "set of child resources" from the Kubernetes cluster resources themselves (e.g., virtual machines that underlie nodes, managed by Cluster API). Sometimes this is referred to as "BYO Infrastructure"; essentially, we are talking about **infrastructure that supports a Kubernetes cluster, but is not actively managed by Cluster API**. As we will see, this boundary is different when discussing Managed Kubernetes: more infrastructure resources are not managed by Cluster API when running Managed Kubernetes.
- e.g.
  - This just means "For example:"!

## Summary

We propose to make provider `<Infra>Cluster` resources optional in order to better represent Managed Kubernetes scenarios where all _Kubernetes Cluster Infrastructure_ is managed by the service provider, and not by Cluster API.

## Motivation

The implementation of Managed Kubernetes scenarios by Cluster API providers occurred after the architectural design of Cluster API, and thus that design process did not consider these Managed Kubernetes scenarios as a user story. In practice, Cluster API's specification has allowed Managed Kubernetes solutions to emerge that aid running fleets of clusters at scale, with CAPA's `AWSManagedCluster` and CAPZ's `AzureManagedCluster` being notable examples. However, because these Managed Kubernetes solutions arrived after the Cluster API contract was defined, providers have not settled on a consistent rendering of how a "Service-Managed Kubernetes" specification fits into a "Cluster API-Managed Kubernetes" surface area.

One particular part of the existing Cluster API surface area that is inconsistent with most Managed Kubernetes user experiences is the accounting of the [Kubernetes API server](https://kubernetes.io/docs/concepts/overview/components/#kube-apiserver). In the canonical "self-managed" user story that Cluster API addresses, it is the provider implementation of Cluster API (e.g., CAPA) that is responsible for scaffolding the _Kubernetes Cluster Infrastructure_ required to create the Kubernetes API server (e.g., a Load Balancer and a public IP address). This provider responsibility is declared in the `<Infra>Cluster` resource and carried out by its controllers; the result of that reconciliation is then synchronized with the parent Cluster API `Cluster` resource.
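The hand-off described above can be pictured with a small sketch. It is illustrative only: `InfraClusterState`, `ClusterState`, and `sync_from_infra` are hypothetical names rather than Cluster API types, and the rule that nothing is copied until the provider reports readiness is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InfraClusterState:
    """Hypothetical stand-in for what an <Infra>Cluster reports."""
    ready: bool = False
    endpoint: Optional[str] = None  # "host:port" of the API server


@dataclass
class ClusterState:
    """Hypothetical stand-in for the parent Cluster resource."""
    infrastructure_ready: bool = False
    endpoint: Optional[str] = None


def sync_from_infra(cluster: ClusterState, infra: InfraClusterState) -> ClusterState:
    """Copy the provider's reconciliation result onto the parent Cluster."""
    # Assumption: nothing is synchronized until the provider reports readiness.
    if not infra.ready:
        return cluster
    cluster.infrastructure_ready = True
    # Assumption: an endpoint already set on the parent is never overwritten.
    if cluster.endpoint is None:
        cluster.endpoint = infra.endpoint
    return cluster


if __name__ == "__main__":
    infra = InfraClusterState(ready=True, endpoint="203.0.113.10:6443")
    print(sync_from_infra(ClusterState(), infra))
```

In a Managed Kubernetes scenario the service itself produces the endpoint, so a provider has nothing meaningful to put behind `InfraClusterState` in this picture.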

Because there exist Managed Kubernetes scenarios that handle all _Kubernetes Cluster Infrastructure_ responsibilities themselves, Cluster API's requirement of an `<Infra>Cluster` resource forces providers into awkward implementation decisions: they must still define and reconcile an `<Infra>Cluster` resource even though, in these scenarios, there is no actual work for a Cluster API provider to do to scaffold _Kubernetes Cluster Infrastructure_.

### Goals

- Make `<Infra>Cluster` resources optional.
- Enable API Server endpoint reporting from a provider's Control Plane resource rather than from its `<Infra>Cluster` resource.
- Ensure any changes to the current behavioral contract are backwards-compatible.
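
One way to picture the second and third goals together is as a fallback when resolving the API server endpoint. The sketch below is a minimal illustration, not a proposed implementation: the `Endpoint`, `InfraCluster`, `ControlPlane`, and `Cluster` classes and the `resolve_endpoint` function are hypothetical stand-ins for the real resources, and the precedence shown (an `<Infra>Cluster` endpoint wins when one exists, so that existing clusters behave as before) is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int


@dataclass
class InfraCluster:
    """Hypothetical stand-in for a provider's <Infra>Cluster resource."""
    endpoint: Optional[Endpoint] = None


@dataclass
class ControlPlane:
    """Hypothetical stand-in for a provider's Control Plane resource."""
    endpoint: Optional[Endpoint] = None


@dataclass
class Cluster:
    """Hypothetical stand-in for the parent Cluster resource."""
    control_plane: Optional[ControlPlane] = None
    infra_cluster: Optional[InfraCluster] = None  # optional in this sketch


def resolve_endpoint(cluster: Cluster) -> Optional[Endpoint]:
    """Return the API server endpoint, or None if nobody has reported one."""
    # Assumed precedence: keep today's behavior when an <Infra>Cluster reports
    # an endpoint, so existing clusters are unaffected.
    infra = cluster.infra_cluster
    if infra is not None and infra.endpoint is not None:
        return infra.endpoint
    # Otherwise fall back to the Control Plane resource.
    if cluster.control_plane is not None:
        return cluster.control_plane.endpoint
    return None


if __name__ == "__main__":
    managed = Cluster(control_plane=ControlPlane(Endpoint("api.example.test", 443)))
    print(resolve_endpoint(managed))
```

A cluster with no `<Infra>Cluster` at all still resolves an endpoint here, which is the Managed Kubernetes case this proposal targets.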

### Non-Goals

- Change the Cluster API data type specification.
- Introduce new "Managed Kubernetes" data types in Cluster API.

### Future Work

- Detailed documentation that references the flavors of Managed Kubernetes scenarios and how they can be implemented in Cluster API, with provider examples.

## Proposal

### User Stories

- Detail the things that people will be able to do if this proposal is implemented.
- Include as much detail as possible so that people can understand the "how" of the system.
- The goal here is to make this feel real for users without getting bogged down.

#### Story 1

As a cluster operator, I want to use Cluster API to provision and manage the lifecycle of a control plane that utilizes my service provider's managed Kubernetes control plane (e.g., EKS, AKS, GKE), so that I don't have to worry about the management or provisioning of control plane nodes, and so that I can take advantage of any value-added services offered by my cloud provider.

#### Story 2

As a cluster operator, I want to be able to provision both "unmanaged" and "managed" Kubernetes clusters from the same management cluster, so that I can support different requirements and use cases as needed whilst using a single operating model.

#### Story 3

As a Cluster API provider developer, I want guidance on how to incorporate a managed Kubernetes service into my provider, so that its usage is compatible with Cluster API architecture and features and is consistent with other providers.

#### Story 4

As a Cluster API provider developer, I want to enable the ClusterClass feature for a Managed Kubernetes service, so that users can take advantage of an improved UX with ClusterClass-based clusters.

#### Story 5

As a cluster operator, I want to use Cluster API to provision and manage the lifecycle of worker nodes that utilize my cloud provider's managed instances (if it supports them), so that I don't have to worry about the management of these instances.

#### Story 6

As a service provider, I want to be able to offer Managed Kubernetes clusters through CAPI by referencing my own managed control plane implementation that satisfies Cluster API contracts.

### Requirements (Optional)

Some authors may wish to use requirements in addition to user stories.
Technical requirements should be derived from user stories, and provide a trace from
use case to design, implementation and test case. Requirements can be prioritised
using the MoSCoW (MUST, SHOULD, COULD, WON'T) criteria.

The FR and NFR notation is intended to be used as cross-references across a CAEP.

The difference between goals and requirements is that between an executive summary
and the body of a document. Each requirement should be in support of a goal,
but narrowly scoped in a way that is verifiable or, ideally, testable.

#### Functional Requirements

TODO

##### FR1

TODO

##### FR2

TODO

#### Non-Functional Requirements

TODO

##### NFR1

TODO

##### NFR2

TODO

### Implementation Details/Notes/Constraints

- TODO

### Security Model

TODO

### Risks and Mitigations

- TODO

## Alternatives

TODO

## Upgrade Strategy

TODO

## Additional Details

### Test Plan [optional]

TODO

### Graduation Criteria [optional]

TODO

### Version Skew Strategy [optional]

TODO

## Implementation History

- [ ] 01/11/2023: Compile a Google Doc to organize thoughts prior to CAEP ([link here](https://docs.google.com/document/d/1rqzZfsO6k_RmOHUxx47cALSr_6SeTG89e9C44-oHHdQ/))
