feat: Add MachineHealthCheck example template (#175)
Showing 3 changed files with 307 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,116 @@ | ||
# Create a workload cluster with MachineHealthChecks (MHC) | ||
|
||
To better understand MachineHealthChecks, read [the Cluster-API book][mhc] | ||
and make sure to read the [limitations][mhc-limitations] section. | ||
|
||
## Create a new workload cluster with MHC | ||
|
||
In the project's code repository we provide an [example template][mhc-template] that sets up two MachineHealthChecks | ||
when the workload cluster is created. Using two separate MHCs allows the control plane and the workers to use different remediation values: | ||
|
||
- `control-plane-unhealthy-5m` sets up a health check for the control plane machines | ||
- `md-unhealthy-5m` sets up a health check for the workload machines | ||
|
||
> NOTE: As a part of the example template the MHCs will start remediating nodes that are `not ready` after 10 minutes. | ||
To prevent this side effect, make sure to [install your CNI][install-a-cni-provider] once the API is available. | ||
This will move the machines into a `Ready` state. | ||
|
||
## Add MHC to existing workload cluster | ||
|
||
Another approach is to install MHC after the cluster is up and healthy (aka Day-2 Operation). This can prevent | ||
machine remediation while setting up the cluster. | ||
|
||
### Add control-plane MHC | ||
|
||
We need to add the `controlplane.remediation` label to the `KubeadmControlPlane`. | ||
|
||
Create a file named `control-plane-patch.yaml` that has this content: | ||
```yaml | ||
spec: | ||
  machineTemplate: | ||
    metadata: | ||
      labels: | ||
        controlplane.remediation: "" | ||
``` | ||
Then run `kubectl patch KubeadmControlPlane <your-cluster-name>-control-plane --patch-file control-plane-patch.yaml --type=merge`. | ||
|
||
Then add the new label to any existing control-plane node(s) by running | ||
`kubectl label node <control-plane-name> controlplane.remediation=""`. This will prevent the `KubeadmControlPlane` from provisioning | ||
new nodes once the MHC is deployed. | ||
|
||
Create a file named `control-plane-mhc.yaml` that has this content: | ||
```yaml | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: MachineHealthCheck | ||
metadata: | ||
  name: "<your-cluster-name>-control-plane-unhealthy-5m" | ||
spec: | ||
  clusterName: "<your-cluster-name>" | ||
  maxUnhealthy: 100% | ||
  nodeStartupTimeout: 10m | ||
  selector: | ||
    matchLabels: | ||
      controlplane.remediation: "" | ||
  unhealthyConditions: | ||
    - type: Ready | ||
      status: Unknown | ||
      timeout: 300s | ||
    - type: Ready | ||
      status: "False" | ||
      timeout: 300s | ||
``` | ||
|
||
Then run `kubectl apply -f control-plane-mhc.yaml`. | ||
|
||
Then run `kubectl get machinehealthchecks` to check your MachineHealthCheck sees the expected machines. | ||
|
||
### Add machine MHC | ||
|
||
We need to add the `machine.remediation` label to the `MachineDeployment`. | ||
|
||
Create a file named `machine-patch.yaml` that has this content: | ||
```yaml | ||
spec: | ||
  template: | ||
    metadata: | ||
      labels: | ||
        machine.remediation: "" | ||
``` | ||
|
||
Then run `kubectl patch MachineDeployment <your-cluster-name>-md-0 --patch-file machine-patch.yaml --type=merge`. | ||
|
||
Then add the new label to any existing worker node(s) by running | ||
`kubectl label node <machine-name> machine.remediation=""`. This will prevent the `MachineDeployment` from provisioning | ||
new nodes once the MHC is deployed. | ||
|
||
Create a file named `machine-mhc.yaml` that has this content: | ||
```yaml | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: MachineHealthCheck | ||
metadata: | ||
  name: "<your-cluster-name>-md-unhealthy-5m" | ||
spec: | ||
  clusterName: "<your-cluster-name>" | ||
  maxUnhealthy: 100% | ||
  nodeStartupTimeout: 10m | ||
  selector: | ||
    matchLabels: | ||
      machine.remediation: "" | ||
  unhealthyConditions: | ||
    - type: Ready | ||
      status: Unknown | ||
      timeout: 300s | ||
    - type: Ready | ||
      status: "False" | ||
      timeout: 300s | ||
``` | ||
|
||
Then run `kubectl apply -f machine-mhc.yaml`. | ||
|
||
Then run `kubectl get machinehealthchecks` to check your MachineHealthCheck sees the expected machines. | ||
|
||
[install-a-cni-provider]: ../gs/create-workload-cluster.md#install-a-cni-provider | ||
[mhc]: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/healthchecking.html | ||
[mhc-limitations]: https://cluster-api.sigs.k8s.io/tasks/automated-machine-management/healthchecking.html#limitations-and-caveats-of-a-machinehealthcheck | ||
[mhc-template]: https://github.com/oracle/cluster-api-provider-oci/blob/main/templates/cluster-template-healcheck.yaml |
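The two MachineHealthCheck manifests in the document above differ only in the name infix and the selector label. As a rough sketch (not part of the commit), a small shell helper could render either one from a cluster name. The function name `render_mhc` and its argument order are hypothetical; the field values are copied from the manifests shown in the document.

```shell
# render_mhc CLUSTER ROLE LABEL
# Hypothetical helper: prints a MachineHealthCheck manifest on stdout.
#   CLUSTER  workload cluster name
#   ROLE     name infix, e.g. "control-plane" or "md"
#   LABEL    selector label key, e.g. "controlplane.remediation"
render_mhc() {
  cluster="$1"
  role="$2"
  label="$3"
  # Unquoted heredoc: the shell variables expand, double quotes stay literal.
  cat <<EOF
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
metadata:
  name: "${cluster}-${role}-unhealthy-5m"
spec:
  clusterName: "${cluster}"
  maxUnhealthy: 100%
  nodeStartupTimeout: 10m
  selector:
    matchLabels:
      ${label}: ""
  unhealthyConditions:
    - type: Ready
      status: Unknown
      timeout: 300s
    - type: Ready
      status: "False"
      timeout: 300s
EOF
}
```

Assuming `kubectl` points at the management cluster, the output could be piped straight into it, for example `render_mhc my-cluster md machine.remediation | kubectl apply -f -`, where `my-cluster` is a placeholder name.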
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,190 @@ | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: Cluster | ||
metadata: | ||
  labels: | ||
    cluster.x-k8s.io/cluster-name: "${CLUSTER_NAME}" | ||
  name: "${CLUSTER_NAME}" | ||
  namespace: "${NAMESPACE}" | ||
spec: | ||
  clusterNetwork: | ||
    pods: | ||
      cidrBlocks: | ||
        - ${POD_CIDR:="192.168.0.0/16"} | ||
    serviceDomain: ${SERVICE_DOMAIN:="cluster.local"} | ||
    services: | ||
      cidrBlocks: | ||
        - ${SERVICE_CIDR:="10.128.0.0/12"} | ||
  infrastructureRef: | ||
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
    kind: OCICluster | ||
    name: "${CLUSTER_NAME}" | ||
    namespace: "${NAMESPACE}" | ||
  controlPlaneRef: | ||
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1 | ||
    kind: KubeadmControlPlane | ||
    name: "${CLUSTER_NAME}-control-plane" | ||
    namespace: "${NAMESPACE}" | ||
--- | ||
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
kind: OCICluster | ||
metadata: | ||
  labels: | ||
    cluster.x-k8s.io/cluster-name: "${CLUSTER_NAME}" | ||
  name: "${CLUSTER_NAME}" | ||
spec: | ||
  compartmentId: "${OCI_COMPARTMENT_ID}" | ||
--- | ||
kind: KubeadmControlPlane | ||
apiVersion: controlplane.cluster.x-k8s.io/v1beta1 | ||
metadata: | ||
  name: "${CLUSTER_NAME}-control-plane" | ||
  namespace: "${NAMESPACE}" | ||
spec: | ||
  version: "${KUBERNETES_VERSION}" | ||
  replicas: ${CONTROL_PLANE_MACHINE_COUNT} | ||
  machineTemplate: | ||
    metadata: | ||
      labels: | ||
        controlplane.remediation: "" | ||
    infrastructureRef: | ||
      kind: OCIMachineTemplate | ||
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
      name: "${CLUSTER_NAME}-control-plane" | ||
      namespace: "${NAMESPACE}" | ||
  kubeadmConfigSpec: | ||
    clusterConfiguration: | ||
      kubernetesVersion: ${KUBERNETES_VERSION} | ||
      apiServer: | ||
        certSANs: [localhost, 127.0.0.1] | ||
      dns: {} | ||
      etcd: {} | ||
      networking: {} | ||
      scheduler: {} | ||
    initConfiguration: | ||
      nodeRegistration: | ||
        criSocket: /var/run/containerd/containerd.sock | ||
        kubeletExtraArgs: | ||
          cloud-provider: external | ||
          provider-id: oci://{{ ds["id"] }} | ||
    joinConfiguration: | ||
      discovery: {} | ||
      nodeRegistration: | ||
        criSocket: /var/run/containerd/containerd.sock | ||
        kubeletExtraArgs: | ||
          cloud-provider: external | ||
          provider-id: oci://{{ ds["id"] }} | ||
--- | ||
kind: OCIMachineTemplate | ||
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
metadata: | ||
  name: "${CLUSTER_NAME}-control-plane" | ||
  # labels: | ||
  #   controlplane.remediation: "" | ||
spec: | ||
  template: | ||
    spec: | ||
      imageId: "${OCI_IMAGE_ID}" | ||
      compartmentId: "${OCI_COMPARTMENT_ID}" | ||
      shape: "${OCI_CONTROL_PLANE_MACHINE_TYPE=VM.Standard.E4.Flex}" | ||
      shapeConfig: | ||
        ocpus: "${OCI_CONTROL_PLANE_MACHINE_TYPE_OCPUS=1}" | ||
      metadata: | ||
        ssh_authorized_keys: "${OCI_SSH_KEY}" | ||
      isPvEncryptionInTransitEnabled: ${OCI_CONTROL_PLANE_PV_TRANSIT_ENCRYPTION=true} | ||
--- | ||
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
kind: OCIMachineTemplate | ||
metadata: | ||
  name: "${CLUSTER_NAME}-md-0" | ||
  # labels: | ||
  #   machine.remediation: "" | ||
spec: | ||
  template: | ||
    spec: | ||
      imageId: "${OCI_IMAGE_ID}" | ||
      compartmentId: "${OCI_COMPARTMENT_ID}" | ||
      shape: "${OCI_NODE_MACHINE_TYPE=VM.Standard.E4.Flex}" | ||
      shapeConfig: | ||
        ocpus: "${OCI_NODE_MACHINE_TYPE_OCPUS=1}" | ||
      metadata: | ||
        ssh_authorized_keys: "${OCI_SSH_KEY}" | ||
      isPvEncryptionInTransitEnabled: ${OCI_NODE_PV_TRANSIT_ENCRYPTION=true} | ||
--- | ||
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 | ||
kind: KubeadmConfigTemplate | ||
metadata: | ||
  name: "${CLUSTER_NAME}-md-0" | ||
spec: | ||
  template: | ||
    spec: | ||
      joinConfiguration: | ||
        nodeRegistration: | ||
          kubeletExtraArgs: | ||
            cloud-provider: external | ||
            provider-id: oci://{{ ds["id"] }} | ||
--- | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: MachineDeployment | ||
metadata: | ||
  name: "${CLUSTER_NAME}-md-0" | ||
  # labels: | ||
  #   machine.remediation: "" | ||
spec: | ||
  clusterName: "${CLUSTER_NAME}" | ||
  replicas: ${NODE_MACHINE_COUNT} | ||
  selector: | ||
    matchLabels: | ||
  template: | ||
    metadata: | ||
      labels: | ||
        machine.remediation: "" | ||
    spec: | ||
      clusterName: "${CLUSTER_NAME}" | ||
      version: "${KUBERNETES_VERSION}" | ||
      bootstrap: | ||
        configRef: | ||
          name: "${CLUSTER_NAME}-md-0" | ||
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1 | ||
          kind: KubeadmConfigTemplate | ||
      infrastructureRef: | ||
        name: "${CLUSTER_NAME}-md-0" | ||
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1 | ||
        kind: OCIMachineTemplate | ||
--- | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: MachineHealthCheck | ||
metadata: | ||
  name: "${CLUSTER_NAME}-control-plane-unhealthy-5m" | ||
spec: | ||
  clusterName: "${CLUSTER_NAME}" | ||
  maxUnhealthy: 100% | ||
  nodeStartupTimeout: 10m | ||
  selector: | ||
    matchLabels: | ||
      controlplane.remediation: "" | ||
  unhealthyConditions: | ||
    - type: Ready | ||
      status: Unknown | ||
      timeout: 300s | ||
    - type: Ready | ||
      status: "False" | ||
      timeout: 300s | ||
--- | ||
apiVersion: cluster.x-k8s.io/v1beta1 | ||
kind: MachineHealthCheck | ||
metadata: | ||
  name: "${CLUSTER_NAME}-md-unhealthy-5m" | ||
spec: | ||
  clusterName: "${CLUSTER_NAME}" | ||
  maxUnhealthy: 100% | ||
  nodeStartupTimeout: 10m | ||
  selector: | ||
    matchLabels: | ||
      machine.remediation: "" | ||
  unhealthyConditions: | ||
    - type: Ready | ||
      status: Unknown | ||
      timeout: 300s | ||
    - type: Ready | ||
      status: "False" | ||
      timeout: 300s |
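Several variables in the template above have no default value (`CLUSTER_NAME`, `NAMESPACE`, `KUBERNETES_VERSION`, the two machine counts, `OCI_COMPARTMENT_ID`, `OCI_IMAGE_ID`, `OCI_SSH_KEY`). A minimal pre-flight sketch, assuming the values are supplied through the environment, is shown here. The function name `missing_template_vars` is hypothetical and not part of the commit, and some of these values may instead be supplied through `clusterctl` flags depending on the workflow.

```shell
# Hypothetical pre-flight check: prints the names of template variables
# that have no default in the template and are unset or empty in the
# current shell. Prints an empty line when everything is set.
missing_template_vars() {
  missing=""
  for name in CLUSTER_NAME NAMESPACE KUBERNETES_VERSION \
              CONTROL_PLANE_MACHINE_COUNT NODE_MACHINE_COUNT \
              OCI_COMPARTMENT_ID OCI_IMAGE_ID OCI_SSH_KEY; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}
```

One possible use is to stop before rendering when the list is non-empty, for example `[ -z "$(missing_template_vars)" ] || echo "unset: $(missing_template_vars)"`, and only then render the template with a tool such as `clusterctl generate cluster`.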