
Commit

Update the machineHealthCheck and advance docs (#267)
Making the MHC doc clearer about which steps to run on which specific
cluster. This should make the process easier for users to follow.

Adding the setup heterogeneous cluster section to advanced. This
should show users how to set up a mixed-workload Windows/Linux
cluster.
joekr committed May 15, 2023
1 parent 92c873a commit 47fd880
Showing 2 changed files with 154 additions and 9 deletions.
128 changes: 127 additions & 1 deletion docs/src/gs/advanced.md
@@ -15,4 +15,130 @@ go into error state, and the following error will show up in the CAPOCI pod logs

`OCI authentication credentials could not be retrieved from pod or cluster level,please install Cluster API Provider for OCI with OCI authentication credentials or set Cluster Identity in the OCICluster`

[cluster-identity]: ./multi-tenancy.md
## Set up a heterogeneous cluster

> This section assumes you have [set up a Windows workload cluster][windows-cluster].

To add Linux nodes to the existing Windows workload cluster, use the following YAML as a guide to provision
just the new Linux machines.

Create a file and call it `cluster-template-windows-calico-heterogeneous.yaml`. Then add the following:

```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: OCIMachineTemplate
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  template:
    spec:
      imageId: "${OCI_IMAGE_ID}"
      compartmentId: "${OCI_COMPARTMENT_ID}"
      shape: "${OCI_NODE_MACHINE_TYPE=VM.Standard.E4.Flex}"
      shapeConfig:
        ocpus: "${OCI_NODE_MACHINE_TYPE_OCPUS=1}"
      metadata:
        ssh_authorized_keys: "${OCI_SSH_KEY}"
      isPvEncryptionInTransitEnabled: ${OCI_NODE_PV_TRANSIT_ENCRYPTION=true}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cloud-provider: external
            provider-id: oci://{{ ds["id"] }}
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: "${CLUSTER_NAME}-md-0"
spec:
  clusterName: "${CLUSTER_NAME}"
  replicas: ${NODE_MACHINE_COUNT}
  selector:
    matchLabels:
  template:
    spec:
      clusterName: "${CLUSTER_NAME}"
      version: "${KUBERNETES_VERSION}"
      bootstrap:
        configRef:
          name: "${CLUSTER_NAME}-md-0"
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
      infrastructureRef:
        name: "${CLUSTER_NAME}-md-0"
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: OCIMachineTemplate
```
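The `${VAR=default}` placeholders in the template are filled in by `clusterctl generate cluster`; the syntax mirrors POSIX shell default-value expansion, so you can see the behavior in isolation with plain shell (`SHAPE` is just an illustrative variable name, not one the template uses):

```shell
# Sketch: ${VAR=default} expands to the default only when VAR is unset,
# otherwise to the value already assigned. clusterctl's substitution
# follows the same convention for template placeholders.
unset SHAPE
echo "${SHAPE=VM.Standard.E4.Flex}"   # default applies

SHAPE="VM.Standard.E3.Flex"
echo "${SHAPE=VM.Standard.E4.Flex}"   # existing value wins
```

This is why leaving a variable like `OCI_NODE_MACHINE_TYPE` unset is safe: the template's default shape is used.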

Then apply the template:
```bash
OCI_IMAGE_ID=<your new linux image OCID> \
OCI_NODE_IMAGE_ID=<your new linux image OCID> \
OCI_COMPARTMENT_ID=<your compartment> \
NODE_MACHINE_COUNT=2 \
OCI_NODE_MACHINE_TYPE=<shape> \
OCI_NODE_MACHINE_TYPE_OCPUS=4 \
OCI_SSH_KEY="<your public ssh key>" \
clusterctl generate cluster <cluster-name> --kubernetes-version <kubernetes-version> \
--target-namespace default \
--from cluster-template-windows-calico-heterogeneous.yaml | kubectl apply -f -
```

After a few minutes the instances will come up and the CNI will be installed.

### Node constraints

For all future deployments, make sure to set up node constraints using something like [`nodeSelector`](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector). Example:

| Windows | Linux |
| ----------- | ----------- |
| ```nodeSelector: kubernetes.io/os: windows``` | ```nodeSelector: kubernetes.io/os: linux``` |

<br/>
<details>
<summary>nodeSelector examples - click to expand</summary>

Linux nginx deployment example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx-linux
spec:
  selector:
    matchLabels:
      run: my-nginx-linux
  replicas: 2
  template:
    metadata:
      labels:
        run: my-nginx-linux
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
      - name: nginx
        image: nginx:latest
        args:
        - /bin/sh
        - -c
        - sleep 3600
```

For a Windows deployment example, see the [Kubernetes Getting Started: Deploying a Windows workload][windows-kubernetes-deployment] documentation.
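For symmetry with the Linux example, a minimal Windows-pinned pod template fragment might look like the following. This is a sketch only; the container name, command, and image tag are illustrative choices, not taken from the CAPOCI docs:

```yaml
spec:
  nodeSelector:
    kubernetes.io/os: windows
  containers:
  - name: sample
    image: mcr.microsoft.com/windows/servercore:ltsc2022
    command: ["cmd", "/c", "ping -t localhost"]
```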

</details>

Without these constraints, the Kubernetes scheduler may try to deploy your Windows pods onto a Linux worker, or vice versa.

[cluster-identity]: ./multi-tenancy.md
[windows-cluster]: ./create-windows-workload-cluster.md
[windows-kubernetes-deployment]: https://kubernetes.io/docs/concepts/windows/user-guide/#getting-started-deploying-a-windows-workload
35 changes: 27 additions & 8 deletions docs/src/gs/create-mhc-workload-cluster.md
@@ -20,8 +20,15 @@ This will move the machines into a `Ready` state.
Another approach is to install the MHC after the cluster is up and healthy (a Day-2 operation). This can prevent
machine remediation while the cluster is being set up.

Adding an MHC to either the control plane or the machines is a multi-step process, and each step runs
against a specific cluster (either the management cluster or the workload cluster):
1. Update the spec for future instances (management cluster)
2. Add the label to existing nodes (workload cluster)
3. Add the MHC (management cluster)

### Add control-plane MHC

#### Update control plane spec
We need to add the `controlplane.remediation` label to the `KubeadmControlPlane`.

Create a file named `control-plane-patch.yaml` that has this content:
@@ -33,13 +40,18 @@ spec:
controlplane.remediation: ""
```
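The diff collapses part of this file, so the patch above is shown truncated. Based on the `KubeadmControlPlane` v1beta1 API, the full patch file presumably nests the label under `machineTemplate.metadata.labels` — an assumption from the API shape, not confirmed by this diff:

```yaml
spec:
  machineTemplate:
    metadata:
      labels:
        controlplane.remediation: ""
```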

Then run `kubectl patch KubeadmControlPlane <your-cluster-name>-control-plane --patch-file control-plane-patch.yaml --type=merge`.
Then on the management cluster run
`kubectl patch KubeadmControlPlane <your-cluster-name>-control-plane --patch-file control-plane-patch.yaml --type=merge`.

#### Add label to existing nodes

Then add the new label to any existing control-plane node(s)
Then on the workload cluster add the new label to any existing control-plane node(s):
`kubectl label node <control-plane-name> controlplane.remediation=""`. This will prevent the `KubeadmControlPlane` from provisioning
new nodes once the MHC is deployed.

Create a file named `control-plane-mhc.yaml` that has this content:
#### Add the MHC

Finally, create a file named `control-plane-mhc.yaml` that has this content:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
@@ -61,12 +73,14 @@ spec:
timeout: 300s
```

Then run `kubectl apply -f control-plane-mhc.yaml`.
Then on the management cluster run `kubectl apply -f control-plane-mhc.yaml`.

Then run `kubectl get machinehealthchecks` to check that your MachineHealthCheck sees the expected machines.

### Add machine MHC

#### Update machine spec

We need to add the `machine.remediation` label to the `MachineDeployment`.

Create a file named `machine-patch.yaml` that has this content:
@@ -78,13 +92,18 @@ spec:
machine.remediation: ""
```
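As with the control-plane patch, the collapsed diff truncates this file. Based on the `MachineDeployment` v1beta1 API, the full patch file presumably nests the label under `spec.template.metadata.labels` — an assumption from the API shape, not confirmed by this diff:

```yaml
spec:
  template:
    metadata:
      labels:
        machine.remediation: ""
```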

Then run `kubectl patch MachineDeployment oci-cluster-stage-md-0 --patch-file machine-patch.yaml --type=merge`.
Then on the management cluster run
`kubectl patch MachineDeployment oci-cluster-stage-md-0 --patch-file machine-patch.yaml --type=merge`.

#### Add label to existing nodes

Then add the new label to any existing control-plane node(s)
Then on the workload cluster add the new label to any existing worker node(s):
`kubectl label node <machine-name> machine.remediation=""`. This will prevent the `MachineDeployment` from provisioning
new nodes once the MHC is deployed.

Create a file named `machine-mhc.yaml` that has this content:
#### Add the MHC

Finally, create a file named `machine-mhc.yaml` that has this content:
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineHealthCheck
@@ -106,7 +125,7 @@ spec:
timeout: 300s
```

Then run `kubectl apply -f machine-mhc.yaml`.
Then on the management cluster run `kubectl apply -f machine-mhc.yaml`.

Then run `kubectl get machinehealthchecks` to check that your MachineHealthCheck sees the expected machines.

