docs: add troubleshooting note on VM_MEMORY_OVERHEAD_PERCENT
jmdeal committed Aug 20, 2024
1 parent e284774 commit 3523315
Showing 1 changed file with 44 additions and 0 deletions.
44 changes: 44 additions & 0 deletions website/content/en/docs/troubleshooting.md
@@ -325,6 +325,50 @@ then the following solution(s) may resolve your issue.
...
```

### Karpenter incorrectly computes available resources for a node

When creating nodes, the allocatable resources Karpenter computes (as seen in logs and `nodeClaim.status.allocatable`) do not always match the allocatable resources on the created node (`node.status.allocatable`).
Karpenter uses the results from `ec2:DescribeInstanceTypes` to determine the resources available on a node launched with a given instance type.
The following computation is used to determine allocatable CPU, memory, and ephemeral storage based on the results returned from `ec2:DescribeInstanceTypes`.

```
nodeClaim.allocatable.cpu = instance.cpu - kubeReserved.cpu - systemReserved.cpu
nodeClaim.allocatable.memory = (instance.memory * (1.0 - VM_MEMORY_OVERHEAD_PERCENT)) - kubeReserved.memory - systemReserved.memory - max(evictionSoft.memory.available, evictionHard.memory.available)
nodeClaim.allocatable.ephemeralStorage = instance.storage - kubeReserved.ephemeralStorage - systemReserved.ephemeralStorage - max(evictionSoft.nodefs.available, evictionHard.nodefs.available)
```
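
As a worked illustration of the memory formula above, the following Python sketch evaluates it for one set of inputs. The function name and every numeric value are hypothetical and chosen only for the example; this is not Karpenter's implementation, just the arithmetic from the formula, with all quantities in MiB.

```python
def allocatable_memory_mib(
    instance_mib: float,
    overhead_percent: float,
    kube_reserved_mib: float,
    system_reserved_mib: float,
    eviction_soft_mib: float,
    eviction_hard_mib: float,
) -> float:
    """Evaluate the nodeClaim.allocatable.memory formula (all values in MiB)."""
    # Memory left after the modeled VM overhead is removed.
    usable = instance_mib * (1.0 - overhead_percent)
    # Only the larger of the two eviction thresholds is subtracted.
    eviction = max(eviction_soft_mib, eviction_hard_mib)
    return usable - kube_reserved_mib - system_reserved_mib - eviction


# Hypothetical inputs: a 16 GiB instance, the default 7.5% overhead,
# 893 MiB kubeReserved, no systemReserved, and a 100 MiB hard eviction threshold.
estimate = allocatable_memory_mib(
    instance_mib=16384,
    overhead_percent=0.075,
    kube_reserved_mib=893,
    system_reserved_mib=0,
    eviction_soft_mib=0,
    eviction_hard_mib=100,
)
print(f"estimated allocatable memory: {estimate:.1f} MiB")
```

Lowering `overhead_percent` raises the estimate, which is why the value matters when the result is compared against `node.status.allocatable`.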

Most of these factors directly model user configuration (i.e. the KubeletConfiguration options).
On the other hand, `VM_MEMORY_OVERHEAD_PERCENT` models an implicit reduction of available memory that varies by instance type and AMI.
Karpenter can't compute the exact value being modeled, so `VM_MEMORY_OVERHEAD_PERCENT` is a [global setting]({{< ref="../reference/settings.md" >}}) used across all instance type and AMI combinations.
The default value (`7.5%`) has been tuned to closely match reality for the majority of instance types while not overestimating.
As a result, Karpenter will typically underestimate the memory available on a node for a given instance type.
If you know the real `VM_MEMORY_OVERHEAD_PERCENT` for the specific instances you're provisioning in your cluster, you can tune this value to tighten the bound.
However, this should be done with caution.
A `VM_MEMORY_OVERHEAD_PERCENT` that is set too low causes Karpenter to overestimate the memory available on a node, which can result in Karpenter launching nodes that are too small for your workload.
In the worst case, this can result in an instance launch loop and your workload remaining unschedulable indefinitely.
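
One way to approximate the real overhead for an instance type is to invert the memory formula using the allocatable memory reported by a running node. The Python sketch below does this under the assumption that the node's reported allocatable memory follows the same formula; the function names and sample numbers are hypothetical, and the result should be treated as an estimate rather than an authoritative value.

```python
def observed_overhead_percent(
    instance_mib: float,
    node_allocatable_mib: float,
    kube_reserved_mib: float,
    system_reserved_mib: float,
    eviction_mib: float,
) -> float:
    """Solve the memory formula for the overhead fraction (all values in MiB)."""
    # Add back everything the formula subtracts after the overhead is applied.
    usable = node_allocatable_mib + kube_reserved_mib + system_reserved_mib + eviction_mib
    return 1.0 - usable / instance_mib


def overestimates_memory(configured_percent: float, observed_percent: float) -> bool:
    """True when the configured overhead is below the observed one,
    meaning the formula would report more memory than the node provides."""
    return configured_percent < observed_percent


# Hypothetical observation: a 16 GiB instance whose node reports 14500 MiB allocatable,
# with 893 MiB kubeReserved, no systemReserved, and a 100 MiB eviction threshold.
observed = observed_overhead_percent(16384, 14500, 893, 0, 100)
print(f"observed overhead: {observed:.4f}")
print("default 0.075 overestimates:", overestimates_memory(0.075, observed))
```

Because the setting is global, any tuned value would need to stay at or above the largest overhead observed across every instance type and AMI combination in use.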

To detect instances of Karpenter overestimating resource availability, the following status condition can be monitored:

```bash
$ kubectl get nodeclaim default-chcj2 -o jsonpath='{.status.conditions[?(@.type=="ConsistentStateFound")]}'
```

```json
{
  "type": "ConsistentStateFound",
  "status": "False",
  "reason": "ConsistencyCheckFailed",
  "message": "Consistency Check Failed",
  "lastTransitionTime": "2024-08-19T20:02:16Z"
}
```

This can be spot-checked as shown above, or monitored via the following metric:

```
operator_status_condition_count{type="ConsistentStateFound",kind="NodeClaim",status="False"}
```
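
If the spot check needs to be scripted, the status condition JSON can be parsed directly. The following Python sketch assumes the condition has already been captured as a string (for example, from the jsonpath query shown earlier); the function name `consistency_check_failed` is hypothetical and the sample input simply mirrors the JSON above.

```python
import json


def consistency_check_failed(condition_json: str) -> bool:
    """Return True if the given condition is ConsistentStateFound with status False."""
    # An empty string means the query matched no condition on the NodeClaim.
    if not condition_json.strip():
        return False
    condition = json.loads(condition_json)
    return (
        condition.get("type") == "ConsistentStateFound"
        and condition.get("status") == "False"
    )


# Sample input copied from the status condition shown earlier in this section.
sample = """
{
  "type": "ConsistentStateFound",
  "status": "False",
  "reason": "ConsistencyCheckFailed",
  "message": "Consistency Check Failed",
  "lastTransitionTime": "2024-08-19T20:02:16Z"
}
"""
print("consistency check failed:", consistency_check_failed(sample))
```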

## Deprovisioning

### Nodes not deprovisioned