diff --git a/containers/kubernetes/reference-content/using-kapsule-autoheal-feature.mdx b/containers/kubernetes/reference-content/using-kapsule-autoheal-feature.mdx
new file mode 100644
index 0000000000..33d305841d
--- /dev/null
+++ b/containers/kubernetes/reference-content/using-kapsule-autoheal-feature.mdx
@@ -0,0 +1,47 @@
+---
+meta:
+  title: Using the Scaleway Kubernetes Kapsule autoheal feature
+  description: This page explains the concept of Scaleway Kubernetes Kapsule autoheal
+content:
+  h1: Using the Scaleway Kubernetes Kapsule autoheal feature
+  paragraph: This page explains the concept of Scaleway Kubernetes Kapsule autoheal
+tags: kubernetes kapsule autoheal
+dates:
+  validation: 2024-04-04
+  posted: 2024-04-04
+categories:
+  - kubernetes
+---
+
+The Scaleway Kubernetes Kapsule autoheal feature is designed to automatically detect and recover from failures within a Kubernetes cluster.
+It provides a proactive approach to maintaining the health and availability of cluster nodes by automatically addressing issues that may arise.
+The autoheal feature periodically checks the health of the Kubernetes cluster and takes action based on predefined conditions.
+
+You can enable the autoheal feature to ensure that your applications remain operational even in the event of failures. Some common use cases include:
+
+- **Enhanced reliability**: By automatically recovering from failures, autoheal improves the reliability of nodes forming the Kubernetes cluster.
+- **Fault tolerance**: It enhances the fault tolerance of the Kubernetes cluster by detecting and addressing node failures.
+- **Reduced downtime**: By automatically detecting and recovering from failures, autoheal reduces downtime and minimizes the impact on application performance.
+- **Operational efficiency**: It reduces the need for manual intervention in addressing failures, thereby improving operational efficiency.
+
+## Autoheal process
+
+The autoheal reconciliation loop is triggered every five (5) minutes. If a node remains in the `NotReady` state for more than **15 minutes**, it is **rebooted** (only once), and after **30 minutes** it is **replaced**.
+
+## When to enable or disable autoheal
+
+### When to enable autoheal
+
+We advise enabling autoheal in production environments, where maintaining high availability and minimizing downtime are critical.
+
+### When to disable autoheal
+
+There are scenarios where autoheal should be disabled:
+
+- **Testing environments**: In testing or development environments, where failures can be tolerated and a failed node may need to stay in place for troubleshooting.
+- **Custom recovery mechanisms**: If you have configured custom recovery mechanisms that handle failures differently from the autoheal feature.
+- **Operational control**: If you prefer to handle node failures manually, for example to better understand how your cluster behaves.
+
+
+  We recommend that you carefully consider what enabling or disabling autoheal involves, based on your specific use case requirements and operational considerations.
+
diff --git a/menu/navigation.json b/menu/navigation.json
index 6a9105de98..7750942dd0 100644
--- a/menu/navigation.json
+++ b/menu/navigation.json
@@ -1998,6 +1998,10 @@
   "label": "Using Load Balancer annotations",
   "slug": "using-load-balancer-annotations"
 },
+{
+  "label": "Using the Kapsule autoheal feature",
+  "slug": "using-kapsule-autoheal-feature"
+},
 {
   "label": "Wildcard DNS routing",
   "slug": "wildcard-dns"