Skip to content

Commit

Permalink
Promote sysctls to Beta (#8804)
Browse files Browse the repository at this point in the history
* Promote sysctls to Beta

* Copyedits

Signed-off-by: Misty Stanley-Jones <mistyhacks@google.com>

* Review comments

* Address feedback

* More feedback
  • Loading branch information
ingvagabund authored and Misty Linville committed Jun 27, 2018
1 parent 4bd3e92 commit 26805bb
Show file tree
Hide file tree
Showing 2 changed files with 69 additions and 27 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -89,6 +89,7 @@ different Kubernetes components.
| `SupportIPVSProxyMode` | `true` | Beta | 1.10 | 1.10 |
| `SupportIPVSProxyMode` | `true` | GA | 1.11 | |
| `SupportPodPidsLimit` | `false` | Alpha | 1.10 | |
| `Sysctls` | `true` | Beta | 1.11 | |
| `TaintBasedEvictions` | `false` | Alpha | 1.6 | |
| `TaintNodesByCondition` | `false` | Alpha | 1.8 | |
| `TokenRequest` | `false` | Alpha | 1.10 | |
Expand Down Expand Up @@ -216,6 +217,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
- `SupportIPVSProxyMode`: Enable providing in-cluster service load balancing using IPVS.
See [service proxies](/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies) for more details.
- `SupportPodPidsLimit`: Enable the support to limiting PIDs in Pods.
- `Sysctls`: Enable support for namespaced kernel parameters (sysctls) that can be set for each pod.
See [sysctls](/docs/tasks/administer-cluster/sysctl-cluster/) for more details.
- `TaintBasedEvictions`: Enable evicting pods from nodes based on taints on nodes and tolerations on Pods.
See [taints and tolerations](/docs/concepts/configuration/taint-and-toleration/) for more details.
- `TaintNodesByCondition`: Enable automatic tainting nodes based on [node conditions](/docs/concepts/architecture/nodes/#condition).
Expand All @@ -224,4 +227,3 @@ Each feature gate is designed for enabling/disabling a specific feature:
PersistentVolumeClaim (PVC) binding aware of scheduling decisions. It also
enables the usage of [`local`](/docs/concepts/storage/volumes/#local) volume
type when used together with the `PersistentLocalVolumes` feature gate.

92 changes: 66 additions & 26 deletions content/en/docs/tasks/administer-cluster/sysctl-cluster.md
Original file line number Diff line number Diff line change
@@ -1,13 +1,15 @@
---
title: Using Sysctls in a Kubernetes Cluster
title: Using sysctls in a Kubernetes Cluster
reviewers:
- sttts
content_template: templates/task
---

{{% capture overview %}}
{{< feature-state for_k8s_version="v1.11" state="beta" >}}

This document describes how sysctls are used within a Kubernetes cluster.
This document describes how to configure and use kernel parameters within a
Kubernetes cluster using the sysctl interface.

{{% /capture %}}

Expand Down Expand Up @@ -74,7 +76,7 @@ application tuning. _Unsafe_ sysctls are enabled on a node-by-node basis with a
flag of the kubelet, e.g.:

```shell
$ kubelet --experimental-allowed-unsafe-sysctls \
$ kubelet --allowed-unsafe-sysctls \
'kernel.msg*,net.ipv4.route.min_pmtu' ...
```

Expand All @@ -89,36 +91,49 @@ Only _namespaced_ sysctls can be enabled this way.
## Setting Sysctls for a Pod

A number of sysctls are _namespaced_ in today's Linux kernels. This means that
they can be set independently for each pod on a node. Being namespaced is a
requirement for sysctls to be accessible in a pod context within Kubernetes.
they can be set independently for each pod on a node. Only namespaced sysctls
are configurable via the pod securityContext within Kubernetes.

The following sysctls are known to be _namespaced_:
The following sysctls are known to be namespaced. This list could change
in future versions of the Linux kernel.

- `kernel.shm*`,
- `kernel.msg*`,
- `kernel.sem`,
- `fs.mqueue.*`,
- `net.*`.

Sysctls which are not namespaced are called _node-level_ and must be set
manually by the cluster admin, either by means of the underlying Linux
distribution of the nodes (e.g. via `/etc/sysctls.conf`) or using a DaemonSet
with privileged containers.
Sysctls with no namespace are called _node-level_ sysctls. If you need to set
them, you must manually configure them on each node's operating system, or by
using a DaemonSet with privileged containers.

The sysctl feature is an alpha API. Therefore, sysctls are set using annotations
on pods. They apply to all containers in the same pod.
Use the pod securityContext to configure namespaced sysctls. The securityContext
applies to all containers in the same pod.

Here is an example, with different annotations for _safe_ and _unsafe_ sysctls:
This example uses the pod securityContext to set a safe sysctl
`kernel.shm_rmid_forced` and two unsafe sysctls `net.ipv4.route.min_pmtu` and
`kernel.msgmax` There is no distinction between _safe_ and _unsafe_ sysctls in
the specification.

{{< warning >}}
Only modify sysctl parameters after you understand their effects, to avoid
destabilizing your operating system.
{{< /warning >}}

```yaml
apiVersion: v1
kind: Pod
metadata:
name: sysctl-example
annotations:
security.alpha.kubernetes.io/sysctls: kernel.shm_rmid_forced=1
security.alpha.kubernetes.io/unsafe-sysctls: net.ipv4.route.min_pmtu=1000,kernel.msgmax=1 2 3
spec:
securityContext:
sysctls:
- name: kernel.shm_rmid_forced
value: "0"
- name: net.ipv4.route.min_pmtu
value: "552"
- name: kernel.msgmax
value: "65536"
...
```
{{% /capture %}}
Expand All @@ -143,27 +158,52 @@ is recommended to use
[taints on nodes](/docs/concepts/configuration/taint-and-toleration/)
to schedule those pods onto the right nodes.

## PodSecurityPolicy Annotations
## PodSecurityPolicy

You can further control which sysctls can be set in pods by specifying lists of
sysctls or sysctl patterns in the `forbiddenSysctls` and/or
`allowedUnsafeSysctls` fields of the PodSecurityPolicy. A sysctl pattern ends
with a `*` character, such as `kernel.*`. A `*` character on its own matches
all sysctls.

By default, all safe sysctls are allowed.

Both `forbiddenSysctls` and `allowedUnsafeSysctls` are lists of plain sysctl names
or sysctl patterns (which end with `*`). The string `*` matches all sysctls.

The use of sysctl in pods can be controlled via annotation on the PodSecurityPolicy.
The `forbiddenSysctls` field excludes specific sysctls. You can forbid a
combination of safe and unsafe sysctls in the list. To forbid setting any
sysctls, use `*` on its own.

Sysctl annotation represents a whitelist of allowed safe and unsafe sysctls
in a pod spec. It's a comma-separated list of plain sysctl names or sysctl patterns
(which end in `*`). The string `*` matches all sysctls.
If you specify any unsafe sysctl in the `allowedUnsafeSysctls` field and it is
not present in the `forbiddenSysctls` field, that sysctl can be used in Pods
using this PodSecurityPolicy. To allow all unsafe sysctls in the
PodSecurityPolicy to be set, use `*` on its own.

Here is an example, it authorizes binding user creating pod with corresponding sysctls.
Do not configure these two fields such that there is overlap, meaning that a
given sysctl is both allowed and forbidden.

{{< warning >}}
**Warning**: If you whitelist unsafe sysctls via the `allowedUnsafeSysctls` field
in a PodSecurityPolicy, any pod using such a sysctl will fail to start
if the sysctl is not whitelisted via the `--allowed-unsafe-sysctls` kubelet
flag as well on that node.
{{< /warning >}}

This example allows unsafe sysctls prefixed with `kernel.msg` to be set and
disallows setting of the `kernel.shm_rmid_forced` sysctl.

```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: sysctl-psp
annotations:
security.alpha.kubernetes.io/sysctls: 'net.ipv4.route.*,kernel.msg*'
spec:
allowedUnsafeSysctls:
- kernel.msg*
forbiddenSysctls:
- kernel.shm_rmid_forced
...
```

{{% /capture %}}


0 comments on commit 26805bb

Please sign in to comment.