
Security Self-Assessment: [STRIDE-MULTIPLE] Secure Cluster Class for Cluster API (MVP) #6329

Closed
PushkarJ opened this issue Mar 23, 2022 · 13 comments
Assignees
Labels
area/clusterclass Issues or PRs related to clusterclass area/security Issues or PRs related to security kind/feature Categorizes issue or PR as related to a new feature. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/security Categorizes an issue or PR as relevant to SIG Security. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@PushkarJ
Member

PushkarJ commented Mar 23, 2022

Secure Cluster Class for Cluster API

Create a secure cluster class that allows end users to spin up "secure by default" clusters with sane, configurable defaults.

Motivation

  1. It creates a single place to add other security features
  2. It mitigates some threats from the CAPI security assessment
  3. It helps with compliance guidelines such as the NSA/CISA Kubernetes hardening guidance

Goals

The MVP (Minimum Viable Product) goal is to support pod security admission with the baseline pod security standard enforced at the cluster level.

Demo: https://asciinema.org/a/477476

Non-Goals/Future Work

Post-MVP work can include, but is not limited to, support for additional security features.

Any complex Pod Security configurations that are not supported by the built-in pod security admission controller are out of scope.

Proposal

To enable pod security admission with the baseline pod security standard at the cluster level, the API server needs to be passed an extraArgs parameter that points to an AdmissionConfiguration file, which defines the cluster-level pod security standard along with exemptions.

This file needs to be present on the control plane nodes where the API server is running. If the API server runs as a pod, this file needs to be mounted from the host into the pod or generated within the pod before the API server binary is executed.
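
As a rough sketch, the corresponding kubeadmConfigSpec fragment looks like the following; the same settings appear in the full KubeadmControlPlaneTemplate under Implementation Notes:

clusterConfiguration:
  apiServer:
    extraArgs:
      # Point the API server at the AdmissionConfiguration file on the node.
      admission-control-config-file: /etc/config/cluster-level-pss.yaml
    extraVolumes:
      # Mount the host directory holding the file into the kube-apiserver pod.
      - name: accf
        hostPath: /etc/config
        mountPath: /etc/config
        readOnly: false
        pathType: "DirectoryOrCreate"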

An example ClusterClass configuration for the Cluster API Provider for Docker can be found here.

To auto-generate this file, we need to add a new feature to clusterctl generate that takes these parameters as input:

  • Pod Security standard name (restricted, baseline, privileged)
  • Applicable mode (enforce, warn, audit)
  • Applicable version (usually default)
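
Each (standard, mode, version) input maps onto a pair of keys in the defaults block of the generated PodSecurityConfiguration (shown in full under Prerequisites). For example, enforcing baseline at the latest version becomes:

defaults:
  enforce: "baseline"
  enforce-version: "latest"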

A few possible CLI UX options are as follows:

Secure with sane defaults

clusterctl generate cluster capi-quickstart --flavor development-topology \
  --secure pod-security \
  --kubernetes-version v1.23.3 \
  --control-plane-machine-count=3 \
  --worker-machine-count=3 \
  > capi-quickstart-secure-default.yaml

By default, clusterctl will enforce the baseline pod security standard, audit and warn on the restricted standard, exempt the kube-system namespace, and default the version to latest.

Secure with configurable defaults

clusterctl generate cluster capi-quickstart --flavor development-topology \
  --secure pod-security=baseline:enforce,restricted:warn,restricted:audit \
  --kubernetes-version v1.23.3 \
  --control-plane-machine-count=3 \
  --worker-machine-count=3 \
  > capi-quickstart-secure-configured.yaml

This needs to be confirmed for OpenAPI schema compatibility.
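
For context, one way such a setting could be modeled is as a ClusterClass variable with an OpenAPI schema. The variable name and shape below are illustrative assumptions, not a settled design:

- name: podSecurityStandard
  required: false
  schema:
    openAPIV3Schema:
      type: object
      properties:
        enforce:
          type: string
          enum: ["privileged", "baseline", "restricted"]
          default: "baseline"
        warn:
          type: string
          enum: ["privileged", "baseline", "restricted"]
          default: "restricted"
        audit:
          type: string
          enum: ["privileged", "baseline", "restricted"]
          default: "restricted"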

Secure with configurable defaults via environment variable substitution

export POD_SECURITY_CONFIG='{"enforce":"baseline","warn":"restricted","audit":"restricted"}'
clusterctl generate cluster capi-quickstart --flavor development-topology \
  --secure pod-security \
  --kubernetes-version v1.23.3 \
  --control-plane-machine-count=3 \
  --worker-machine-count=3 \
  > capi-quickstart-secure-env-configured.yaml

The outcome of any of the above UX options would be the generation of the cluster-level-pss.yaml file, which is accessible to the API server during startup.
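
Since clusterctl templating today is plain environment variable substitution, a hypothetical wiring for the environment variable option could look like the following in the cluster template; the placeholder name and its placement as a topology variable are assumptions:

# clusterctl substitutes the JSON string from the environment
# into a topology variable of the generated Cluster manifest:
variables:
- name: podSecurityConfig
  value: ${POD_SECURITY_CONFIG}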

Implementation Notes

Example Cluster Class configuration

apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
metadata:
  name: quick-start-secure
  namespace: default
spec:
  controlPlane:
    machineInfrastructure:
      ref:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        name: quick-start-secure-control-plane
    ref:
      apiVersion: controlplane.cluster.x-k8s.io/v1beta1
      kind: KubeadmControlPlaneTemplate
      name: quick-start-secure-control-plane
  infrastructure:
    ref:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DockerClusterTemplate
      name: quick-start-secure-cluster
  patches:
  - definitions:
    - jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/imageRepository
        valueFrom:
          variable: imageRepository
      selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
    description: Sets the imageRepository used for the KubeadmControlPlane.
    name: imageRepository
  - definitions:
    - jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/etcd
        valueFrom:
          template: |
            local:
              imageTag: {{ .etcdImageTag }}
      selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
    description: Sets tag to use for the etcd image in the KubeadmControlPlane.
    name: etcdImageTag
  - definitions:
    - jsonPatches:
      - op: add
        path: /spec/template/spec/kubeadmConfigSpec/clusterConfiguration/dns
        valueFrom:
          template: |
            imageTag: {{ .coreDNSImageTag }}
      selector:
        apiVersion: controlplane.cluster.x-k8s.io/v1beta1
        kind: KubeadmControlPlaneTemplate
        matchResources:
          controlPlane: true
    description: Sets tag to use for the CoreDNS image in the KubeadmControlPlane.
    name: coreDNSImageTag
  - definitions:
    - jsonPatches:
      - op: add
        path: /spec/template/spec/customImage
        valueFrom:
          template: |
            kindest/node:{{ .builtin.machineDeployment.version }}
      selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        matchResources:
          machineDeploymentClass:
            names:
            - default-worker
    - jsonPatches:
      - op: add
        path: /spec/template/spec/customImage
        valueFrom:
          template: |
            kindest/node:{{ .builtin.controlPlane.version }}
      selector:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: DockerMachineTemplate
        matchResources:
          controlPlane: true
    description: Sets the container image that is used for running dockerMachines
      for the controlPlane and default-worker machineDeployments.
    name: customImage
  variables:
  - name: imageRepository
    required: true
    schema:
      openAPIV3Schema:
        default: k8s.gcr.io
        description: imageRepository sets the container registry to pull images from.
          If empty, `k8s.gcr.io` will be used by default.
        example: k8s.gcr.io
        type: string
  - name: etcdImageTag
    required: true
    schema:
      openAPIV3Schema:
        default: ""
        description: etcdImageTag sets the tag for the etcd image.
        example: 3.5.1-0
        type: string
  - name: coreDNSImageTag
    required: true
    schema:
      openAPIV3Schema:
        default: ""
        description: coreDNSImageTag sets the tag for the coreDNS image.
        example: v1.8.5
        type: string
  workers:
    machineDeployments:
    - class: default-worker
      template:
        bootstrap:
          ref:
            apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
            kind: KubeadmConfigTemplate
            name: quick-start-secure-default-worker-bootstraptemplate
        infrastructure:
          ref:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: DockerMachineTemplate
            name: quick-start-secure-default-worker-machinetemplate
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerClusterTemplate
metadata:
  name: quick-start-secure-cluster
  namespace: default
spec:
  template:
    spec: {}
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
metadata:
  name: quick-start-secure-control-plane
  namespace: default
spec:
  template:
    spec:
      kubeadmConfigSpec:
        clusterConfiguration:
          apiServer:
            certSANs:
            - localhost
            - 127.0.0.1
            - 0.0.0.0
            extraArgs:
              admission-control-config-file: /etc/config/cluster-level-pss.yaml
            extraVolumes:
              - name: accf
                hostPath: /etc/config
                mountPath: /etc/config
                readOnly: false
                pathType: "DirectoryOrCreate"
          controllerManager:
            extraArgs:
              enable-hostpath-provisioner: "true"
        initConfiguration:
          nodeRegistration:
            criSocket: /var/run/containerd/containerd.sock
            kubeletExtraArgs:
              cgroup-driver: cgroupfs
              eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
        joinConfiguration:
          nodeRegistration:
            criSocket: /var/run/containerd/containerd.sock
            kubeletExtraArgs:
              cgroup-driver: cgroupfs
              eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
  name: quick-start-secure-control-plane
  namespace: default
spec:
  template:
    spec:
      extraMounts:
      - containerPath: /var/run/docker.sock
        hostPath: /var/run/docker.sock
      - containerPath: /etc/config/cluster-level-pss.yaml
        hostPath: /tmp/pss/cluster-level-pss.yaml
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DockerMachineTemplate
metadata:
  name: quick-start-secure-default-worker-machinetemplate
  namespace: default
spec:
  template:
    spec: {}
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: quick-start-secure-default-worker-bootstraptemplate
  namespace: default
spec:
  template:
    spec:
      joinConfiguration:
        nodeRegistration:
          kubeletExtraArgs:
            cgroup-driver: cgroupfs
            eviction-hard: nodefs.available<0%,nodefs.inodesFree<0%,imagefs.available<0%
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: capi-quickstart-secure
  namespace: default
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - 192.168.0.0/16
    serviceDomain: k8s.test
    services:
      cidrBlocks:
      - 10.96.0.0/12
  topology:
    class: quick-start-secure
    controlPlane:
      metadata: {}
      replicas: 3
    variables:
    - name: imageRepository
      value: k8s.gcr.io
    - name: etcdImageTag
      value: ""
    - name: coreDNSImageTag
      value: ""
    version: v1.23.3
    workers:
      machineDeployments:
      - class: default-worker
        name: md-0
        replicas: 3

Demo shell script

cat /tmp/pss/cluster-level-pss.yaml
sleep 2
echo "\nLet's apply this pod security standard to a cluster created with secure cluster class"
echo "\n\n"
kubectl apply -f capi-quickstart-secure.yaml
sleep 2
kubectl get cluster
kubectl get kubeadmcontrolplane
sleep 5
clusterctl describe cluster capi-quickstart-secure
clusterctl get kubeconfig capi-quickstart-secure > capi-quickstart-secure.kubeconfig
# Point the kubeconfig to the exposed port of the load balancer, rather than the inaccessible container IP.
sed -i -e "s/server:.*/server: https:\/\/$(docker port capi-quickstart-secure-lb 6443/tcp | sed "s/0.0.0.0/127.0.0.1/")/g" ./capi-quickstart-secure.kubeconfig
sleep 5
kubectl apply --kubeconfig=./capi-quickstart-secure.kubeconfig \
  -f https://docs.projectcalico.org/v3.21/manifests/calico.yaml
kubectl get --kubeconfig=./capi-quickstart-secure.kubeconfig nodes
cat <<EOF > /tmp/pss/nginx-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
      ports:
        - containerPort: 80
EOF
echo "Let's wait for cluster to get ready"
sleep 10
kubectl get --kubeconfig=./capi-quickstart-secure.kubeconfig nodes
echo "\n\n"
echo "Creating pod that will throw warning on restricted pod security standard"
cat /tmp/pss/nginx-pod.yaml
echo "\n\n"
kubectl apply --kubeconfig=./capi-quickstart-secure.kubeconfig -f /tmp/pss/nginx-pod.yaml
sleep 2
echo "\nYay, as expected, during pod creation, restricted pod security standard threw a warning but it passed the baseline pod security standard\n"
echo "\n\n-------H A P P Y-------H O N K I N G----------\n\n "
sleep 2
echo "Clean up"
kubectl delete cluster capi-quickstart-secure

Prerequisites

Cluster-level Pod Security Admission configuration

Content of /tmp/pss/cluster-level-pss.yaml

apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: PodSecurity
  configuration:
    apiVersion: pod-security.admission.config.k8s.io/v1beta1
    kind: PodSecurityConfiguration
    defaults:
      enforce: "baseline"
      enforce-version: "latest"
      audit: "restricted"
      audit-version: "latest"
      warn: "restricted"
      warn-version: "latest"
    exemptions:
      usernames: []
      runtimeClasses: []
      namespaces: [kube-system]
Example pod YAML

Passes the baseline pod security standard but triggers a warning under the restricted standard.

Content of /tmp/pss/nginx-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - image: nginx
      name: nginx
      ports:
        - containerPort: 80
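
For contrast, below is a sketch of a pod adjusted to also satisfy the restricted standard. The unprivileged image and port are assumptions; the securityContext fields follow the published Pod Security Standards:

apiVersion: v1
kind: Pod
metadata:
  name: nginx-restricted
spec:
  containers:
    - image: nginxinc/nginx-unprivileged # assumed image that can run as non-root
      name: nginx
      ports:
        - containerPort: 8080 # unprivileged port, since the container is not root
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault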

Happy to open a CAEP after initial discussion and community feedback on this issue.

/kind feature
/sig cluster-lifecycle security
/area security
/cc @fabriziopandini @sbueringer

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/security Categorizes an issue or PR as relevant to SIG Security. area/security Issues or PRs related to security labels Mar 23, 2022
@fabriziopandini
Member

/milestone v1.2

@k8s-ci-robot k8s-ci-robot added this to the v1.2 milestone Mar 28, 2022
@fabriziopandini
Member

@PushkarJ I don't think this requires a KEP
I'm +1 to get this template in the docker examples ASAP and to add a security guidelines section to the book.
Eventually, we can also add a periodic job ensuring secure clusters with CAPI works.
/area clusterclass

@chrischdi maybe you are interested in this work

@k8s-ci-robot
Contributor

@fabriziopandini: The label(s) area/clusterclass cannot be applied, because the repository doesn't have them.

In response to this:

@PushkarJ I don't think this requires a KEP
I'm +1 to get this template in the docker examples ASAP and to add a security guidelines section to the book.
Eventually, we can also add a periodic job ensuring secure clusters with CAPI works.
/area clusterclass

@chrischdi maybe you are interested in this work

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@fabriziopandini
Member

/area topology

@sbueringer
Member

sbueringer commented Apr 6, 2022

Sounds good from my side as well.

I think we can start with adding the ClusterClass first and after that's done discuss how we can integrate it with clusterctl.

I'm not 100% sure, but it could be realistic to just extend our existing quickstart ClusterClass with a "secure" (name TBD) variable. We have to experiment a bit, but if that works I would prefer doing that vs. having a separate ClusterClass (after all, the goal of ClusterClass is to avoid requiring a lot of different ClusterClasses for different use cases).

@chrischdi
Member

chrischdi commented Apr 6, 2022

I pretty much like the secure by default idea.

What if we just apply the baseline config by default, which would mean:

  • adding the admission-pss.yaml file to KCP
  • mounting the file into the kube-apiserver pod
  • setting the flag to activate it

I assume we would break stuff if we just improve the default cluster class generated?

We could then allow users of clusterctl to do either of the following (and document these options in the quickstart docs):

  • disable PSS (would we even want to, or is it enough if they could do it by writing their own ClusterClass?)
  • pass their own custom admission-pss.yaml to clusterctl via a flag, which then replaces the default one (this could also accept an "allow-any" value so that a separate disable-pss option isn't required)

@sbueringer
Member

I assume we would break stuff if we just improve the default cluster class generated?

I'm not sure how much we actually break as it's scoped to CAPD and that's for dev purposes. But I think we should get a good impression after we have a PR which just enables it. We can do the same with our e2e test ClusterClass and see if the e2e tests are still green or not.

But I think it's easy to decide later whether the variable which enables it is on or off by default.

The whole story becomes way more complicated if we consider adding a fancy integration with clusterctl. clusterctl just does an envsubst today on the template and it's easy to pass through strings that way.

Currently there is no way to do more, like passing in an object or a file, ... (although a JSON string might work to pass an object as a variable value).
I'm not sure how much we want to invest for now into extending clusterctl.

@PushkarJ PushkarJ changed the title Secure Cluster Class for Cluster API Security Self-Assessment: Secure Cluster Class for Cluster API May 13, 2022
@PushkarJ PushkarJ changed the title Security Self-Assessment: Secure Cluster Class for Cluster API Security Self-Assessment: [MVP] Secure Cluster Class for Cluster API May 13, 2022
@PushkarJ
Member Author

/assign @PushkarJ

@PushkarJ PushkarJ changed the title Security Self-Assessment: [MVP] Secure Cluster Class for Cluster API Security Self-Assessment: Secure Cluster Class for Cluster API (MVP) May 13, 2022
@PushkarJ PushkarJ changed the title Security Self-Assessment: Secure Cluster Class for Cluster API (MVP) Security Self-Assessment: [STRIDE-MULTIPLE] Secure Cluster Class for Cluster API (MVP) May 13, 2022
@fabriziopandini
Member

@PushkarJ @chrischdi can we close this now that we have a secure cluster class MVP?

@fabriziopandini fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini fabriziopandini removed this from the v1.2 milestone Jul 29, 2022
@fabriziopandini fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@fabriziopandini
Member

/triage accepted
/close

@PushkarJ we can eventually re-open if more work is required

@k8s-ci-robot k8s-ci-robot added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022
@k8s-ci-robot
Contributor

@fabriziopandini: Closing this issue.

In response to this:

/triage accepted
/close

@PushkarJ we can eventually re-open if more work is required

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@PushkarJ
Member Author

Thanks for taking care of this @fabriziopandini. @chrischdi would you be interested in writing a short blog post about this with me?

@chrischdi
Member

Hi @PushkarJ, seems like I've missed the message.

If you are still interested in writing a short blog post, I'd be happy to help 😀

@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023