
default aws-k8s-cni config doesn't work #747

Closed
bcressey opened this issue Feb 18, 2020 · 6 comments · Fixed by #796
@bcressey
Contributor

Image I'm using:
Local build.

What I expected to happen:
I applied the config for aws-k8s-cni v1.6.0 from INSTALL.md. I expected the CNI plugin to be configured on my node.

What actually happened:
aws-node ended up in CrashLoopBackOff.

How I worked around the problem:

kubectl patch daemonset aws-node -n kube-system -p \
  '{"spec": {"template": {"spec": {"volumes": [{"hostPath": {"path": "/run/containerd/containerd.sock", "type": "Socket"}, "name":"dockershim"}]}}}}'
@bcressey
Contributor Author

Not sure why this problem didn't turn up in the testing for #739.

@etungsten can you take a look?

@etungsten
Contributor

etungsten commented Feb 19, 2020

I can't seem to replicate the issue. I launched a new cluster with Bottlerocket nodes (v0.2.1 AMI) and my aws-node pods seem to come up OK:

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m55s  default-scheduler  Successfully assigned kube-system/aws-node-bsmvw to SNIP
  Normal  Pulling    3m55s  kubelet, SNIP      Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0"
  Normal  Pulled     3m42s  kubelet, SNIP      Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0"
  Normal  Created    3m39s  kubelet, SNIP      Created container aws-node
  Normal  Started    3m39s  kubelet, SNIP      Started container aws-node

The Daemonset for aws-node I have is as follows (relevant section):

...
  "spec": {
    "selector": {
      "matchLabels": {
        "k8s-app": "aws-node"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "k8s-app": "aws-node"
        }
      },
      "spec": {
        "volumes": [
          {
            "name": "cni-bin-dir",
            "hostPath": {
              "path": "/opt/cni/bin",
              "type": ""
            }
          },
          {
            "name": "cni-net-dir",
            "hostPath": {
              "path": "/etc/cni/net.d",
              "type": ""
            }
          },
          {
            "name": "log-dir",
            "hostPath": {
              "path": "/var/log",
              "type": ""
            }
          },
          {
            "name": "dockersock",
            "hostPath": {
              "path": "/var/run/docker.sock",
              "type": ""
            }
          },
          {
            "name": "dockershim",
            "hostPath": {
              "path": "/var/run/dockershim.sock",
              "type": ""
            }
          }
        ],
...

Are there more logs that can tell us more?

@etungsten
Contributor

etungsten commented Feb 19, 2020

I was able to replicate the issue by restarting the aws-node pod.
The issue was not appearing initially because ipamd returns early when no pods are returned by the API.

@jahkeup
Member

jahkeup commented Feb 27, 2020

I chatted a bit more with @bcressey about the long term solution and I have another suggestion that might set us up on a path to get there.

In the short term, I think we are best off making the socket path match the CNI plugin's expectation (i.e. a socket exists at /var/run/dockershim.sock) rather than requiring cluster administrators to run a patch command or provide a .yaml that uses the "correct" path (e.g. /var/run/containerd/containerd.sock). We can later lift this path into a setting so it becomes user-controlled, but how we handle that is up to future us. Other solutions may also become viable and change how we approach exposing or changing the socket down the road.

As it stands, there are a few other methods, beyond the ones mentioned above, that could support the default aws-k8s-cni config out of the box. @etungsten, @jhaynes, and I discussed some offline, but found none of them to be great. Some would also require upstream changes to support the given approach.

Some options we considered:

  • We could use a containerd.socket unit to then bind mount into other socket locations (eg: /var/run/dockershim.sock, /var/run/cri.sock) but containerd doesn't support the use of systemd's .socket units.

  • We could avoid relying on containerd.sock and run something like mount -t none -o bind ..., but that has its own host of problems we'd have to address, like process lifecycle impact and stale mounts.

  • We could suggest a change to the CNI plugin itself to support more socket paths (by mounting in /var/run and picking a socket), but that doesn't address the wider set of deployments that may run on Bottlerocket.

Changing containerd's socket path to docker's CRI socket path certainly isn't ideal, but the change to /var/run/dockershim.sock is small in terms of lines changed and doesn't affect Bottlerocket internally.

Changing containerd's gRPC socket path to /var/run/dockershim.sock eliminates significant potential for confusion (especially since we can't know what podspecs folks are using), and starting the unix domain socket at /var/run/dockershim.sock provides a simple, straightforward way to support the AWS VPC CNI plugin (and other such pods) without additional changes. The caveat is that this assumes other software doesn't take the socket path into consideration, and only cares about having a CRI socket to talk to, not which runtime is on the other end.
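The "clients only care about having a socket at the expected path" point can be illustrated with a small sketch (plain Python rather than CRI/gRPC; the path and message here are made up for illustration): a Unix-domain-socket client only needs the agreed-upon path and cannot tell which server implementation is on the other end.

```python
import os
import socket
import tempfile
import threading

# Illustrative stand-in path; a real CRI client would use /var/run/dockershim.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "dockershim.sock")
ready = threading.Event()

def serve_once(path):
    # Stand-in for any CRI server (containerd, dockershim, ...).
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    ready.set()  # signal that the socket exists before the client connects
    conn, _ = srv.accept()
    conn.sendall(b"hello from some runtime")
    conn.close()
    srv.close()

t = threading.Thread(target=serve_once, args=(sock_path,))
t.start()
ready.wait()

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)  # the path is the entire contract
reply = client.recv(1024)
client.close()
t.join()
print(reply.decode())
```

Whatever process binds that path, the client's behavior is identical, which is exactly why repointing containerd's socket works without touching the plugin.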


@bcressey let me know if I missed anything or made anything less clear than it was in our conversation.

@jahkeup
Member

jahkeup commented Feb 27, 2020

Oh, and to be clear: the changes needed in Bottlerocket should be limited to:

  • containerd.toml
  • kubelet.service - just needs to match ^^

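For reference, a sketch of what the containerd side might look like; the exact form is up to #796, and the key placement here is an assumption:

```toml
# containerd.toml: move the gRPC socket to the path the CNI
# plugin's default daemonset expects.
[grpc]
  address = "/var/run/dockershim.sock"
```

kubelet.service would then pass a matching endpoint, e.g. --container-runtime-endpoint=unix:///var/run/dockershim.sock.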
@jahkeup
Member

jahkeup commented Feb 27, 2020

After #796 introduces the short-term solution, we will track long-term goals and action items in #801.
