
default aws-k8s-cni config doesn't work #747

Closed
bcressey opened this issue Feb 18, 2020 · 6 comments · Fixed by #796
@bcressey
Contributor

Image I'm using:
Local build.

What I expected to happen:
I applied the config for aws-k8s-cni v1.6.0 from INSTALL.md. I expected the CNI plugin to be configured on my node.

What actually happened:
aws-node ended up in CrashLoopBackOff.

How I worked around the problem:

kubectl patch daemonset aws-node -n kube-system -p \
  '{"spec": {"template": {"spec": {"volumes": [{"hostPath": {"path": "/run/containerd/containerd.sock", "type": "Socket"}, "name":"dockershim"}]}}}}'
@bcressey
Contributor Author

Not sure why this problem didn't turn up in the testing for #739.

@etungsten can you take a look?

@etungsten
Contributor

etungsten commented Feb 19, 2020

I can't seem to replicate the issue. I launched a new cluster with Bottlerocket nodes (v0.2.1 AMI) and my aws-node pods seem to come up OK:

Events:
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m55s  default-scheduler  Successfully assigned kube-system/aws-node-bsmvw to SNIP
  Normal  Pulling    3m55s  kubelet, SNIP      Pulling image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0"
  Normal  Pulled     3m42s  kubelet, SNIP      Successfully pulled image "602401143452.dkr.ecr.us-west-2.amazonaws.com/amazon-k8s-cni:v1.6.0"
  Normal  Created    3m39s  kubelet, SNIP      Created container aws-node
  Normal  Started    3m39s  kubelet, SNIP      Started container aws-node

The Daemonset for aws-node I have is as follows (relevant section):

...
  "spec": {
    "selector": {
      "matchLabels": {
        "k8s-app": "aws-node"
      }
    },
    "template": {
      "metadata": {
        "creationTimestamp": null,
        "labels": {
          "k8s-app": "aws-node"
        }
      },
      "spec": {
        "volumes": [
          {
            "name": "cni-bin-dir",
            "hostPath": {
              "path": "/opt/cni/bin",
              "type": ""
            }
          },
          {
            "name": "cni-net-dir",
            "hostPath": {
              "path": "/etc/cni/net.d",
              "type": ""
            }
          },
          {
            "name": "log-dir",
            "hostPath": {
              "path": "/var/log",
              "type": ""
            }
          },
          {
            "name": "dockersock",
            "hostPath": {
              "path": "/var/run/docker.sock",
              "type": ""
            }
          },
          {
            "name": "dockershim",
            "hostPath": {
              "path": "/var/run/dockershim.sock",
              "type": ""
            }
          }
        ],
...

Are there more logs that can tell us more?

@etungsten
Contributor

etungsten commented Feb 19, 2020

I was able to replicate the issue by restarting the aws-node pod.
The issue was not appearing initially because ipamd returns early when no pods are returned by the API.

@jahkeup
Member

jahkeup commented Feb 27, 2020

I chatted a bit more with @bcressey about the long term solution and I have another suggestion that might set us up on a path to get there.

In the short term, I think we are best off making the socket path match the CNI plugin's expectation (i.e. a socket exists at /var/run/dockershim.sock) rather than requiring cluster administrators to run a patch command or provide a .yaml that uses the "correct" path (e.g. /var/run/containerd/containerd.sock). We can later lift this path into a setting so it becomes user-controlled, but how we handle that is up to future us. Other solutions may also become viable and change how we approach exposing or changing the socket down the road.

As it stands, there are a few other methods, beyond the ones mentioned above, that could support the default aws-k8s-cni config out of the box. @etungsten, @jhaynes, and I discussed some offline, but found none of them to be great. Some would also require upstream changes to support the given approach.

Some options we considered:

  • We could use a containerd.socket unit to then bind mount into other socket locations (eg: /var/run/dockershim.sock, /var/run/cri.sock) but containerd doesn't support the use of systemd's .socket units.

  • We could avoid relying on containerd.sock and run something like mount -t none -o bind ..., but that has its own host of problems we'd have to address, like process lifecycle impact and stale mounts.

  • We could suggest a change to the CNI plugin itself to support more socket paths (by mounting in /var/run and picking a socket), but that doesn't address the wider set of deployments that may run on Bottlerocket.

Changing containerd's socket path to docker's CRI socket path certainly isn't ideal, but the change to /var/run/dockershim.sock is small in terms of lines changed and doesn't affect Bottlerocket internally.

Changing containerd's gRPC socket path to /var/run/dockershim.sock eliminates significant potential for confusion (especially since we can't know what podspecs folks are using), and starting the unix domain socket at /var/run/dockershim.sock provides a simple, straightforward way to support the AWS VPC CNI plugin (and other such pods) without additional changes. The caveat is that this assumes other software doesn't take the socket path into consideration, and only cares about having a CRI socket to talk to, not which runtime is on the other end.
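The "clients only care about having a socket at the expected path" point can be illustrated with a small sketch (plain Python rather than CRI/gRPC; the path and message here are made up for illustration): a Unix-domain-socket client only needs the agreed-upon path and cannot tell which server implementation is on the other end.

```python
import os
import socket
import tempfile
import threading

# Illustrative stand-in path; a real CRI client would use /var/run/dockershim.sock.
sock_path = os.path.join(tempfile.mkdtemp(), "dockershim.sock")
ready = threading.Event()

def serve_once(path):
    # Stand-in for any CRI server (containerd, dockershim, ...).
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    ready.set()  # signal that the socket exists before the client connects
    conn, _ = srv.accept()
    conn.sendall(b"hello from some runtime")
    conn.close()
    srv.close()

t = threading.Thread(target=serve_once, args=(sock_path,))
t.start()
ready.wait()

client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
client.connect(sock_path)  # the path is the entire contract
reply = client.recv(1024)
client.close()
t.join()
print(reply.decode())
```

Whatever process binds that path, the client's behavior is identical, which is exactly why repointing containerd's socket works without touching the plugin.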


@bcressey let me know if I missed anything or made anything less clear than it was in our conversation.

@jahkeup
Member

jahkeup commented Feb 27, 2020

Oh, and to be clear: the changes needed in Bottlerocket should be limited to:

  • containerd.toml
  • kubelet.service - just needs to match ^^

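For reference, a sketch of what the containerd side might look like; the exact form is up to #796, and the key placement here is an assumption:

```toml
# containerd.toml: move the gRPC socket to the path the CNI
# plugin's default daemonset expects.
[grpc]
  address = "/var/run/dockershim.sock"
```

kubelet.service would then pass a matching endpoint, e.g. --container-runtime-endpoint=unix:///var/run/dockershim.sock.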
@jahkeup
Member

jahkeup commented Feb 27, 2020

After #796 introduces the short-term solution, we will track long-term goals and action items in #801.
