Rancher-Desktop [Alpine] can't create cluster with v0.20.0 [Previously Also Colima] #3277
Comments
|
EDIT: updating this early comment to note that Colima is fixed via #3277 (comment); just upgrade to colima v0.6.0. This is an issue with the host environment, presumably with --cgroupns=private. colima is @abiosoft |
I still don't recommend alpine / openrc for container hosts vs essentially any distro with systemd. It's unfortunate that we can't even start the container with these options. you could probably more immediately work around this by using lima with an Ubuntu guest VM |
Oh, I'm having the same problem. My environment is a GitHub Actions workflow that uses colima to start docker on a macOS runner. |
@BenTheElder I've tried with ubuntu layer (colima has this flag:
These are the cgroup mounts inside the VM:
|
uname is still showing the alpine kernel and openrc is still showing up even though Ubuntu doesn't use it, so I don't think that flag is changing the guest VM |
From the lima FAQ I think it only provides an Ubuntu userspace environment and doesn't allow customizing the underlying Guest OS / kernel / ... So I think colima will always be alpine / openrc unfortunately and subject to bugs like this. See also past discussion abiosoft/colima#291 (comment) abiosoft/colima#163 ... I think https://github.com/lima-vm/lima/blob/master/examples/docker-rootful.yaml would be an Ubuntu + typical docker host env on lima. |
I'd also strongly recommend moving to a guest environment that uses cgroup v2 sooner than later, as the ecosystem is poised to drop v1 (I'd guess in the next year or so) and we can't do much about that. Ubuntu, Debian, Docker desktop, Fedora, ... most linux environments have switched for some time now. If we can't get this resolved with some patch to colima to enable working cgroups=private containers, we can consider reverting to not require cgroupns=private, but it adds back a third much more broken cgroups nesting environment (cgroup v1, host cgroupns) that we'd otherwise planned to phase out now that docker has supported cgroupns=private for a few years now and podman likewise (also the default on cgroups v2). |
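A quick way to see which cgroup layout a guest VM exposes is to look at the root of the cgroup filesystem. The following is only a minimal sketch, not anything kind itself runs: it assumes the conventional mount point `/sys/fs/cgroup` and relies on the fact that a unified (v2) hierarchy has a `cgroup.controllers` file at its root, while a v1 layout has one subdirectory per controller or named hierarchy. The function name `detect_cgroup_version` is made up for illustration.

```python
from pathlib import Path


def detect_cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup tree mounted at root."""
    base = Path(root)
    # A unified (v2) hierarchy exposes cgroup.controllers at its root.
    if (base / "cgroup.controllers").is_file():
        return "v2"
    # A v1 (or hybrid) layout has one directory per controller / named hierarchy.
    if base.is_dir() and any(p.is_dir() for p in base.iterdir()):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    # Run this inside the VM that hosts the docker daemon, not on the macOS host.
    print(detect_cgroup_version())
```

A hybrid host (v1 controllers plus a separate unified mount) is reported as "v1" here, which matches how it behaves for the purposes of this issue.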
typo: s/lima/colima/
The ecosystem of runc, containerd, etc. isn't likely to drop v1 before 2029 (EL8 EOL). |
sorry, yes! same comment suggests lima with ubuntu / docker guest
Kubernetes has been discussing it already and I believe systemd but it's good to know some of the others won't. |
Is there a KEP? |
We also have a lot of DNS issues with Lima due to the use of Alpine. I really wish they would move away from a musl-based operating system. |
Lima defaults to Ubuntu...
Using Alpine is a choice by downstream, mostly for size reasons. I don't know of an apk distro using systemd/glibc instead of openrc/musl, but I suppose it is possible (or maybe use Debian, it is also smaller) |
I remember spending a lot of hours with For instance trying to figure out if I can use This works and I can create For full context: I use Now, I'm not sure why (haven't found the place in code that would explain the difference between details...
and the underlying network interface
with details...
this way I can't get the traffic into the cluster using EDIT: the reason for this is most likely docker in lima ubuntu VM using cgroup v2, which causes kind network to land in a separate net namespace (but that's a guess). Not sure how I could then make the traffic get routed inside kind's network (and then its container).
|
As for the issue at hand: I understand that with #3241 the ship might have already sailed but perhaps we might still consider using the provider info |
Same error happens with Rancher Desktop that is using lima under the hood |
Experiencing the same on Rancher Desktop. Downgrading to kind 0.19.0 fixes the issue for now. Would be great to get a fix for 0.20.0. The issue I see on Rancher Desktop using kind 0.20.0 is the following:
$ kind create cluster --name test-cluster --image kindest/node:v1.27.3
Boostrapping cluster…
Creating cluster "test-cluster" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✗ Preparing nodes 📦
Deleted nodes: ["eks-cluster-control-plane"]
ERROR: failed to create cluster: command "docker run --name test-cluster-control-plane --hostname test-cluster-control-plane --label io.x-k8s.kind.role=control-plane --privileged --security-opt seccomp=unconfined --security-opt apparmor=unconfined --tmpfs /tmp --tmpfs /run --volume /var --volume /lib/modules:/lib/modules:ro -e KIND_EXPERIMENTAL_CONTAINERD_SNAPSHOTTER --detach --tty --label io.x-k8s.kind.cluster=test-cluster --net kind --restart=on-failure:1 --init=false --cgroupns=private --publish=127.0.0.1:50566:6443/TCP -e KUBECONFIG=/etc/kubernetes/admin.conf kindest/node:v1.27.3" failed with error: exit status 125
Command Output: 82623b67d511c7e10ed075323e621ec66befa9047e3c7b56647ca99fd78e0db6
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "cgroup" to rootfs at "/sys/fs/cgroup": mount cgroup:/sys/fs/cgroup/openrc (via /proc/self/fd/7), flags: 0xe, data: openrc: invalid argument: unknown. |
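The failing mount in the error above targets `/sys/fs/cgroup/openrc`, which looks like a named cgroup v1 hierarchy (one mounted with a `name=` option rather than a controller). To check whether a given guest VM has such hierarchies, one could scan `/proc/mounts` for them. This is a rough sketch under that assumption; the helper name and the `SAMPLE` text are invented for illustration and are not captured from a real colima or Rancher Desktop VM.

```python
from typing import List, Tuple


def named_cgroup_v1_mounts(mounts_text: str) -> List[Tuple[str, str]]:
    """Return (mount_point, name) for cgroup v1 mounts that carry a name= option."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts columns: device, mount point, fs type, options, dump, pass
        if len(fields) < 4 or fields[2] != "cgroup":
            continue
        for opt in fields[3].split(","):
            if opt.startswith("name="):
                found.append((fields[1], opt[len("name="):]))
    return found


# Hypothetical mount table, written by hand to show the shape of the input.
SAMPLE = """\
cgroup_root /sys/fs/cgroup tmpfs rw,nosuid,nodev,noexec,relatime 0 0
openrc /sys/fs/cgroup/openrc cgroup rw,nosuid,nodev,noexec,relatime,name=openrc 0 0
cpu /sys/fs/cgroup/cpu cgroup rw,nosuid,nodev,noexec,relatime,cpu 0 0
"""

if __name__ == "__main__":
    print(named_cgroup_v1_mounts(SAMPLE))
```

Inside the VM, the same function can be fed the real table with `named_cgroup_v1_mounts(open("/proc/mounts").read())`.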
Inability to create a container with this docker 20.10.0 feature from 2020-12-08 is still considered a bug in colima / rancher desktop. I'd like to hear a response from those projects before we revert anything. Ensuring private cgroupns is a big benefit for the project. |
The point of setting this flag is to ensure that this is set on cgroupv1 hosts. cgroupv2 hosts already default to this. cgroupv1 hosts are the problem. On hosts other than alpine/colima/rancher desktop this works great. Alpine and colima / rancher desktop use an unusual init system that doesn't seem to set this up properly. |
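The rule described in this comment (v2 hosts already default to a private cgroup namespace, v1 hosts need it requested explicitly) can be expressed against the output of `docker info --format '{{json .}}'`, which reports a `CgroupVersion` field on Docker 20.10 and later. The sketch below is illustrative only and is not kind's actual logic; the function name is hypothetical and it ignores any daemon-level override of the default cgroup namespace mode.

```python
import json


def needs_explicit_private_cgroupns(docker_info_json: str) -> bool:
    """True when the daemon is not on cgroup v2, i.e. when private cgroupns
    has to be requested explicitly rather than relied on as a default."""
    info = json.loads(docker_info_json)
    # Docker reports "1" or "2"; a missing field is treated as "not v2".
    return str(info.get("CgroupVersion", "")) != "2"


if __name__ == "__main__":
    import subprocess

    raw = subprocess.run(
        ["docker", "info", "--format", "{{json .}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    print(needs_explicit_private_cgroupns(raw))
```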
You may have some eBPF component in the path (which are attached to cgroup2), which without unsharing cgroup2 will attach bits to your host namespace that were meant to go on nodes, thus creating incidental routability. I had a similar issue forwarding ports in kind with Cilium. |
Yeah, same issue here. |
Yup did same. |
Addresses: aws-controllers-k8s/community#1903
Description of changes:
- Bump kind version to `0.19.0` (avoiding `0.20.0` for now; [multiple users reported a bug when using it in dind](kubernetes-sigs/kind#3277))
- Bump `k8s` to `1.28.0`
- Rebuild and publish a new integration image (containing the `kind` binary)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Colima v0.6.0 supports kind https://github.com/abiosoft/colima/releases/tag/v0.6.0 |
Thanks @abiosoft! |
@abiosoft does this mean it now also works with latest Rancher Desktop? |
@marcofranssen No, it does not. colima switched from Alpine to Ubuntu to avoid the issue, but Rancher Desktop still uses Alpine. The best you can do on Rancher Desktop right now is to use k3d instead of |
Off-topic question, but why not use Rancher Desktop's Kubernetes? |
For me the only reason to use Eventually there should be a config setting in Rancher Desktop to allow multiple nodes. Personally I've also wanted a mixed-architecture cluster with both amd64 and arm64 nodes, but that is more for fun than actual need... |
Multi-node is one of the common reasons I see versus the bundled k8s in containers-in-a-vm solutions, the other is more control over the k8s version used. |
To add one more data point to the issues with Alpine (under Rancher Desktop), this is the output that I get from kind after it fails to work...
|
Right, there's discussion of this above. I can't run rancher desktop at work (VM policy), so I'd appreciate others that use rancher desktop debugging this issue. |
Er and to clarify we have code specifically to ensure things run smoothly on non-systemd hosts:
However, on these particular alpine based hosts we seem to be unable to make mounts, which doesn't make sense. With cgroupns enabled we're getting our own view of cgroups and with privileged we should have permission to make mounts (see e.g. the remount /sys ro earlier in the logs). It's possible we can't make this mount in any environment and receive it as a function of systemd being on the host on other hosts, this requires more root-cause debugging. I still haven't had time to dig into this myself, currently focused on some follow-ups around https://kubernetes.io/blog/2023/08/31/legacy-package-repository-deprecation/, and this is somewhat outside of @aojea's usual wheelhouse. In the meantime I recommend lima w/ ubuntu docker profile or colima as free alternatives to docker desktop that work with kind. I would appreciate help in investigating this bug. cgroupns will be default on cgroupsv2 hosts under all major container runtimes and is enabled for good reasons, so just reverting enabling cgroupns in an attempt to unbreak alpine isn't a very good option (note: rancher desktop is on v2 with cgroupns enabled by default now anyhow), but I'd love to see other suggested fixes or debugging work from anyone else invested in this support. |
Just wanted to give a quick heads-up that the issue seems to be fixed by Alpine 3.19 (most likely due to the update to OpenRC 0.51+, which has fixed the "unified" cgroups layout):
$ kind create cluster
Creating cluster "kind" ...
 ✓ Ensuring node image (kindest/node:v1.27.3) 🖼
 ✓ Preparing nodes 📦
 ✓ Writing configuration 📜
 ✓ Starting control-plane 🕹️
 ✓ Installing CNI 🔌
 ✓ Installing StorageClass 💾
Set kubectl context to "kind-kind"
You can now use your cluster with:
kubectl cluster-info --context kind-kind
Have a nice day! 👋
$ k get no
NAME                 STATUS     ROLES           AGE   VERSION
kind-control-plane   NotReady   control-plane   11s   v1.27.3
So this issue can probably be closed, unless you want to wait until a version of Rancher Desktop with Alpine 3.19 is out for verification. That is probably not going to happen until early March though. |
/close
Let's close it here; there isn't anything else we can do, and you provided a solution. |
@aojea: Closing this issue. |
This issue is closed, but there is still an open issue in rancher desktop - it's hidden in the collapsed comments, so linking it here again rancher-sandbox/rancher-desktop#5092 |
Circling back, we have reports of rancher desktop + kind v0.23 working in https://kubernetes.slack.com/archives/CEKK1KTN2/p1723583621985329?thread_ts=1723579586.749849&cid=CEKK1KTN2 FYI @jandubois. NOTE: you may still run into issues from https://kind.sigs.k8s.io/docs/user/known-issues/; in this case, with many clusters, tuning inotify limits was required: https://kind.sigs.k8s.io/docs/user/known-issues/#pod-errors-due-to-too-many-open-files (it might? be reasonable to bump the defaults in rancher desktop) |
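For the inotify note above, a small check can show whether a VM's limits fall below the values suggested on kind's known-issues page (`fs.inotify.max_user_watches=524288` and `fs.inotify.max_user_instances=512`). This is a sketch rather than an official tool: the function name is made up, and the thresholds are copied from that page and may not suit every setup.

```python
from pathlib import Path

# Values suggested on kind's known-issues page; treat them as a starting point.
RECOMMENDED = {"max_user_watches": 524288, "max_user_instances": 512}


def inotify_shortfalls(sysctl_dir="/proc/sys/fs/inotify", recommended=RECOMMENDED):
    """Return {key: (current, recommended)} for every limit below the target."""
    shortfalls = {}
    for key, want in recommended.items():
        try:
            have = int((Path(sysctl_dir) / key).read_text().strip())
        except (OSError, ValueError):
            # Missing or unreadable sysctl file: nothing to compare against.
            continue
        if have < want:
            shortfalls[key] = (have, want)
    return shortfalls


if __name__ == "__main__":
    # Run inside the VM that hosts the docker daemon.
    for key, (have, want) in inotify_shortfalls().items():
        print(f"fs.inotify.{key} is {have}, suggested at least {want}")
```

An empty result means both limits already meet the suggested values (or could not be read).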
What happened:
After updating to v0.20.0 I cannot create a cluster anymore.
I'm using Mac with colima
What you expected to happen:
No error and cluster creates successfully
How to reproduce it (as minimally and precisely as possible):
Environment:
kind version: (use kind version): v0.20.0
Runtime info: (use docker info or podman info):
OS (e.g. from /etc/os-release): Mac OS with colima VM.
/etc/os-release from within the VM that hosts the docker daemon: