stuck when creating a cluster by kind #3683

Closed
ss00atbupt opened this issue Jul 12, 2024 · 13 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@ss00atbupt

ss00atbupt commented Jul 12, 2024

what happened
I tried to create a cluster with kind. It went well until I hit this:

kind create cluster --image kindest/node:v1.25.3 --config kind-cluster.yaml --retain
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.25.3) 🖼
✓ Preparing nodes 📦 📦 📦
⢎⡱ Configuring the external load balancer ⚖️

I waited for hours, but it stayed the same.
The next time I created a kind cluster, I ran kind export logs in another terminal window to see what was going on, but the log export got stuck too:

Exporting logs for cluster "kind" to:
/tmp/1019242815

I checked /tmp/1019242815/kind-worker/alternatives.log (the only file besides kind-version.txt).
Here's the log content:

update-alternatives 2024-07-12 09:57:27: run with --set iptables /usr/sbin/iptables-legacy
update-alternatives 2024-07-12 09:57:27: status of link group /usr/sbin/iptables set to manual
update-alternatives 2024-07-12 09:57:27: link group iptables updated to point to /usr/sbin/iptables-legacy
update-alternatives 2024-07-12 09:57:27: run with --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives 2024-07-12 09:57:27: status of link group /usr/sbin/ip6tables set to manual
update-alternatives 2024-07-12 09:57:27: link group ip6tables updated to point to /usr/sbin/ip6tables-legacy

Then I ran top -u admin:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
426466 admin 20 0 162952 5420 3812 R 0.7 0.0 0:00.05 top
386407 admin 20 0 718500 8888 5692 S 0.3 0.0 0:00.66 kind
208867 admin 20 0 721080 8744 4488 S 0.0 0.0 0:06.49 test
208981 admin 20 0 116616 4632 3152 S 0.0 0.0 0:00.11 bash
287255 admin 20 0 718136 8272 5476 S 0.0 0.0 0:00.08 test
287318 admin 20 0 116616 4740 3268 S 0.0 0.0 0:00.07 bash
388462 admin 20 0 1882128 36508 25572 S 0.0 0.0 0:00.10 docker
388463 admin 20 0 1939724 35764 25248 S 0.0 0.0 0:00.09 docker
388464 admin 20 0 1980960 36832 26416 S 0.0 0.0 0:00.12 docker

It looks like none of these processes is actually doing any work.

What I expected to happen:
Cluster can be created successfully.

How to reproduce it
My kind-cluster.yaml:

#cluster with 3 control-plane nodes and 3 workers
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:

- role: control-plane
- role: worker
- role: worker

Run kind create cluster --image kindest/node:v1.25.3 --config kind-cluster.yaml

Environment:

  1. docker:20.10.24
  2. kind:v0.17.0
  3. kubectl:1.27.4
  4. Cluster API:1.6.3
  5. OS: alios 7.2
    no special environment settings

Thanks for any help

@ss00atbupt ss00atbupt changed the title from "stuck" to "stuck when creating a cluster by kind" Jul 12, 2024
@stmcginnis
Contributor

Please fill in the required info from the issue template, as this is more than likely something specific to your environment and it will be hard to diagnose otherwise.

**Environment:**

- kind version: (use `kind version`):
- Runtime info: (use `docker info`, `podman info` or `nerdctl info`):
- OS (e.g. from `/etc/os-release`):
- Kubernetes version: (use `kubectl version`):
- Any proxies or other special environment settings?:

Though I do see you mention you are running kind v0.17.0. That is quite an old version at this point, and I would not be surprised if it no longer works reliably. So my first suggestion would be to upgrade to the latest release and use a newer version of k8s if possible. v1.25.3 is quite old as well at this point.
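
For anyone filling in the template above, one way to collect these details in a single pass is sketched here. The helper name collect_env is made up for illustration, and kubectl version --client is used instead of the template's plain kubectl version on the assumption that no cluster is reachable yet.

```shell
# Sketch: print the environment details the kind issue template asks for.
# collect_env is a hypothetical helper name, not part of kind.
collect_env() {
  for cmd in "kind version" "docker info" "kubectl version --client"; do
    tool=${cmd%% *}            # first word of the command line
    echo "== $cmd =="
    if command -v "$tool" >/dev/null 2>&1; then
      $cmd 2>&1 || true        # keep going even if one tool reports an error
    else
      echo "$tool not found on PATH"
    fi
  done
  echo "== /etc/os-release =="
  cat /etc/os-release 2>/dev/null || echo "/etc/os-release not readable"
}
```

Calling collect_env > env.txt produces a file that can be pasted into the issue.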

@ss00atbupt
Author

Hello, I updated my Environment part, thanks for your reply!

@ss00atbupt
Author

I updated my kind version to 0.23.0. My environment is as follows:

  • kind version: 0.23.0
  • Runtime info: Docker: 20.10.24
  • OS : alios 7.2
  • Kubectl version: 1.27.4

After updating, I ran kind create cluster --image kindest/node:v1.25.3 --config kind-cluster.yaml again. Now kind runs, but it has been stuck for half an hour at:

Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.25.3) 🖼
✓ Preparing nodes 📦 📦 📦
⠎⠁ Writing configuration 📜

What should I do now? Thanks for any help.

@aojea
Contributor

aojea commented Jul 15, 2024

Do you have enough resources, i.e. RAM and CPU?

@ss00atbupt
Author

I ran top. The results are as follows:

top - 17:09:31 up 276 days, 2:03, 0 users, load average: 2.73, 1.43, 0.88
Tasks: 974 total, 1 running, 492 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.5 us, 0.2 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 19715419+total, 88391392 free, 73011104 used, 35751700 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 12273000+avail Mem

PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND   

14157 root 20 0 8046352 154788 66116 S 9.9 0.1 6762:13 uniagent
79041 root 20 0 1047104 238428 35836 S 9.6 0.1 8089:29 manager
47157 root 20 0 6110472 57284 35048 S 7.6 0.0 10866:11 calico-n+
89582 root 10 -10 345544 112380 22364 S 5.0 0.1 8448:59 AliYunDu+
40053 polkitd 20 0 6422804 506952 35432 S 4.3 0.3 3361:38 kube-con+
14854 root 20 0 7067404 122584 63644 S 2.6 0.1 13254:09 kubelet
3243 root 39 19 0 0 0 S 1.3 0.0 1255:42 kipmi0
33761 65532 20 0 744968 43084 22196 S 1.3 0.0 1304:24 manager
14368 root 20 0 778488 58644 25052 S 1.0 0.0 2656:57 walle
24319 root 20 0 1385848 660352 21504 S 1.0 0.3 547:24.37 rapt-dae+
31687 root 20 0 6487820 42512 22944 S 1.0 0.0 1197:44 pytorch-+
4197 root 20 0 5534000 105332 40708 S 0.7 0.1 7334:01 containe+
46259 root 20 0 722756 16840 10156 S 0.7 0.0 762:14.57 containe+
2439511 admin 20 0 720056 8884 4616 S 0.7 0.0 0:16.04 test
2446513 admin 20 0 2325288 35064 26348 S 0.7 0.0 0:00.98 docker
2796840 root 20 0 757612 51760 28012 S 0.7 0.0 0:08.04 dfget
2858585 admin 20 0 162944 5320 3724 R 0.7 0.0 0:00
Running free:

              total        used        free      shared  buff/cache   available
Mem:      197154196    73050068    88352092       32152    35752036   122691044
Swap:             0           0           0

That means I have enough resources for kind, right?
I retried many times, and the kind log is always the same as the one I posted:

update-alternatives 2024-07-12 09:57:27: run with --set iptables /usr/sbin/iptables-legacy
update-alternatives 2024-07-12 09:57:27: status of link group /usr/sbin/iptables set to manual
update-alternatives 2024-07-12 09:57:27: link group iptables updated to point to /usr/sbin/iptables-legacy
update-alternatives 2024-07-12 09:57:27: run with --set ip6tables /usr/sbin/ip6tables-legacy
update-alternatives 2024-07-12 09:57:27: status of link group /usr/sbin/ip6tables set to manual
update-alternatives 2024-07-12 09:57:27: link group ip6tables updated to point to /usr/sbin/ip6tables-legacy
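
As a rough reading aid for the free output in this comment, the figures can be converted to GiB. This is only a sketch: kib_to_gib is a hypothetical helper, and it assumes free was run with its default KiB units.

```shell
# Sketch: convert KiB figures from `free` to GiB (1 GiB = 1048576 KiB).
# kib_to_gib is a hypothetical helper name.
kib_to_gib() {
  awk -v kib="$1" 'BEGIN { printf "%.1f\n", kib / 1048576 }'
}

kib_to_gib 197154196   # total     -> 188.0
kib_to_gib 122691044   # available -> 117.0
```

Roughly 117 GiB is reported as available, which, together with the mostly idle CPU in the top output, suggests that memory and CPU are unlikely to be the limiting factor here.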

@aojea
Contributor

aojea commented Jul 15, 2024

Do you have the kindest/node:v1.25.3 image locally, or is it still being pulled?

@BenTheElder
Member

After updating, I ran kind create cluster --image kindest/node:v1.25.3 --config kind-cluster.yaml again. Now kind runs, but it has been stuck for half an hour at:

This isn't working because you're using an outdated image. Please see the release notes and the note in the quickstart guide about image selection.

@BenTheElder
Member

That image was published with kind v0.17, there have been major changes since then noted in the release notes.

@BenTheElder BenTheElder added the kind/support Categorizes issue or PR as a support question. label Jul 15, 2024
@stmcginnis
Contributor

To expand on that, in the v0.23.0 release under the New Features heading you can see a list of the supported versions. v1.25.16 is still supported by kind, but that k8s release in general is no longer supported upstream. So I would recommend moving to a newer one if at all possible.

To do a little more validation of your environment, after installing the latest kind version, I would recommend just running a plain kind create cluster to verify with no extra configuration and customization it is able to spin up a cluster. If that works, delete it with kind delete cluster. If it is not able to work with the defaults, then there are likely environment issues on your host that need to be sorted out first before adding any extra configuration into the mix.

Then you could try kind create cluster --image kindest/node:v1.25.16 to still keep things simple, but using the major/minor release that you want. Again, if there are failures at this point, they would need to be investigated before adding more variables into the mix.

If both of those work, only then would I add in your configuration file to try to set up the full cluster config you are trying to use. If that fails you already know most of the basics are fine from those previous steps, so that will narrow down where the failure could be coming from and hopefully make it easier to focus your troubleshooting to find the root cause of the failure.
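
The staged check described above could be scripted roughly as follows. This is a sketch, not an official procedure: the run and validate_kind names and the DRY_RUN switch are hypothetical conveniences, and the kind delete cluster between stages 2 and 3 is an assumption made so that the default cluster name is free again.

```shell
# Sketch of the staged validation. Only the kind commands quoted in the
# thread are used; everything else is an illustrative wrapper.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

validate_kind() {
  config=${1:-kind-cluster.yaml}
  # 1. Defaults only: no image override, no config file.
  run kind create cluster || return 1
  run kind delete cluster || return 1
  # 2. Same, but pinned to the desired minor version.
  run kind create cluster --image kindest/node:v1.25.16 || return 1
  run kind delete cluster || return 1   # assumption: free the default name
  # 3. Only now add the multi-node configuration.
  run kind create cluster --image kindest/node:v1.25.16 --config "$config" || return 1
}
```

Running DRY_RUN=1 validate_kind kind-cluster.yaml prints the commands without executing them. Because each stage stops at the first failure, the last command printed shows which layer to investigate.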

You didn't add the docker info output, but just from the version you noted, I think that's an older Docker release too. It looks like the current release is 27.0.3. You may end up needing to update that to something newer as well.

I've also had strange issues in the past where I needed to clean up my Docker environment with a docker system prune -a --volumes to get rid of old stopped containers, volumes, and other cruft left behind. That could be another useful troubleshooting step.

I'm also unsure of AliOS. I've never used it, but from what I understand it is sort of a fork of Android? If so, there may be some other challenges with using kind here. But some of the above steps can help to see if that is a problem.

@ss00atbupt
Author

Do you have the kindest/node:v1.25.3 image locally, or is it still being pulled?

Yes, I have the kindest/node:v1.25.3 image locally.
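
One way to confirm this from the command line is sketched below. image_is_local is a hypothetical helper name; it relies on docker image inspect exiting non-zero when the image is not in the local image store.

```shell
# Sketch: report whether a node image is already present locally.
# image_is_local is a hypothetical helper, not part of kind or Docker.
image_is_local() {
  docker image inspect "$1" >/dev/null 2>&1
}

# Usage:
#   if image_is_local kindest/node:v1.25.3; then
#     echo "image present"
#   else
#     echo "image not present locally"
#   fi
```

If the image is present and kind still hangs, a slow or stalled pull is unlikely to be the explanation.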

@ss00atbupt
Author

That image was published with kind v0.17, there have been major changes since then noted in the release notes.

Thanks, I'll try latest version.

@stmcginnis
Contributor

I'm going to close this since it doesn't seem to be anything that requires action on the kind side. But it would be great to hear if updating to a newer image worked for you. If you did update but it did not solve your issue, please feel free to reopen and add more details.

/close

@k8s-ci-robot
Contributor

@stmcginnis: Closing this issue.

