OKD4.4-Installation fails on the latest version(32.20200601.3.0) of FCOS #229

Closed
ssoor opened this issue Jun 27, 2020 · 11 comments
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@ssoor

ssoor commented Jun 27, 2020

Installation fails on the latest version(32.20200601.3.0) of FCOS

After booting FCOS with bootstrap.ign, there is no response for a long time.
After logging in over SSH, the logs show "Permission denied" and "No such file or directory" errors.

OKD version:
4.4.0-0.okd-2020-05-23-055148-beta5

iPXE config:

set base-url http://${next-server}/pxe/fedora-coreos

set kernal-url ${base-url}/fedora-coreos-32.20200601.3.0-live-kernel-x86_64
set initrd-url ${base-url}/fedora-coreos-32.20200601.3.0-live-initramfs.x86_64.img

set ign-url ${base-url}/bootstrap.ign
set image-url ${base-url}/fedora-coreos-32.20200601.3.0-metal.x86_64.raw.xz

kernel ${kernal-url} initrd=${initrd-url} console=tty0 console=ttyS1 ip=dhcp rd.neednet=1 coreos.inst=yes coreos.inst.install_dev=sda coreos.inst.image_url=${image-url} coreos.inst.ignition_url=${ign-url}
initrd ${initrd-url}
boot

How reproducible

100%

Log bundle

"Permission denied"
I solved this one by running: sudo chmod 0644 /etc/zincati/config.d/90-disable-feature.toml

Jun 27 02:32:14 localhost zincati[1585]: [INFO ] starting update agent (zincati 0.0.11)
Jun 27 02:32:14 localhost zincati[1585]: Error: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }
Jun 27 02:32:14 localhost zincati[1585]: failed to open file '/etc/zincati/config.d/90-disable-feature.toml'
Jun 27 02:32:14 localhost zincati[1585]: failed to assemble configuration settings
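
The chmod workaround above can be sketched as a small script. This is only an illustration under assumptions: fix_dropin_mode is a hypothetical helper name, the script relies on GNU stat as shipped with FCOS, and it presumes the failure comes from Zincati being unable to read a file that only root can open.

```shell
#!/usr/bin/env bash
# Sketch: make a Zincati drop-in world-readable (mode 0644).
# fix_dropin_mode is a hypothetical helper, not part of Zincati.

fix_dropin_mode() {
    local file="$1"
    # Bail out if the drop-in does not exist.
    [ -e "$file" ] || { echo "missing: $file" >&2; return 1; }
    # Print the mode before and after so the change is visible.
    echo "before: $(stat -c '%a' "$file")"
    chmod 0644 "$file" || return 1
    echo "after: $(stat -c '%a' "$file")"
}

# On the affected node, as root:
#   fix_dropin_mode /etc/zincati/config.d/90-disable-feature.toml
#   systemctl restart zincati.service
```

The function only changes the file mode; it does not touch the drop-in's contents, so whatever the file disables stays disabled.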

"No such file or directory"
I don't know where to find this file.

Jun 27 02:38:43 localhost systemd[1]: Starting Kubernetes Kubelet...
Jun 27 02:38:43 localhost systemd[3641]: kubelet.service: Failed to execute command: No such file or directory
Jun 27 02:38:43 localhost systemd[3641]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: No such file or directory
Jun 27 02:38:43 localhost systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
Jun 27 02:38:43 localhost systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jun 27 02:38:43 localhost systemd[1]: Failed to start Kubernetes Kubelet.
@LorbusChris
Contributor

LorbusChris commented Jun 27, 2020

It looks like you're not using the installer but are instead trying to bootstrap the cluster directly from FCOS, before pivoting into OKD's machine-os-content? hyperkube is not present in the standard FCOS images, so before starting the bootstrap, one has to pivot into OKD's FCOS-based machine-os-content ostree, which contains it.

The Zincati problem seems like a second issue, probably related to coreos/fedora-coreos-tracker#392.
@vrutkovs it seems we are also missing https://github.com/openshift/machine-config-operator/pull/1297/files in MCO for masters/workers now after the recent rebase.
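
To confirm the diagnosis above on a node, one could check whether the binary named in the kubelet.service log exists and which ostree deployment is booted. A minimal sketch; has_kubelet_binary is a hypothetical helper, and the default path is taken from the log in the original report.

```shell
#!/usr/bin/env bash
# Sketch: report whether the kubelet binary expected by kubelet.service is present.
# has_kubelet_binary is a hypothetical helper name.

has_kubelet_binary() {
    # Default path comes from the "Failed at step EXEC" log line.
    local bin="${1:-/usr/bin/hyperkube}"
    if [ -x "$bin" ]; then
        echo "present: $bin"
    else
        echo "missing: $bin (host may still be running stock FCOS)"
        return 1
    fi
}

# On the bootstrap node:
#   has_kubelet_binary
#   rpm-ostree status    # lists deployments and marks the booted one
```

If the binary is missing and rpm-ostree status shows only the stock FCOS deployment, the pivot into machine-os-content described above has probably not happened yet.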

@ssoor
Author

ssoor commented Jun 28, 2020

machine-os-content

@LorbusChris Thanks for your attention.
However, I didn't find anything about machine-os-content in the documentation.
I mainly followed the two documents linked below. Am I missing something?

I checked again, and nothing seems to be missing.

https://github.com/openshift/installer/blob/master/docs/user/metal/install_upi.md
https://docs.okd.io/latest/installing/installing_bare_metal/installing-bare-metal.html#installing-bare-metal-three-node

@vrutkovs
Member

vrutkovs commented Jun 28, 2020

The Zincati issue is a dupe of #215 (it can safely be ignored: because of the invalid config, Zincati won't unexpectedly update the machine).

The kubelet problem needs more logs. See https://docs.okd.io/latest/installing/installing-troubleshooting.html and please attach the resulting log bundle to the bug report.

vrutkovs added the triage/needs-information label Jun 28, 2020
@ssoor
Author

ssoor commented Jul 1, 2020

@vrutkovs
Sorry for not providing enough information.
I am not using the openshift-install create cluster command to create the cluster; I am installing on bare metal.
I boot the machines manually after generating the files with the openshift-install create ignition-configs --dir=xxx command, as described in the documentation.

@cgruver

cgruver commented Jul 1, 2020

@ssoor

Try this to get some logs: (Substitute the IP addresses of your Bootstrap and three Master Nodes, and use the install-dir from your note above)

openshift-install --dir=<your-install-dir> gather bootstrap --bootstrap 10.11.11.49 --master 10.11.11.60 --master 10.11.11.61 --master 10.11.11.62

I also had some issues deploying a cluster with OKD 4.5 and FCOS 32 this past weekend. I haven't had time to open an issue yet, so it will be interesting to see if your symptoms are similar.

@cgruver

cgruver commented Jul 1, 2020

This is likely not related to the OKD 4.5 issue that I mentioned above.

I just ran a successful Bare Metal UPI install of OKD 4.4 Beta 5 using iPXE and booting from the FCOS 32 latest stable bare metal images. The cluster deployed as expected.

Try to get logs from the bootstrap & master nodes as described above.

@ssoor
Author

ssoor commented Jul 3, 2020

@cgruver Thanks.
I tried installing version 4.5 and everything worked; the installation succeeded.

ssoor closed this as completed Jul 3, 2020
@ahachmann

I have the exact same problem, even with OKD 4.5. Once the bootstrap has started, I get the following error:

The unit var-lib-containers-storage-overlay.mount has successfully entered the 'dead' state.
Mär 26 07:23:56 okd-bootstrap.lab.okd.hachmann.hamburg systemd[1]: kubelet.service: Found left-over process 4868 (conmon) in control group while starting unit. Ignoring.
Mär 26 07:23:56 okd-bootstrap.lab.okd.hachmann.hamburg systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Mär 26 07:23:56 okd-bootstrap.lab.okd.hachmann.hamburg systemd[4917]: kubelet.service: Failed to execute command: No such file or directory
Mär 26 07:23:56 okd-bootstrap.lab.okd.hachmann.hamburg systemd[4917]: kubelet.service: Failed at step EXEC spawning /usr/bin/hyperkube: No such file or directory
░░ Subject: Process /usr/bin/hyperkube could not be executed
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel

After starting FCOS with the bootstrap.ign file, it does not find hyperkube. I am no longer in the live environment but in the system that coreos-installer install wrote to /dev/sda.

What is going wrong here? I have also tried OKD 4.6 and 4.7, both with the same issue. Older versions of Fedora do not work either.

I ran the sudo coreos-installer command after booting the live CD.

How can I resolve this problem? Which steps am I doing wrong?

Any help is highly appreciated!

Regards,
Alex

@shaihulud-eu

For me, the problem came from trying to cheat the minimum requirements for the bootstrap machine: the /run partition is sized according to available RAM, so the installer wasn't able to copy files to /run/<..>machine-os-content and reported "no space left on device".
Hope this helps.

Sorry for necroposting.
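
A quick way to check for this condition on the bootstrap node is to look at the free space under /run and at the installed RAM. A minimal sketch; avail_mib is a hypothetical helper, and no particular threshold is implied because the required size is not stated in this thread.

```shell
#!/usr/bin/env bash
# Sketch: print free space, in MiB, on the filesystem that holds a given path.
# avail_mib is a hypothetical helper name.

avail_mib() {
    # -P forces the one-line-per-filesystem POSIX format; -k reports 1024-byte blocks.
    # Column 4 of the data row is the available space.
    df -Pk "${1:-/run}" | awk 'NR==2 { print int($4 / 1024) }'
}

# On the bootstrap node:
#   avail_mib /run    # free space on the RAM-backed /run
#   free -m           # total and available memory
```

If /run turns out to be small, giving the bootstrap machine the documented minimum amount of RAM is the fix the comment above points to.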

@riteshmishra00

In my case the bootstrap works fine, but whenever we provision the master node we get an unknown error.
api-int.okd.inosys.com connects over IPv6 by default.
FCOS version: fedora-coreos-34.20210904.3.0-live.x86_64.iso
OKD Version: 4.8
Openshift Client Version: Client Version: 4.8.0-0.okd-2021-10-01-221835
Openshift Install Version: openshift-install 4.8.0-0.okd-2021-10-01-221835
built from commit 225720cc2adce8c2ee18aea45d474f442cbe0f78
release image quay.io/openshift/okd@sha256:8fac6b281a9a319be5134955403f58b8c1644816ec9f7d28521276ec2ee1e2d3

We have also tried installing OKD 4.6 and 4.7 with FCOS versions 35, 34 and 33, but the same issue occurs every time.

Please help me with this if you can.
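
To see which address families a name such as api-int.okd.inosys.com resolves to from a node, something like the following could be used. This is a sketch under assumptions: addr_family and list_families are hypothetical helpers, and getent ahosts is assumed to be available (it ships with glibc).

```shell
#!/usr/bin/env bash
# Sketch: list the addresses a hostname resolves to, tagged by address family.
# addr_family and list_families are hypothetical helper names.

addr_family() {
    # Any colon means IPv6 (this also covers IPv4-mapped IPv6 addresses).
    case "$1" in
        *:*)     echo ipv6 ;;
        *.*.*.*) echo ipv4 ;;
        *)       echo unknown ;;
    esac
}

list_families() {
    # getent ahosts prints one line per address and socket type; keep unique addresses.
    getent ahosts "$1" | awk '{ print $1 }' | sort -u | while read -r addr; do
        echo "$(addr_family "$addr") $addr"
    done
}

# On the master node:
#   list_families api-int.okd.inosys.com
```

If only ipv6 lines appear for a network that is meant to be IPv4-only, the DNS records for the API names are a likely place to look.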

image

@cgruver

cgruver commented Oct 7, 2021

@riteshmishra00

This is a closed issue. It looks like your problem is unrelated to this original issue. You will get more views on this if you open a new issue. Include log files and your config info.

Also, do a quick search through the other open issues. You might be hitting something that others are too, or you might just have a misconfiguration on your network.

binnes added a commit to binnes/okd that referenced this issue Jul 11, 2023