Installer fail with IPI on vSphere with OKD 4.6 #392
Comments
Worked fine here on AWS - https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-launch-aws/1332245347673575424
What was the error message? Was bootstrap reachable? What was operator state? |
By "http_proxy", do you mean an egress proxy? |
@vrutkovs please find the error during the deployment below. In addition, the hostname of the bootstrap and master nodes is fedora (it should be the name of the node). time="2020-11-30T17:12:21+01:00" level=debug msg="module.master.vsphere_virtual_machine.vm[2]: Creation complete after 21s [id=42146e11-490d-1197-646d-70a724bc0b40]" |
@timbrd by http proxy I mean https://github.com/openshift/installer/blob/master/docs/user/customization.md
Btw, the same deployment works with 4.5.0-0.okd-2020-10-15-235428 |
Please find attached the log of a successful 4.5 install, "4.5.0-0.okd-2020-10-15-235428_openshift_install.log". Let me know if I can provide additional logs. 4.6.0-0.okd-2020-11-27-200126_openshift_install.log |
and here is the output of hostnamectl on a working system: [root@iikstest-mjc75-master-0 ~]# hostnamectl |
and on the failed 4.6 install: [root@fedora ~]# hostnamectl |
and attached is the output of journalctl -xfe on the 1st master (which has the VIP of the API service) |
Invalid DNS namespace passed to master? Why can't it find vSphere API endpoint? |
What do you mean by dns namespace ? |
Are you referring to this error message? 2020-11-30T17:46:37+01:00" level=error msg="Attempted to gather debug logs after installation failure: failed to get bootstrap and control plane host addresses from "/home/admin/okd_labhtest_install/terraform.tfstate": failed to lookup bootstrap ipv4 address: Post "https://gvavcenterva01p.gva.icrc.priv/sdk": context deadline exceeded" If so, my understanding (correct me if I'm wrong) is that when the timeout is reached and the installation is considered failed, the installer tries to gather log files by connecting to the bootstrap and master nodes. To connect to the nodes, it tries to get their addresses from the vCenter API, but the vCenter API request doesn't succeed. (This is not really surprising, as the nodes are not reporting the correct hostname to vCenter through their vmtools because the hostname is set to fedora.) Can somebody confirm whether my understanding of this error message is correct? |
I checked this error message in the source code of the installer and my understanding seems to be correct. Please see the code below: ip, err := waitForVirtualMachineIP(client, moid) The function waitForVirtualMachineIP queries the vCenter API until it gets an IP, and if it fails it outputs the error message you mentioned in your comment. This is in line with what I see in vCenter: the bootstrap node doesn't report its IP in vCenter. Maybe vmtools is not started? |
I'm not too familiar with vSphere API details on this. Let's check this again when a valid hostname is set (see #394) |
I'll deploy a new cluster, set the hostname correctly in each VM with hostnamectl, and let you know |
Upgrading from 4.5 to 4.6 breaks the machineset feature: we can't recreate nodes or scale up, and all the new nodes are named fedora (IPI with DHCP). There is no name reservation or hardcoded DNS entry like in UPI. |
I'm not doing an upgrade, it is an install from scratch. |
I have done an upgrade from 4.5 to 4.6. After the upgrade, no new machine can be created because of the hostname (IPI with DHCP not sending the DNS name). |
@amelie1979, thanks for the input, but I'm not sure I understand how it is related to my issue? |
I was only adding more info to the issue. :) |
@amelie1979 ah ok, sorry about my message. Thanks for your feedback. (I'm not used to opening issues on GitHub) |
I've redeployed a new cluster in 4.6 and set the hostname as soon as the VM finished booting, and the result is exactly the same.
I've attached the output of journalctl -xfe on the bootstrap node. Any idea why the bootstrap node doesn't start the vmtools service? Here is the status of vmtoolsd on one of the master nodes:
and the state on the bootstrap node:
|
Please don't dump all found issues in a single ticket, it's really hard to track what's going on here.
This service doesn't get installed there. Why does bootstrap node need it? |
Ok no problem, please let me know how I should proceed to help to solve this issue.
In 4.5 it was installed, and I think it is "needed" by the installer to get the bootstrap ip to gather the log when the install fails and this is why we see the error message "Attempted to gather debug logs after installation failure: failed to get bootstrap and control plane host addresses from" |
Me too - 4.5 worked for me, 4.6 fails
|
I've tried the workaround mentioned in another issue (setting the hostname with hostnamectl), and after a reboot of the 3 masters the installation is proceeding. @vrutkovs Is there an estimated date for the fix for the hostname problem on vSphere? |
FYI, this is 100% related to this: openshift/machine-config-operator#2289 |
Solved by #422 Success !!!
|
Upgrade worked (after running quite some time) but 4.6.0-0.okd-2020-12-12-135354 installer fails for me where 4.5.0-0.okd-2020-08-12-020541 was working.
|
Describe the bug
The installer fails with IPI on vSphere with OKD 4.6 with http_proxy. After reviewing the state of the master nodes, it seems that the hostname is not set.
Version
4.6.0-0.okd-2020-11-27-135746
How reproducible
Can be reproduced 100% on my infrastructure
Log bundle
The installer is unable to gather the log bundle, but I can get logs directly from the nodes if needed