vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID #58927
Comments
@Xuxe You should remove Can you provide me the output of followings
From kbnnode01 VM, provide the output of following file.
For your reference, look at the following code
Later we are finding the VM using this UUID.
So nodeUUID should match the UUID set in the
Regarding the kubespray documentation (https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vsphere.md): it is not applicable to 1.9 releases. Please refer to https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html for up-to-date documentation. |
/assign divyenpatel |
Hey @divyenpatel! Thanks so far :) I already found the deprecated "vm-uuid" in one issue while browsing the VMware Kubernetes repo and updated my config. But the issue persists.
Your requested outputs: Worker:
Master:
You see, the IDs are different. Am I missing something here? |
@Xuxe we need to figure out how nodes are getting registered with this UUID. |
here you are! The systemd file:
And the environment file for the master kubelet:
The env file for the worker node kubelet:
|
Hi, I have the same issue. I deployed the cluster using kubeadm (upgraded from 1.8.x to 1.9.2), following the VMware guide https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html |
@Xuxe I have installed Kubernetes Cluster using
I did not face any issue regarding "Unable to find VM by UUID", while attaching volume with node VM.
vSphere Cloud Provider is also able to locate the node VM using
|
For some linux distro we observed In 1.9 release, VCP no longer read and parse this UUID, VCP just relies on the |
In my case it has a different UUID:
[root@dpcloud01-et.ftc.ru ~] # cat /sys/class/dmi/id/product_uuid
736E0342-777A-25A3-F32F-D3728309CFA8
[root@dpcloud01-et.ftc.ru ~] # cat /sys/class/dmi/id/product_serial
VMware-42 03 6e 73 7a 77 a3 25-f3 2f d3 72 83 09 cf a8
[root@dpcloud01-et.ftc.ru ~] # kubectl describe node dpcloud01-et | grep "System UUID"
System UUID: 736E0342-777A-25A3-F32F-D3728309CFA8
Maybe it's the VMware template settings. |
@sberd Can you share OS detail? Do you have VMware tools installed on the VM? |
One thing I have observed is that my UUID from the dmidecode command is different from /sys/class/dmi/id/product_serial. From my master:
My product id matches the system uuid from kubectl describe:
I have installed the open-vm-tools and they are active. |
@divyenpatel Is it enough?
|
One thing we are sure that we are not able to locate VMs from vCenter using UUID set in Let me get back to you after consulting vSphere experts on this. We need to know in what situations |
@sberd Which ESXi compatibility do you use? I discovered that the issue seems to come from ESXi compatibility 6.5+. I tested a CoreOS production ISO to rule out an issue with Ubuntu. With ESXi compatibility v11 (6.0), product_serial and product_uuid match. But on another CoreOS VM with ESXi compatibility v13 (6.5+), the serial and UUID do not match. Photon OS with hardware compatibility v11 also works: serial and UUID match. |
@Xuxe @divyenpatel |
@Xuxe I have quickly tried out upgrading compatibility to "ESXi 6.5 and later (VM version 13)" on the VM after cloning from the template VM. I see serial and UUID match.
Guest OS is |
Environment
Test
It seems the UUIDs are being saved within the
So I guess when upgrading the VM version from anything older to the latest v13 the UUIDs within the VM stay the same, as it's saved within the .VMX file on the host. According to the VMware KB article:
I'm just wondering about where the different value of |
Yep, after upgrading my v11 CoreOS box to v13 the UUID stays constant, and the serial and UUID match. But at least a temporary workaround was found. Patchlevel:
|
@Xuxe I am not able to reproduce the issue you have observed with "ESXi 6.0 and later (VM version 11)" compatibility on coreos. Deployed core using the OVA - https://stable.release.core-os.net/amd64-usr/current/coreos_production_vmware_ova.ova Core OS Version
Checked
Both
I see no issue with How are you cloning VMs for creating Node VMs? I am manually cloning VMs from vCenter. |
I have not cloned my Ubuntu VMs; both are fresh setups. To be clear, if you upgrade a VM from 6.0 and later to 6.5 there is no issue (the IDs stay constant and good; I have not tested cloning an upgraded VM). Please try to create a new VM from the beginning based on 6.5 and later; you should then see that the IDs don't match. |
@divyenpatel @Xuxe [root@centos-test ~]# cat /sys/class/dmi/id/product_uuid
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial
VMware-42 03 7c bd 4c a8 bc 5b-55 09 0f c4 d4 b3 22 3f
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial | sed -e 's/^VMware-//' -e 's/-/ /' | awk '{ print toupper($1$2$3$4 "-" $5$6 "-" $7$8 "-" $9$10 "-" $11$12$13$14$15$16) }'
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F |
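The sed/awk one-liner above can also be sketched in Python. This is only an illustrative equivalent, not part of the vSphere Cloud Provider; the function name `serial_to_uuid` is made up for this example, and the input format is assumed to be exactly the `VMware-xx xx ... xx-xx ... xx` layout shown in this thread.

```python
import uuid


def serial_to_uuid(product_serial: str) -> uuid.UUID:
    """Convert a VMware product_serial string into a UUID object.

    Hypothetical helper: assumes the "VMware-" prefix followed by
    16 space-separated hex bytes with one dash in the middle.
    """
    text = product_serial.strip()
    prefix = "VMware-"
    if text.startswith(prefix):
        text = text[len(prefix):]
    # Remove the byte separators (spaces and the single dash).
    hex_digits = text.replace(" ", "").replace("-", "")
    # uuid.UUID raises ValueError if this is not 32 hex digits.
    return uuid.UUID(hex=hex_digits)


if __name__ == "__main__":
    serial = "VMware-42 03 7c bd 4c a8 bc 5b-55 09 0f c4 d4 b3 22 3f"
    print(str(serial_to_uuid(serial)).upper())
```

On a node, the input would be the content of /sys/class/dmi/id/product_serial, which typically requires root to read.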
As a workaround I changed uuid.bios in the VM, and everything worked well. Maybe it's not a good solution for production, but it was my test installation. |
@sberd @Xuxe We installed SLE 12.3 from ISO with Reinstalled SLE 12.3 from ISO with |
@sberd Agree. Kubernetes vSphere Cloud Provider team will come up with the solution and update this issue. cc: @kubernetes/vmware |
We are working on this issue. Here is the internal VMware Issue: vmware-archive#450 |
I suffered the same issue on a clean install of Kubernetes. I rebuilt the machines using the old compatibility layer and all seemed to work: I could provision disks. One question, though: is disk.enableUUID still required? It seemed to work without it. |
@plenderyou |
/assign abrarshivani |
/unassign divyenpatel |
/area platform/vsphere |
/sig area/platform/vsphere |
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

Report InstanceID for vSphere Cloud Provider as UUID obtained from product_serial file

**What this PR does / why we need it**:
vSphere Cloud Provider is not able to find the nodes for VMs created on vSphere v1.6.5. Kubelet fetches SystemUUID from file ```/sys/class/dmi/id/product_uuid```. vSphere Cloud Provider uses this uuid as VM identifier to get node information from vCenter. vCenter v1.6.5 doesn't recognize this uuids, as a result, nodes are not found. UUID present in file ```/sys/class/dmi/id/product_serial``` is recognized by vCenter. Yet, Kubelet doesn't report this. Therefore, in this PR InstanceID is reported as UUID which is fetched from file ```/sys/class/dmi/id/product_serial```.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58927

**Special notes for your reviewer**:
Internally review here: vmware-archive#452

Tested:
Launched K8s cluster using kubeadm (Used Ubuntu VM compatible with vSphere version 6.5.)
_**Note: Installed Ubuntu from ISO**_

Observed following:
```
Master
> cat /sys/class/dmi/id/product_uuid
743F0E42-84EA-A2F9-7736-6106BB5DBF6B
> cat /sys/class/dmi/id/product_serial
VMware-42 0e 3f 74 ea 84 f9 a2-77 36 61 06 bb 5d bf 6b

Node
> cat /sys/class/dmi/id/product_uuid
956E0E42-CC9D-3D89-9757-F27CEB539B76
> cat /sys/class/dmi/id/product_serial
VMware-42 0e 6e 95 9d cc 89 3d-97 57 f2 7c eb 53 9b 76
```

With this fix controller manager was able to find the nodes.

**controller manager logs**
```
{"log":"I0205 22:43:00.106416 1 nodemanager.go:183] Found node ubuntu-node as vm=VirtualMachine:vm-95 in vc=10.161.120.115 and datacenter=vcqaDC\n","stream":"stderr","time":"2018-02-05T22:43:00.421010375Z"}
```

**Release note**:
```release-note
vSphere Cloud Provider supports VMs provisioned on vSphere v1.6.5
```
@abrarshivani we should cherry pick this change to 1.9 release. |
Looks like the issue is still valid with VM compatibility 6.7.0 u3. |
I see that the System UUID set in the System Info of the node matches product_uuid and product_serial; however, it does not match the ProviderID of the node. Because of this, I am getting a "no VM found" error and the node is unable to mount the volume. Below is the output for the provider ID and System UUID. All nodes were working fine, but since last week I started getting the "no VM found" error for two worker nodes only. How do I update the node ProviderID with the right UUID?
kubectl describe nodes | grep UUID
System UUID: 420B0194-7D17-98FD-BED3-42E7B8C0234D
kubectl describe nodes | grep ProviderID
ProviderID: vsphere://420b0194-7d17-98fd-bed3-42e7b8c0234d
kubectl get nodes -o json | jq '.items[]|[.metadata.name, .spec.providerID, .status.nodeInfo.systemUUID]'
[ |
This issue is still present in vSphere 6.5. It seems that in some cases product_uuid is the source of truth and in other cases product_serial. There is one pattern that I could spot while reading this thread, though: the byte pairs are reversed. The pairs in the first group of eight hex digits are in reverse order, and so are the pairs in each of the next two groups of four. After that, the last sixteen hex digits are in the same order in both values. |
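The reversal pattern described in this comment is the same one Python's standard `uuid` module exposes as `bytes_le` (first three fields little-endian, the rest unchanged). The sketch below uses that to turn a product_uuid value into the byte order seen in product_serial. It is an illustration based on the values posted in this thread, not a statement about how vCenter or the kubelet computes these IDs; the function name `swap_first_three_fields` is hypothetical.

```python
import uuid


def swap_first_three_fields(value: str) -> str:
    """Reverse the byte order of the first three UUID fields.

    Hypothetical helper: the 8-, 4- and 4-digit groups are byte-swapped,
    the remaining 16 hex digits are left as they are.
    """
    original = uuid.UUID(value)
    # bytes_le holds the first three fields in little-endian order;
    # reinterpreting it as big-endian bytes performs the swap.
    return str(uuid.UUID(bytes=original.bytes_le))


if __name__ == "__main__":
    # product_uuid value posted earlier in this thread.
    print(swap_first_three_fields("736E0342-777A-25A3-F32F-D3728309CFA8"))
```

Applying the function twice returns the original UUID, so the same helper converts in either direction.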
Us too: we encounter this issue. With SMBIOS 2.4, we get the same IDs. You can see your SMBIOS version: We are running RHEL 7.6 on ESXi 6.5.0. Here are some notes related to this issue: |
We see a very similar issue when F5 BigIp nodes are in the cluster. Removing the nodes resolves the issue. Not sure if our particular situation is a K8s, vSphere, or BigIP bug |
We have that issue with CentOS 7 and vSphere 6.7u3. We also have an F5 BIG-IP VM on that VMware cluster; maybe it's linked, but I don't see how! This is a deep one. |
Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug
What happened:
I'm trying to deploy a Kubernetes v1.9.2 cluster on vSphere with the vSphere-Cloud-Provider.
All works fine except the persistent volume attachment.
With Kubernetes v1.8.7 the attachment works fine on the same VMs and vSphere environment.
What you expected to happen:
The volume attachment works without errors like "Cannot find node "kbnnode01" in cache. Node not found!!!" or "[datacenter.go:78] Unable to find VM by UUID. VM UUID: f7f53642-5cc2-ced1-37f1-c6b04522a27e" in the kube-controller log.
How to reproduce it (as minimally and precisely as possible):
Deploy a Kubernetes Cluster via Kubespray on vSphere based off the official kubernetes docs and kubespray docs:
Then try to deploy the official vSphere-Cloud-Provider test pods (persistent volume) provided here:
https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/vsphere
Anything else we need to know?:
With Kubernetes v1.8.7 the volume attachment works on the same VM and vSphere environment.
Both the CoreOS and vanilla Hyperkube images from Google are affected.
The issue remains after I added the VM UUID to the cloud config.
Based on the VMware docs for the vSphere-Cloud-Provider (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html) the provider should pick the UUID from /sys/class/dmi/id/product_serial if vm-uuid is unset or empty. But this ID is different from the IDs in my logs so it can't find the VM in vSphere.
product_serial from kbnnode01 is 42365F38-CF20-C79C-80D3-52363D75A0EF and from logs it is
f7f53642-5cc2-ced1-37f1-c6b04522a27e
Environment:
kubectl version: v1.9.2
uname -a: Linux kbnmaster01 4.4.0-112-generic
My cloud config: