vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID #58927

Closed
Xuxe opened this issue Jan 28, 2018 · 37 comments · Fixed by #59519
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage.

Comments


Xuxe commented Jan 28, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind bug

What happened:
I'm trying to deploy a Kubernetes v1.9.2 cluster on vSphere with the vSphere-Cloud-Provider.
All works fine except the persistent volume attachment.
With Kubernetes v1.8.7 the attachment works fine on the same VMs and vSphere environment.

What you expected to happen:
The volume attachment should work, without errors such as "Cannot find node "kbnnode01" in cache. Node not found!!!" or "[datacenter.go:78] Unable to find VM by UUID. VM UUID: f7f53642-5cc2-ced1-37f1-c6b04522a27e" appearing in the kube-controller-manager log.

How to reproduce it (as minimally and precisely as possible):
Deploy a Kubernetes cluster via Kubespray on vSphere, based on the official Kubernetes docs and Kubespray docs.

Then try to deploy the official vSphere-Cloud-Provider test pods (persistent volume) provided here:
https://github.com/kubernetes/kubernetes/tree/master/examples/volumes/vsphere

Anything else we need to know?:
With Kubernetes v1.8.7 the volume attachment works on the same VMs and vSphere environment.
Both the CoreOS and vanilla Hyperkube images from Google are affected.
The issue remains after I added the VM UUID to the cloud config.

Based on the VMware docs for the vSphere-Cloud-Provider (https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html), the provider should pick the UUID from /sys/class/dmi/id/product_serial if vm-uuid is unset or empty. But this ID is different from the IDs in my logs, so it can't find the VM in vSphere.
product_serial from kbnnode01 is 42365F38-CF20-C79C-80D3-52363D75A0EF and from the logs it is
f7f53642-5cc2-ced1-37f1-c6b04522a27e

Environment:

  • Kubernetes version (use kubectl version): v1.9.2
  • Cloud provider or hardware configuration: vSphere-Cloud-Provider on ESXi 6.5 and vCenter 6.5 with latest updates
  • OS (e.g. from /etc/os-release): Ubuntu 16.04
  • Kernel (e.g. uname -a): Linux kbnmaster01 4.4.0-112-generic
  • Install tools: Kubespray
  • Others: Used the RBAC role provided in issue #57279 (vSphere cloud provider broken with controller-manager v1.9.0) to fix the "Failed to list *v1.Node: nodes is forbidden" error

My cloud config:

[Global]
datacenter = "Falkenstein"
datastore = "datastore1"
insecure-flag = 1
password = "SECRET"
port = 443
server = "vcenter.xnet.local"
user = "kubernetes_svc@vsphere.local"
working-dir = "/Falkenstein/vm/Kubernetes/"
vm-uuid =

[Disk]
scsicontrollertype = pvscsi
@k8s-ci-robot added the needs-sig, kind/bug, and sig/storage labels and removed the needs-sig label on Jan 28, 2018
@Xuxe changed the title from "vSphere Cloud Provider fails to attach a volume due to Unable to find VM by UUID" to "vSphere cloud provider fails to attach a volume due to Unable to find VM by UUID" on Jan 29, 2018

divyenpatel commented Jan 29, 2018

@Xuxe vm-uuid in vsphere.conf is no longer used in the Kubernetes 1.9.2 release. We removed the unused code in PR #58230.

You should remove vm-uuid from the vsphere.conf file.

Can you provide the output of the following?

kubectl describe node kbnnode01 | grep  "System UUID"

From the kbnnode01 VM, provide the contents of the following file.

# cat /sys/class/dmi/id/product_serial

For your reference, look at the following code
https://github.com/kubernetes/kubernetes/blob/v1.9.2/pkg/cloudprovider/providers/vsphere/nodemanager.go#L77

nodeUUID := node.Status.NodeInfo.SystemUUID

Later we find the VM using this UUID.
https://github.com/kubernetes/kubernetes/blob/v1.9.2/pkg/cloudprovider/providers/vsphere/nodemanager.go#L165

vm, err := res.datacenter.GetVMByUUID(ctx, nodeUUID)

So nodeUUID should match the UUID set in /sys/class/dmi/id/product_serial; otherwise we will not be able to find the node VM.
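
For illustration, here is a minimal sketch of what that lookup amounts to, written against the govmomi library (which the cloud provider uses underneath). The vCenter URL, credentials, datacenter, and UUID are placeholders taken from this thread; this is not the provider's exact code.

```
// lookup_by_uuid.go - hedged sketch: find a node VM in vCenter by its BIOS UUID.
package main

import (
	"context"
	"fmt"
	"net/url"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
	"github.com/vmware/govmomi/object"
)

func main() {
	ctx := context.Background()

	// Placeholder vCenter endpoint and credentials from the cloud config above.
	u, _ := url.Parse("https://kubernetes_svc%40vsphere.local:SECRET@vcenter.xnet.local/sdk")
	c, err := govmomi.NewClient(ctx, u, true) // true mirrors insecure-flag = 1
	if err != nil {
		panic(err)
	}

	finder := find.NewFinder(c.Client, true)
	dc, err := finder.Datacenter(ctx, "Falkenstein")
	if err != nil {
		panic(err)
	}

	// The node's SystemUUID as reported by `kubectl describe node`.
	nodeUUID := "F7F53642-5CC2-CED1-37F1-C6B04522A27E"

	// FindByUuid with instanceUuid=nil matches the BIOS UUID (uuid.bios),
	// which is why this value has to line up with what vCenter knows.
	si := object.NewSearchIndex(c.Client)
	ref, err := si.FindByUuid(ctx, dc, nodeUUID, true, nil)
	if err != nil || ref == nil {
		fmt.Println("Unable to find VM by UUID:", nodeUUID)
		return
	}
	fmt.Println("Found VM:", ref.Reference().Value)
}
```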

Regarding the kubespray documentation (https://github.com/kubernetes-incubator/kubespray/blob/master/docs/vsphere.md): it is not applicable to the 1.9 releases.

Please refer to https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html for up-to-date documentation.

@divyenpatel (Member)

/assign divyenpatel


Xuxe commented Jan 29, 2018

Hey @divyenpatel! Thanks so far :)

I already found the deprecated "vm-uuid" in an issue while browsing the VMware Kubernetes repo and updated my config. But the issue persists.

[Global]
datacenters = "Falkenstein"
insecure-flag = 1
password = "SECRET"
port = 443
user = "kubernetes_svc@vsphere.local"

[VirtualCenter "vcenter.xnet.local"]
[Workspace]
datacenter = "Falkenstein"
server = "vcenter.xnet.local"
folder = "Kubernetes"
default-datastore = "datastore1"
resourcepool-path = "K8s-Pool"

[Disk]
scsicontrollertype = pvscsi

The outputs you requested:

Worker:

root@kbnmaster01:~# kubectl describe node kbnnode01 | grep  "System UUID"
 System UUID:                F7F53642-5CC2-CED1-37F1-C6B04522A27E
root@kbnnode01:~# cat /sys/class/dmi/id/product_serial
VMware-42 36 f5 f7 c2 5c d1 ce-37 f1 c6 b0 45 22 a2 7e

Master:

root@kbnmaster01:~# kubectl describe node kbnmaster01 | grep  "System UUID"
 System UUID:                385F3642-20CF-9CC7-80D3-52363D75A0EF
root@kbnmaster01:~# cat /sys/class/dmi/id/product_serial
VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef

As you can see, the IDs are different. Am I missing something here?

@divyenpatel (Member)

@Xuxe we need to figure out how the nodes are getting registered with this UUID.
Can you share the parameters kubespray sets for the kubelet service on the nodes?


Xuxe commented Jan 29, 2018

@divyenpatel

Here you are!

The systemd file:

[Unit]
Description=Kubernetes Kubelet Server
Documentation=https://github.com/GoogleCloudPlatform/kubernetes
After=docker.service
Wants=docker.socket

[Service]
EnvironmentFile=-/etc/kubernetes/kubelet.env
ExecStartPre=-/bin/mkdir -p /var/lib/kubelet/volume-plugins
ExecStart=/usr/local/bin/kubelet \
                $KUBE_LOGTOSTDERR \
                $KUBE_LOG_LEVEL \
                $KUBELET_API_SERVER \
                $KUBELET_ADDRESS \
                $KUBELET_PORT \
                $KUBELET_HOSTNAME \
                $KUBE_ALLOW_PRIV \
                $KUBELET_ARGS \
                $DOCKER_SOCKET \
                $KUBELET_NETWORK_PLUGIN \
                $KUBELET_VOLUME_PLUGIN \
                $KUBELET_CLOUDPROVIDER
Restart=always
RestartSec=10s

[Install]
WantedBy=multi-user.target

And the environment file for the master kubelet:

root@kbnmaster01:/etc/kubernetes/manifests# cat /etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0 --node-ip=10.125.75.170"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=kbnmaster01"






KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--cadvisor-port=0 \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--node-status-update-frequency=10s \
--docker-disable-shared-pid=True \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/node-kbnmaster01.pem \
--tls-private-key-file=/etc/kubernetes/ssl/node-kbnmaster01-key.pem \
--anonymous-auth=false \
--cgroup-driver=cgroupfs \
--cgroups-per-qos=True \
--fail-swap-on=True \
--enforce-node-allocatable=""  --cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --register-with-taints=node-role.kubernetes.io/master=:NoSchedule --kube-reserved cpu=200m,memory=512M --node-labels=node-role.kubernetes.io/master=true  --feature-gates=Initializers=False,PersistentLocalVolumes=False  "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"

KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/cloud_config"

PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

The env file for the worker node kubelet:

root@kbnnode01:~# cat /etc/kubernetes/kubelet.env
# logging to stderr means we get it in the systemd journal
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=2"
# The address for the info server to serve on (set to 0.0.0.0 or "" for all interfaces)
KUBELET_ADDRESS="--address=0.0.0.0 --node-ip=10.125.75.171"
# The port for the info server to serve on
# KUBELET_PORT="--port=10250"
# You may leave this blank to use the actual hostname
KUBELET_HOSTNAME="--hostname-override=kbnnode01"






KUBELET_ARGS="--pod-manifest-path=/etc/kubernetes/manifests \
--cadvisor-port=0 \
--pod-infra-container-image=gcr.io/google_containers/pause-amd64:3.0 \
--node-status-update-frequency=10s \
--docker-disable-shared-pid=True \
--client-ca-file=/etc/kubernetes/ssl/ca.pem \
--tls-cert-file=/etc/kubernetes/ssl/node-kbnnode01.pem \
--tls-private-key-file=/etc/kubernetes/ssl/node-kbnnode01-key.pem \
--anonymous-auth=false \
--cgroup-driver=cgroupfs \
--cgroups-per-qos=True \
--fail-swap-on=True \
--enforce-node-allocatable=""  --cluster-dns=10.233.0.3 --cluster-domain=cluster.local --resolv-conf=/etc/resolv.conf --kubeconfig=/etc/kubernetes/node-kubeconfig.yaml --kube-reserved cpu=100m,memory=256M --node-labels=node-role.kubernetes.io/node=true  --feature-gates=Initializers=False,PersistentLocalVolumes=False  "
KUBELET_NETWORK_PLUGIN="--network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin"

KUBELET_VOLUME_PLUGIN="--volume-plugin-dir=/var/lib/kubelet/volume-plugins"

# Should this cluster be allowed to run privileged docker containers
KUBE_ALLOW_PRIV="--allow-privileged=true"
KUBELET_CLOUDPROVIDER="--cloud-provider=vsphere --cloud-config=/etc/kubernetes/cloud_config"

PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin


sberd commented Jan 30, 2018

Hi, I have the same issue. I deployed the cluster using kubeadm (upgraded from 1.8.x to 1.9.2) and the VMware guide https://vmware.github.io/vsphere-storage-for-kubernetes/documentation/existing.html
The node System UUID is equal to /sys/class/dmi/id/product_uuid, not product_serial.

@divyenpatel (Member)

@Xuxe I have installed a Kubernetes cluster using kubespray.

root@node-1:~# kubectl get nodes
NAME      STATUS    ROLES     AGE       VERSION
node-1    Ready     master    1h        v1.9.2+coreos.0
node-2    Ready     node      1h        v1.9.2+coreos.0
node-3    Ready     node      1h        v1.9.2+coreos.0
node-4    Ready     node      1h        v1.9.2+coreos.0

I did not face any "Unable to find VM by UUID" issue while attaching a volume to a node VM.
I see the System UUID set in the node's System Info matches both product_uuid and product_serial.

root@node-1:~# cat /sys/class/dmi/id/product_uuid
421C2FF3-C92F-05E6-16EF-E07D97394978
root@node-1:~# cat /sys/class/dmi/id/product_serial 
VMware-42 1c 2f f3 c9 2f 05 e6-16 ef e0 7d 97 39 49 78
root@node-1:~# kubectl describe node node-1 | grep "System UUID"
 System UUID:                421C2FF3-C92F-05E6-16EF-E07D97394978

The vSphere Cloud Provider is also able to locate the node VM using the System UUID. I verified that volume attachment to the node VM was successful.

{"log":"I0130 04:42:19.436744       1 operation_generator.go:308] AttachVolume.Attach succeeded for volume \"pvc-bad2c362-0577-11e8-bd99-0050569c2f8c\" (UniqueName: \"kubernetes.io/vsphere-volume/[vsanDatastore] f1ec5f5a-5c90-ce74-8d69-02002a623c85/kubernetes-dynamic-pvc-bad2c362-0577-11e8-bd99-0050569c2f8c.vmdk\") from node \"node-3\" \n","stream":"stderr","time":"2018-01-30T04:42:19.437109739Z"}

@divyenpatel (Member)

@sberd

Node System UUID is equal /sys/class/dmi/id/product_uuid, not product_serial.

For some Linux distros we observed that /sys/class/dmi/id/product_uuid was not reported correctly, so we changed the logic to read the UUID from product_serial - see https://github.com/kubernetes/kubernetes/pull/45311

In the 1.9 release, VCP no longer reads and parses this UUID; VCP just relies on the System UUID set on the node.
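
For context, here is a minimal sketch (not the upstream code from that PR) of how the "VMware-…" string in product_serial can be normalized into the canonical 8-4-4-4-12 form that vCenter recognizes:

```
// serial_to_uuid.go - hedged illustration of normalizing product_serial.
package main

import (
	"fmt"
	"strings"
)

// serialToUUID strips the "VMware-" prefix, drops spaces and the middle dash,
// and reformats the 32 hex digits as a canonical UUID string.
func serialToUUID(serial string) string {
	s := strings.TrimPrefix(strings.TrimSpace(serial), "VMware-")
	s = strings.NewReplacer(" ", "", "-", "").Replace(s)
	s = strings.ToUpper(s)
	if len(s) != 32 {
		return "" // not a well-formed VMware serial
	}
	return fmt.Sprintf("%s-%s-%s-%s-%s", s[0:8], s[8:12], s[12:16], s[16:20], s[20:32])
}

func main() {
	// Example taken from kbnmaster01 earlier in this thread.
	fmt.Println(serialToUUID("VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef"))
	// prints 42365F38-CF20-C79C-80D3-52363D75A0EF
}
```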


sberd commented Jan 30, 2018

In my case the UUIDs are different:

[root@dpcloud01-et.ftc.ru ~]
# cat /sys/class/dmi/id/product_uuid
736E0342-777A-25A3-F32F-D3728309CFA8
[root@dpcloud01-et.ftc.ru ~]
# cat /sys/class/dmi/id/product_serial 
VMware-42 03 6e 73 7a 77 a3 25-f3 2f d3 72 83 09 cf a8
[root@dpcloud01-et.ftc.ru ~]
#  kubectl describe node dpcloud01-et | grep "System UUID"
 System UUID:                736E0342-777A-25A3-F32F-D3728309CFA8
[root@dpcloud01-et.ftc.ru ~]

Maybe it's the VMware template settings.

@divyenpatel (Member)

@sberd Can you share OS details? Do you have VMware Tools installed on the VM?


Xuxe commented Jan 30, 2018

@divyenpatel

One thing I have observed is that my UUID from the dmidecode command is different from /sys/class/dmi/id/product_serial.

From my master:

huebju@kbnmaster01:~$ sudo cat /sys/class/dmi/id/product_serial
VMware-42 36 5f 38 cf 20 c7 9c-80 d3 52 36 3d 75 a0 ef
huebju@kbnmaster01:~$ sudo dmidecode | grep "UUID"
        UUID: 385F3642-20CF-9CC7-80D3-52363D75A0EF
huebju@kbnmaster01:~$ kubectl describe node kbnmaster01 | grep "System UUID"
 System UUID:                385F3642-20CF-9CC7-80D3-52363D75A0EF
huebju@kbnmaster01:~$

My product_uuid matches the System UUID from kubectl describe:

huebju@kbnmaster01:~$ sudo cat /sys/class/dmi/id/product_uuid
385F3642-20CF-9CC7-80D3-52363D75A0EF
root@kbnmaster01:~# /usr/bin/vmtoolsd -v
VMware Tools daemon, version 10.0.7.52125 (build-3227872)

I have installed the open-vm-tools and they are active.
What could cause this issue?


sberd commented Jan 30, 2018

@divyenpatel Is it enough?

CentOS Linux release 7.4.1708 (Core) 
open-vm-tools.x86_64       10.1.5-3.el7 

@divyenpatel (Member)

Thank you @Xuxe @sberd

One thing we are sure of: we are not able to locate VMs in vCenter using the UUID set in /sys/class/dmi/id/product_uuid.

Let me get back to you after consulting vSphere experts on this. We need to know in which situations /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial can differ or stay in sync.


Xuxe commented Jan 30, 2018

@divyenpatel

@sberd Which ESXi compatibility do you use?

I discovered that the issue seems to come from ESXi compatibility 6.5+.

I tested a CoreOS production ISO to rule out an Ubuntu-specific issue. With ESXi compatibility v11 (6.0), product_serial and product_uuid match. But on another CoreOS VM with ESXi compatibility v13 (6.5+), the serial and UUID do not match.

Photon OS with hardware compatibility v11 also works; serial and UUID match.
I can't test the v13 OVA for now because I get an error during OVA import. I have to check this.


sberd commented Jan 30, 2018

@Xuxe @divyenpatel
I have the same compatibility 6.5+.


divyenpatel commented Jan 30, 2018

@Xuxe
Interesting. I have "ESXi 6.0 and later (VM version 11)" compatibility set on the node VMs and on the template VM used for cloning node VMs.

I quickly tried upgrading compatibility to "ESXi 6.5 and later (VM version 13)" on a VM after cloning it from the template VM.

I see the serial and UUID match.

# cat /sys/class/dmi/id/product_uuid
421C7FD2-223C-0287-FF6A-D12CB461DFA2
# cat /sys/class/dmi/id/product_serial
VMware-42 1c 7f d2 22 3c 02 87-ff 6a d1 2c b4 61 df a2

Guest OS is Ubuntu Linux (64-bit). I will check this out with CoreOS.


patschi commented Jan 30, 2018

Environment
I was also testing some stuff with:

  • VMware vSphere Hypervisor 6.5
    • Host patchlevel:
      [root@vmw-hv-01:~] vmware -vl
      VMware ESXi 6.5.0 build-5969303
      VMware ESXi 6.5.0 Update 1
      
  • Ubuntu 16.04 Desktop
    • with VM version v13, so ESXi 6.5 and later

Test
The output of product_uuid does not seem to match product_serial:

root@lnx-ubuntu-gui-1604:~# cat /sys/class/dmi/id/product_uuid
CE9F4D56-BC1B-C44E-1621-AAC76A4FA94D
root@lnx-ubuntu-gui-1604:~# cat /sys/class/dmi/id/product_serial
VMware-56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d

It seems the UUIDs are being saved within the .vmx file of the virtual machine:

uuid.bios = "56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d"
uuid.location = "56 4d 9f ce 1b bc 4e c4-16 21 aa c7 6a 4f a9 4d"

So I guess when upgrading the VM version from anything older to the latest v13, the UUIDs within the VM stay the same, as they are saved within the .vmx file on the host. According to the VMware KB article:

The UUID is based on the physical computer's identifier and the path to the virtual machine's  
configuration file. This UUID is generated when you power on or reset the virtual machine.  
As long as you do not move or copy the virtual machine to another location,  
the UUID remains constant.  

I'm just wondering where the different value of product_uuid comes from...


Xuxe commented Jan 30, 2018

@patschi @divyenpatel

Yep, after upgrading my v11 CoreOS box to v13, the UUID stays constant and both serial and UUID match.
So it seems to be a bug(?) in ESXi and compatibility v13 itself?

But at least a temporary workaround was found.

Patchlevel:

[root@esxi01:~] vmware -vl
VMware ESXi 6.5.0 build-6765664
VMware ESXi 6.5.0 Update 1

@divyenpatel (Member)

@Xuxe I am not able to reproduce the issue you have observed with "ESXi 6.0 and later (VM version 11)" compatibility on CoreOS.

Deployed CoreOS using the OVA - https://stable.release.core-os.net/amd64-usr/current/coreos_production_vmware_ova.ova

Core OS Version

localhost ~ # uname -a
Linux localhost 4.14.11-coreos #1 SMP Fri Jan 5 11:00:14 UTC 2018 x86_64 Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz GenuineIntel GNU/Linux
localhost ~ # cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1576.5.0
VERSION_ID=1576.5.0
BUILD_ID=2018-01-05-1121
PRETTY_NAME="Container Linux by CoreOS 1576.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
COREOS_BOARD="amd64-usr"

Checked /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial in the VM.
Both are in sync.

localhost ~ # cat /sys/class/dmi/id/product_uuid
421C827E-0915-AAF2-B154-1FAD4A6297FB
localhost ~ # cat /sys/class/dmi/id/product_serial
VMware-42 1c 82 7e 09 15 aa f2-b1 54 1f ad 4a 62 97 fb

Both the /sys/class/dmi/id/product_uuid and /sys/class/dmi/id/product_serial files are updated with new UUIDs in new clones created from this VM.

localhost ~ # cat /sys/class/dmi/id/product_uuid
421C77A8-8EF3-11BB-A764-9DF719B74686
localhost ~ # cat /sys/class/dmi/id/product_serial
VMware-42 1c 77 a8 8e f3 11 bb-a7 64 9d f7 19 b7 46 86

I see no issue with "ESXi 6.0 and later (VM version 11)" compatibility.

How are you cloning VMs to create the node VMs? I am manually cloning VMs from vCenter.


Xuxe commented Jan 30, 2018

@divyenpatel

I have not cloned my Ubuntu VMs; both are fresh setups.
CoreOS was also not cloned; it was installed via ISO, not the OVA.

To be clear, if you upgrade a VM from "6.0 and later" to 6.5 there is no issue (the IDs stay constant and correct; I have not tested cloning an upgraded VM).

Please try to create a new VM based on "6.5 and later" from the beginning; you should then see that the IDs don't match.
As you can see, @patschi above has the same issue with a VM based on "6.5 and later" from the beginning.


sberd commented Jan 31, 2018

@divyenpatel @Xuxe
Hi all, I recreated the VM with compatibility 6.0, and the numbers are equal.

[root@centos-test ~]# cat /sys/class/dmi/id/product_uuid
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial
VMware-42 03 7c bd 4c a8 bc 5b-55 09 0f c4 d4 b3 22 3f
[root@centos-test ~]# cat /sys/class/dmi/id/product_serial | sed -e 's/^VMware-//' -e 's/-/ /' | awk '{ print toupper($1$2$3$4 "-" $5$6 "-" $7$8 "-" $9$10 "-" $11$12$13$14$15$16) }'
42037CBD-4CA8-BC5B-5509-0FC4D4B3223F 


sberd commented Jan 31, 2018

As a workaround I changed uuid.bios on the VM, and everything worked well. Maybe it's not a good solution for production, but it was my test installation.

@divyenpatel (Member)

@sberd @Xuxe
We observed this issue while testing VCP on SUSE Linux 12.3.

We installed SLE 12.3 from ISO with ESXi 6.5 and later (VM version 13) compatibility. product_serial and product_uuid did not match.

Reinstalled SLE 12.3 from ISO with ESXi 6.0 and later (VM version 11) compatibility. Verified product_serial and product_uuid matched.

@divyenpatel (Member)

Maybe it's not a good solution for production, but it was my test installation.

@sberd Agreed. The Kubernetes vSphere Cloud Provider team will come up with a solution and update this issue.

cc: @kubernetes/vmware

@divyenpatel (Member)

We are working on this issue. Here is the internal VMware Issue: vmware-archive#450

@plenderyou

I hit the same issue on a clean install of Kubernetes. I rebuilt the machines using the old compatibility level and everything seemed to work; I could provision disks. One question though: is disk.enableUUID still required? It seemed to work without it.

@divyenpatel (Member)

Is disk.enableUUID still required? It seemed to work without it.

@plenderyou
Yes, we need to set disk.enableUUID to 1 so that disks can be identified in the container host by their UUID.
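
For reference, one way to set this on a node VM is with the govc CLI (assuming govc is available; the same key can also be added to the .vmx directly or set in the vSphere UI under VM Options > Advanced). The VM name below is just a placeholder from this thread:

govc vm.change -vm kbnnode01 -e="disk.enableUUID=1"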

@divyenpatel (Member)

/assign abrarshivani

@divyenpatel (Member)

/unassign divyenpatel

@divyenpatel (Member)

/area platform/vsphere

@divyenpatel (Member)

/sig area/platform/vsphere

k8s-github-robot pushed a commit that referenced this issue Feb 8, 2018
Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here: https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md

Report InstanceID for vSphere Cloud Provider as UUID obtained from product_serial file 

**What this PR does / why we need it**:
vSphere Cloud Provider is not able to find the nodes for VMs created on vSphere 6.5. Kubelet fetches the SystemUUID from the file ```/sys/class/dmi/id/product_uuid```. vSphere Cloud Provider uses this UUID as the VM identifier to get node information from vCenter. vCenter 6.5 doesn't recognize these UUIDs; as a result, the nodes are not found.

The UUID present in the file ```/sys/class/dmi/id/product_serial``` is recognized by vCenter. Yet Kubelet doesn't report it. Therefore, in this PR the InstanceID is reported as the UUID fetched from the file
```/sys/class/dmi/id/product_serial```.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes #58927

**Special notes for your reviewer**:
Internal review here: vmware-archive#452

Tested:
Launched K8s cluster using kubeadm (Used Ubuntu VM compatible with vSphere version 6.5.)
_**Note: Installed Ubuntu from ISO**_
Observed following:
```
Master
> cat /sys/class/dmi/id/product_uuid
743F0E42-84EA-A2F9-7736-6106BB5DBF6B

> cat /sys/class/dmi/id/product_serial
VMware-42 0e 3f 74 ea 84 f9 a2-77 36 61 06 bb 5d bf 6b

Node
> cat /sys/class/dmi/id/product_uuid
956E0E42-CC9D-3D89-9757-F27CEB539B76

> cat /sys/class/dmi/id/product_serial
VMware-42 0e 6e 95 9d cc 89 3d-97 57 f2 7c eb 53 9b 76
```
With this fix the controller manager was able to find the nodes.
**controller manager logs**
```
{"log":"I0205 22:43:00.106416       1 nodemanager.go:183] Found node ubuntu-node as vm=VirtualMachine:vm-95 in vc=10.161.120.115 and datacenter=vcqaDC\n","stream":"stderr","time":"2018-02-05T22:43:00.421010375Z"}
```


**Release note**:

```release-note
vSphere Cloud Provider supports VMs provisioned on vSphere 6.5
```
@divyenpatel (Member)

@abrarshivani we should cherry-pick this change to the 1.9 release.


Harsun commented Oct 7, 2019

Looks like the issue is still present with VM compatibility 6.7.0 U3.
For guest OS Ubuntu 18.04.3, with Kubernetes 1.15.3 and Kubespray 2.11.0,
unfortunately the UUID and product_serial are not the same.

@dheerajinganti

I see the System UUID set in the node's System Info matches product_uuid and product_serial; however, it does not match the ProviderID of the node. Due to this, I am getting a "no VM found" error and the node is unable to mount the volume. Below are the outputs for ProviderID and SystemUUID.

All nodes were working fine, but since last week I started getting the "no VM found" error for just two worker nodes. How do I update the node ProviderID with the right UUID?

kubectl describe nodes | grep UUID

System UUID: 420B0194-7D17-98FD-BED3-42E7B8C0234D
System UUID: 420BE0F9-6E13-06E8-193B-05D28876E082
System UUID: 420B68A9-961F-993F-2740-AEF065F19C48
System UUID: 8FCF0B42-F2A2-54D1-329B-575D9BC5E19C
System UUID: 420B78CF-33E0-600A-E11F-EBC16D99C91E
System UUID: 18480B42-604D-A838-3245-833DA9B5DFF3

kubectl describe nodes | grep ProviderID

ProviderID: vsphere://420b0194-7d17-98fd-bed3-42e7b8c0234d
ProviderID: vsphere://420be0f9-6e13-06e8-193b-05d28876e082
ProviderID: vsphere://420b68a9-961f-993f-2740-aef065f19c48
ProviderID: vsphere://420bcf8f-a2f2-d154-329b-575d9bc5e19c
ProviderID: vsphere://420b78cf-33e0-600a-e11f-ebc16d99c91e
ProviderID: vsphere://420b4818-4d60-38a8-3245-833da9b5dff3

kubectl get nodes -o json | jq '.items[]|[.metadata.name, .spec.providerID, .status.nodeInfo.systemUUID]'

[
"k8s-kubespray-master-0",
"vsphere://420b0194-7d17-98fd-bed3-42e7b8c0234d",
"420B0194-7D17-98FD-BED3-42E7B8C0234D"
]
[
"k8s-kubespray-master-1",
"vsphere://420be0f9-6e13-06e8-193b-05d28876e082",
"420BE0F9-6E13-06E8-193B-05D28876E082"
]
[
"k8s-kubespray-master-2",
"vsphere://420b68a9-961f-993f-2740-aef065f19c48",
"420B68A9-961F-993F-2740-AEF065F19C48"
]
[
"k8s-kubespray-worker-0",
"vsphere://420bcf8f-a2f2-d154-329b-575d9bc5e19c",
"8FCF0B42-F2A2-54D1-329B-575D9BC5E19C"
]
[
"k8s-kubespray-worker-1",
"vsphere://420b78cf-33e0-600a-e11f-ebc16d99c91e",
"420B78CF-33E0-600A-E11F-EBC16D99C91E"
]
[
"k8s-kubespray-worker-2",
"vsphere://420b4818-4d60-38a8-3245-833da9b5dff3",
"18480B42-604D-A838-3245-833DA9B5DFF3"
]


VariableDeclared commented Nov 7, 2019

This issue is still present in vSphere 6.5; it seems that in some cases product_uuid is the source of truth and in other cases product_serial.

There is one pattern I could spot reading through this thread, though: the byte pairs are reversed.

The first group of eight hex digits is reversed pairwise, and then the next two groups of four are reversed pairwise. After that, the last sixteen hex digits are in the correct order.
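
To make that pattern concrete, here is a small sketch (written for this comment, not code from any component) that applies the pairwise reversal to patschi's product_uuid example from earlier in the thread:

```
// uuid_byteswap.go - hedged illustration of the mixed-endian reversal.
package main

import (
	"fmt"
	"strings"
)

// swapUUIDEndianness reverses the bytes of the first three groups of a UUID
// and leaves the last two groups untouched. It assumes a well-formed
// 8-4-4-4-12 UUID string.
func swapUUIDEndianness(uuid string) string {
	parts := strings.Split(strings.ToUpper(uuid), "-")
	rev := func(s string) string {
		out := ""
		for i := len(s); i > 0; i -= 2 {
			out += s[i-2 : i]
		}
		return out
	}
	return strings.Join([]string{rev(parts[0]), rev(parts[1]), rev(parts[2]), parts[3], parts[4]}, "-")
}

func main() {
	// product_uuid reported inside the guest (example from this thread).
	fmt.Println(swapUUIDEndianness("CE9F4D56-BC1B-C44E-1621-AAC76A4FA94D"))
	// prints 564D9FCE-1BBC-4EC4-1621-AAC76A4FA94D, i.e. the value encoded in product_serial
}
```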


hamelg commented Nov 7, 2019

Us too, we encounter this issue.
It seems to be related to the SMBIOS version.

With SMBIOS 2.4, we get the same IDs.
With SMBIOS 2.7, the IDs are different.

You can see your SMBIOS version with:
dmidecode | grep SMBIOS

We are running RHEL 7.6 on ESXi 6.5.0.

Here are some notes related to this issue:
https://kb.vmware.com/s/article/53609
https://access.redhat.com/solutions/82203


Tejeev commented Nov 21, 2019

We see a very similar issue when F5 BIG-IP nodes are in the cluster. Removing the nodes resolves the issue. Not sure if our particular situation is a K8s, vSphere, or BIG-IP bug.


jorioux commented Feb 25, 2020

We have this issue with CentOS 7 and vSphere 6.7u3. We also have an F5 BIG-IP VM on that VMware cluster; maybe it's linked, but I don't see how! This is a deep one.
