ControllerPublishVolume on Node called twice with same VolumeID #103

okartau · 2018-12-11T17:11:59Z

I spotted a creation error in Node log right after cluster reference test cycle, i.e. deployment steps up to my-csi-app pod. It appears that ControllerPublishVolume on the Node is called twice in a row without much delay (almost immediately). The first call created device:

I1211 15:57:44.512119       1 glog.go:58] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1211 15:57:44.512138       1 glog.go:58] GRPC request: volume_id:"5220d768-fd5d-11e8-998e-2e150e4311c3" node_id:"host-3" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > volume_attributes:<key:"nsmode" value:"fsdax" > volume_attributes:<key:"size" value:"8589934592" > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1544543564005-8081-pmem-csi" > 
I1211 15:57:44.512250       1 glog.go:79] ControllerPublishVolume: cs.Node: host-3 req.volume_id: 5220d768-fd5d-11e8-998e-2e150e4311c3, req.node_id: host-3 
I1211 15:57:44.512259       1 glog.go:79] ControllerPublishVolume: volumeName:pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100 volumeSize:8589934592 nsmode:fsdax
I1211 15:57:44.512267       1 glog.go:58] Executing: vgs --noheadings --nosuffix -o vg_name,vg_size,vg_free,vg_tags --units B ndbus0region0fsdax ndbus0region0sector
I1211 15:57:44.537471       1 glog.go:58] Output:   ndbus0region0fsdax  12670992384 12670992384 fsdax  
  ndbus0region0sector 12859736064 12859736064 sector 
I1211 15:57:44.537493       1 glog.go:58] Executing: lvcreate -L 8192 -n 5220d768-fd5d-11e8-998e-2e150e4311c3 ndbus0region0fsdax
I1211 15:57:44.564560       1 glog.go:58] Output: WARNING: ext4 signature detected on /dev/ndbus0region0fsdax/5220d768-fd5d-11e8-998e-2e150e4311c3 at offset 1080. Wipe it? [y/n]: [n]
  Aborted wiping of ext4.
  1 existing signature left on the device.
  Logical volume "5220d768-fd5d-11e8-998e-2e150e4311c3" created.
I1211 15:57:44.564583       1 glog.go:58] GRPC response: publish_info:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > publish_info:<key:"size" value:"8589934592" >

immediately followed by other with same params which fails, as there is no more 8G space:

I1211 15:57:44.569451       1 glog.go:58] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1211 15:57:44.569463       1 glog.go:58] GRPC request: volume_id:"5220d768-fd5d-11e8-998e-2e150e4311c3" node_id:"host-3" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > volume_attributes:<key:"nsmode" value:"fsdax" > volume_attributes:<key:"size" value:"8589934592" > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1544543564005-8081-pmem-csi" > 
I1211 15:57:44.569505       1 glog.go:79] ControllerPublishVolume: cs.Node: host-3 req.volume_id: 5220d768-fd5d-11e8-998e-2e150e4311c3, req.node_id: host-3 
I1211 15:57:44.569510       1 glog.go:79] ControllerPublishVolume: volumeName:pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100 volumeSize:8589934592 nsmode:fsdax
I1211 15:57:44.569521       1 glog.go:58] Executing: vgs --noheadings --nosuffix -o vg_name,vg_size,vg_free,vg_tags --units B ndbus0region0fsdax ndbus0region0sector
I1211 15:57:44.594446       1 glog.go:58] Output:   ndbus0region0fsdax  12670992384  4081057792 fsdax  
  ndbus0region0sector 12859736064 12859736064 sector 
E1211 15:57:44.594467       1 glog.go:119] GRPC error: rpc error: code = Internal desc = Failed to create volume: No region is having enough space required(8589934592)

I need to try with smaller Volume request, does the node create double volumes then.
Not sure why driver gets called twice in that case (something in master, or CSI ?).
Driver at that point could check for registered VolumeID, if already created, should refuse.

The text was updated successfully, but these errors were encountered:

pohly · 2018-12-11T17:31:38Z

Olev Kartau <notifications@github.com> writes:

I spotted a creation error in Node log right after cluster reference test cycle, i.e. deployment steps up to _my-csi-app_ pod. It appears that ControllerPublishVolume on the Node is called twice in a row without much delay (almost immediately). The first call created device:

...

I need to try with smaller Volume request, does the node create double volumes then. Not sure why driver gets called twice in that case (something in master, or CSI ?). Driver at that point could check for registered VolumeID, if already created, should refuse.

This happens because the caller has a very low timeout for the gRPC call. This has tripped up ceph-csi before: kubernetes-csi/external-provisioner#131 When the call times out, it just gets retried until it succeeds. According to the standard, that is okay, because each call must be idempotent. In practice that means that a driver must: - serialize gRPC calls (see for example ceph/ceph-csi#92) - detect when a request was already completed earlier and then return without error

okartau · 2018-12-11T17:39:36Z

just for reference, here is master log of same, indeed Master calls it again after like, hm 2,8 ms ?
but thats insanely short. Is our case contributing by the fact that we deal with actual device creation
and CSI spec thinks device is already created and just need publish

I1211 15:57:44.506948       1 glog.go:58] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1211 15:57:44.506970       1 glog.go:58] GRPC request: volume_id:"5220d768-fd5d-11e8-998e-2e150e4311c3" node_id:"host-3" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > volume_attributes:<key:"nsmode" value:"fsdax" > volume_attributes:<key:"size" value:"8589934592" > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1544543564005-8081-pmem-csi" > 
I1211 15:57:44.507071       1 glog.go:79] ControllerPublishVolume: cs.Node: host-2 req.volume_id: 5220d768-fd5d-11e8-998e-2e150e4311c3, req.node_id: host-3 
I1211 15:57:44.507082       1 glog.go:58] Connecting to 192.168.8.8:10001
I1211 15:57:44.507213       1 glog.go:58] Still trying, connection is CONNECTING
I1211 15:57:44.507904       1 glog.go:58] Connected
I1211 15:57:44.507922       1 glog.go:79] Getting New Controller Client ....
I1211 15:57:44.507926       1 glog.go:79] Initiating Publishing volume ....
I1211 15:57:44.561568       1 glog.go:79] Got response
I1211 15:57:44.561649       1 glog.go:79] PublishVolume: update fsdax Capacity from 12670992384 to 4081057792
I1211 15:57:44.561693       1 glog.go:58] GRPC response: publish_info:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > publish_info:<key:"size" value:"8589934592" > 
I1211 15:57:44.564569       1 glog.go:58] GRPC call: /csi.v0.Controller/ControllerPublishVolume
I1211 15:57:44.564587       1 glog.go:58] GRPC request: volume_id:"5220d768-fd5d-11e8-998e-2e150e4311c3" node_id:"host-3" volume_capability:<mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_attributes:<key:"name" value:"pvc-5095c5eb-fd5d-11e8-84fd-deadbeef0100" > volume_attributes:<key:"nsmode" value:"fsdax" > volume_attributes:<key:"size" value:"8589934592" > volume_attributes:<key:"storage.kubernetes.io/csiProvisionerIdentity" value:"1544543564005-8081-pmem-csi" > 
I1211 15:57:44.564647       1 glog.go:79] ControllerPublishVolume: cs.Node: host-2 req.volume_id: 5220d768-fd5d-11e8-998e-2e150e4311c3, req.node_id: host-3 
I1211 15:57:44.564674       1 glog.go:58] Connecting to 192.168.8.8:10001
I1211 15:57:44.564777       1 glog.go:58] Still trying, connection is CONNECTING
I1211 15:57:44.565371       1 glog.go:58] Connected
I1211 15:57:44.565416       1 glog.go:79] Getting New Controller Client ....
I1211 15:57:44.565486       1 glog.go:79] Initiating Publishing volume ....
I1211 15:57:44.591093       1 glog.go:79] Got response
E1211 15:57:44.591114       1 glog.go:119] GRPC error: rpc error: code = Internal desc = Failed to create volume: No region is having enough space required(8589934592)

okartau · 2018-12-11T17:43:28Z

anyway, the right thing to do is to look into local volume registry where volumeID gets stored, and return OK if that ID is already there.

okartau · 2018-12-12T12:16:51Z

fix proposal in #106

okartau · 2019-01-10T20:44:59Z

fix in #106 was merged, we can close this

okartau closed this as completed Jan 10, 2019

dmayle mentioned this issue Feb 5, 2020

Longhorn v0.7.0, On kubernetes 1.16.4, volumes never attach longhorn/longhorn#1044

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ControllerPublishVolume on Node called twice with same VolumeID #103

ControllerPublishVolume on Node called twice with same VolumeID #103

okartau commented Dec 11, 2018

pohly commented Dec 11, 2018 via email

okartau commented Dec 11, 2018

okartau commented Dec 11, 2018

okartau commented Dec 12, 2018

okartau commented Jan 10, 2019

ControllerPublishVolume on Node called twice with same VolumeID #103

ControllerPublishVolume on Node called twice with same VolumeID #103

Comments

okartau commented Dec 11, 2018

pohly commented Dec 11, 2018 via email

okartau commented Dec 11, 2018

okartau commented Dec 11, 2018

okartau commented Dec 12, 2018

okartau commented Jan 10, 2019