Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csi-provisioner is unable to create volume via ebs-plugin with no helpful error #214

Closed
frittentheke opened this issue Feb 14, 2019 · 25 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@frittentheke
Copy link
Contributor

frittentheke commented Feb 14, 2019

/kind bug

What happened?

I installed a fresh Kubernetes 1.13.3 cluster using kubeadm on AWS.
I created the CSINodeInfo CRD via https://github.com/kubernetes/kubernetes/blob/master/cluster/addons/storage-crds/csinodeinfo.yaml

And then applied controller.yaml + node.yaml.

After adjusting the volumes for the controller (see: #212) things started up nicely and I was able to query the CSINodeInfo objects from my nodes just fine.

I created the most basic storage class I could think of:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs
provisioner: ebs.csi.aws.com
parameters:
  type: gp2
reclaimPolicy: Retain
volumeBindingMode: Immediate

and then applied a corresponding PVC:

apiVersion: v1                                                                                                                                                                                                                                                                                                                 
kind: PersistentVolumeClaim
metadata:
  name: ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: ebs 
  resources:
    requests:
      storage: 10Gi

The ebs-csi-controller, or rather the csi-provisioner picked up the request to CreateVolume, but then things fail miserably and without (at least to me) a helpful error message - even when running ebs-plugin with v=9:

+ ebs-csi-controller-0 › cluster-driver-registrar
+ ebs-csi-controller-0 › csi-attacher
+ ebs-csi-controller-0 › csi-provisioner
+ ebs-csi-controller-0 › ebs-plugin
ebs-csi-controller-0 csi-provisioner I0214 17:43:39.263549       1 csi-provisioner.go:82] Version: v1.0.1-0-g9a258a0a
ebs-csi-controller-0 csi-provisioner I0214 17:43:39.263600       1 csi-provisioner.go:96] Building kube configs for running in cluster...
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:37.821382       1 main.go:77] Version: v1.0.1-0-g0660c6c6
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:37.821512       1 main.go:80] Attempting to open a gRPC connection with: "/csi/csi.sock"
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:37.821535       1 connection.go:69] Connecting to /csi/csi.sock
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:37.821813       1 connection.go:96] Still trying, connection is CONNECTING
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:37.822145       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:38.822216       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0 csi-attacher I0214 17:43:40.721128       1 main.go:76] Version: v1.0.1-0-gb7dadac
ebs-csi-controller-0 csi-attacher I0214 17:43:40.722553       1 connection.go:89] Connecting to /var/lib/csi/sockets/pluginproxy/csi.sock
ebs-csi-controller-0 ebs-csi-controller-0 csi-attacherebs-csi-controller-0 cluster-driver-registrar I0214 17:43:39.864132       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
I0214 17:43:40.722806       1 connection.go:116] Still trying, connection is CONNECTING
 ebs-pluginebs-csi-controller-0 csi-attacher  I0214 17:43:36.330221       1 driver.go:49] Driver: ebs.csi.aws.com Version: 0.3.0-alpha
I0214 17:43:40.722991       1 connection.go:116] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0ebs-csi-controller-0  csi-attacher cluster-driver-registrarI0214 17:43:41.723107       1 connection.go:116] Still trying, connection is CONNECTING
 I0214 17:43:41.040443       1 connection.go:96] Still trying, connection is CONNECTING
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:41.040493       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:42.106398       1 connection.go:96] Still trying, connection is CONNECTING
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:42.106437       1 connection.go:96] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0 ebs-plugin I0214 17:43:42.429605       1 mount_linux.go:174] Cannot run systemd-run, assuming non-systemd OS
ebs-csi-controller-0 ebs-csi-controller-0ebs-plugin I0214 17:43:42.429619       1 mount_linux.go:175] systemd-run failed with: exit status 1
 ebs-csi-controller-0 ebs-plugin I0214 17:43:42.429628       1 mount_linux.go:176] systemd-run output: Failed to create bus connection: No such file or directory
csi-provisionerebs-csi-controller-0  ebs-plugin I0214 17:43:42.375961       1 controller.go:197] Connecting to /var/lib/csi/sockets/pluginproxy/csi.sock
I0214 17:43:42.429859       1 driver.go:93] Listening for connections on address: &net.UnixAddr{Name:"/var/lib/csi/sockets/pluginproxy/csi.sock", Net:"unix"}
ebs-csi-controller-0 csi-provisioner I0214 17:43:42.376205       1 controller.go:224] Still trying, connection is CONNECTING
ebs-csi-controller-0 csi-provisioner I0214 17:43:42.378224       1 controller.go:224] Still trying, connection is TRANSIENT_FAILURE
ebs-csi-controller-0 ebs-csi-controller-0csi-attacher  I0214 17:43:42.765155       1 connection.go:113] Connected
ebs-csi-controller-0ebs-plugin  csi-attacher I0214 17:43:42.765176       1 connection.go:242] GRPC call: /csi.v1.Identity/Probe
I0214 17:43:42.771770       1 identity.go:61] Probe: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 ebs-csi-controller-0csi-attacher I0214 17:43:42.765181       1 connection.go:243] GRPC request: {}
 ebs-csi-controller-0 csi-attacher I0214 17:43:42.772187       1 connection.go:245] GRPC response: {}
ebs-pluginebs-csi-controller-0  csi-attacher I0214 17:43:42.772849       1 connection.go:246] GRPC error: <nil>
I0214 17:43:42.773643       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0ebs-csi-controller-0  csi-attacher I0214 17:43:42.772857       1 main.go:211] Probe succeeded
ebs-csi-controller-0 ebs-plugincsi-attacher  I0214 17:43:42.772880       1 connection.go:242] GRPC call: /csi.v1.Identity/GetPluginInfo
I0214 17:43:42.775224       1 identity.go:37] GetPluginCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.772887       1 connection.go:243] GRPC request: {}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.773941       1 connection.go:245] GRPC response: {"name":"ebs.csi.aws.com","vendor_version":"0.3.0-alpha"}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.774549       1 connection.go:246] GRPC error: <nil>
ebs-csi-controller-0 csi-attacher I0214 17:43:42.774558       1 main.go:128] CSI driver name: "ebs.csi.aws.com"
ebs-csi-controller-0 csi-attacher I0214 17:43:42.774566       1 connection.go:242] GRPC call: /csi.v1.Identity/GetPluginCapabilities
ebs-csi-controller-0 csi-attacher I0214 17:43:42.774571       1 connection.go:243] GRPC request: {}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.775700       1 connection.go:245] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.783221       1 connection.go:246] GRPC error: <nil>
ebs-csi-controller-0 csi-attacher I0214 17:43:42.783230       1 connection.go:242] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
ebs-csi-controller-0 csi-attacher I0214 17:43:42.783236       1 connection.go:243] GRPC request: {}
ebs-csi-controller-0ebs-csi-controller-0  csi-attacherebs-plugin I0214 17:43:42.784204       1 controller.go:220] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
 I0214 17:43:42.785050       1 connection.go:245] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
ebs-csi-controller-0 csi-attacher I0214 17:43:42.788303       1 connection.go:246] GRPC error: <nil>
ebs-csi-controller-0 csi-attacher I0214 17:43:42.789537       1 main.go:152] CSI driver supports ControllerPublishUnpublish, using real CSI handler
ebs-csi-controller-0 csi-attacher I0214 17:43:42.790980       1 controller.go:111] Starting CSI attacher
ebs-csi-controller-0 ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.081709       1 connection.go:96] Still trying, connection is CONNECTING
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.081811       1 connection.go:93] Connected
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.081820       1 main.go:88] Calling CSI driver to discover driver name.
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.081829       1 connection.go:137] GRPC call: /csi.v1.Identity/GetPluginInfo
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.081834       1 connection.go:138] GRPC request: {}
ebs-csi-controller-0ebs-plugin cluster-driver-registrar I0214 17:43:43.083106       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
 I0214 17:43:43.083821       1 connection.go:140] GRPC response: {"name":"ebs.csi.aws.com","vendor_version":"0.3.0-alpha"}
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084433       1 connection.go:141] GRPC error: <nil>
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084441       1 main.go:96] CSI driver name: "ebs.csi.aws.com"
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084447       1 main.go:100] Loading kubeconfig.
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084737       1 k8s_register.go:54] AttachRequired: true
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084765       1 k8s_register.go:55] PodInfoOnMountVersion: 
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.084771       1 k8s_register.go:58] Registering csidrivers
ebs-csi-controller-0 cluster-driver-registrar I0214 17:43:43.104433       1 k8s_register.go:69] CSIDriver CRD already had been registered
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.376479       1 controller.go:224] Still trying, connection is CONNECTING
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.376585       1 controller.go:221] Connected
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.380089       1 controller.go:571] Starting provisioner controller ebs.csi.aws.com_ebs-csi-controller-0_12606c0e-3080-11e9-b3df-b28d31712265!
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.483402       1 controller.go:620] Started provisioner controller ebs.csi.aws.com_ebs-csi-controller-0_12606c0e-3080-11e9-b3df-b28d31712265!
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.483520       1 controller.go:926] provision "default/ebs-claim" class "ebs": started
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.487847       1 controller.go:188] GRPC call: /csi.v1.Identity/GetPluginCapabilities
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.488450       1 event.go:221] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"default", Name:"ebs-claim", UID:"e6d74eec-307b-11e9-a224-0274920412e2", APIVersion:"v1", ResourceVersion:"207179", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "default/ebs-claim"
ebs-csi-controller-0 ebs-plugin I0214 17:43:43.489633       1 identity.go:37] GetPluginCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.487861       1 controller.go:189] GRPC request: {}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.492706       1 controller.go:191] GRPC response: {"capabilities":[{"Type":{"Service":{"type":1}}},{"Type":{"Service":{"type":2}}}]}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.498658       1 controller.go:192] GRPC error: <nil>
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.498671       1 controller.go:188] GRPC call: /csi.v1.Controller/ControllerGetCapabilities
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.498676       1 controller.go:189] GRPC request: {}
ebs-csi-controller-0 ebs-plugin I0214 17:43:43.502479       1 controller.go:220] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.503051       1 controller.go:191] GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}}]}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.506029       1 controller.go:192] GRPC error: <nil>
ebs-csi-controller-0 ebs-csi-controller-0 csi-provisioner I0214 17:43:43.506043       1 controller.go:188] GRPC call: /csi.v1.Identity/GetPluginInfo
ebs-plugin I0214 17:43:43.507153       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.506047       1 controller.go:189] GRPC request: {}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.507428       1 controller.go:191] GRPC response: {"name":"ebs.csi.aws.com","vendor_version":"0.3.0-alpha"}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.508821       1 controller.go:192] GRPC error: <nil>
ebs-csi-controller-0 ebs-plugin I0214 17:43:43.512210       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.508853       1 controller.go:544] CreateVolumeRequest {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.509013       1 controller.go:188] GRPC call: /csi.v1.Controller/CreateVolume
ebs-csi-controller-0 csi-provisioner I0214 17:43:43.509022       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-e6d74eec-307b-11e9-a224-0274920412e2","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
ebs-csi-controller-0 ebs-pluginebs-csi-controller-0  E0214 17:43:53.509489       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
csi-provisioner I0214 17:43:53.509243       1 controller.go:191] GRPC response: {}
ebs-csi-controller-0 csi-provisioner I0214 17:43:53.509800       1 controller.go:192] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
ebs-csi-controller-0 ebs-csi-controller-0ebs-plugin caused by: context canceled
 csi-provisioner W0214 17:43:53.509816       1 controller.go:592] CreateVolume timeout: 10s has expired, operation will be retried
ebs-csi-controller-0ebs-csi-controller-0 csi-provisioner  I0214 17:43:58.509996       1 controller.go:188] GRPC call: /csi.v1.Controller/CreateVolume
ebs-plugin ebs-csi-controller-0I0214 17:43:58.513424       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
 csi-provisioner I0214 17:43:58.510027       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-e6d74eec-307b-11e9-a224-0274920412e2","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
ebs-csi-controller-0 ebs-plugin E0214 17:44:08.510374       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
ebs-csi-controller-0 csi-provisioner I0214 17:44:08.510168       1 controller.go:191] GRPC response: {}
ebs-csi-controller-0ebs-csi-controller-0  ebs-plugincsi-provisioner  caused by: context canceled
I0214 17:44:08.510648       1 controller.go:192] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
ebs-csi-controller-0 csi-provisioner W0214 17:44:08.510662       1 controller.go:592] CreateVolume timeout: 10s has expired, operation will be retried
ebs-csi-controller-0ebs-csi-controller-0  ebs-plugincsi-provisioner I0214 17:44:14.510855       1 controller.go:188] GRPC call: /csi.v1.Controller/CreateVolume
 I0214 17:44:14.513327       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
ebs-csi-controller-0 csi-provisioner I0214 17:44:14.510911       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-e6d74eec-307b-11e9-a224-0274920412e2","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
ebs-csi-controller-0 ebs-plugin E0214 17:44:24.511174       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
ebs-csi-controller-0ebs-csi-controller-0  csi-provisionerebs-plugin  caused by: context deadline exceeded
I0214 17:44:24.511042       1 controller.go:191] GRPC response: {}
ebs-csi-controller-0 csi-provisioner I0214 17:44:24.511578       1 controller.go:192] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
ebs-csi-controller-0 csi-provisioner W0214 17:44:24.511592       1 controller.go:592] CreateVolume timeout: 10s has expired, operation will be retried
ebs-csi-controller-0 ebs-pluginebs-csi-controller-0  I0214 17:44:31.714097       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
csi-provisioner I0214 17:44:31.711803       1 controller.go:188] GRPC call: /csi.v1.Controller/CreateVolume
ebs-csi-controller-0 csi-provisioner I0214 17:44:31.711829       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-e6d74eec-307b-11e9-a224-0274920412e2","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
ebs-csi-controller-0ebs-csi-controller-0  csi-provisioner I0214 17:44:41.711943       1 controller.go:191] GRPC response: {}
ebs-pluginebs-csi-controller-0 csi-provisioner  I0214 17:44:41.715233       1 controller.go:192] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
E0214 17:44:41.712073       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
ebs-csi-controller-0 ebs-csi-controller-0csi-provisioner  W0214 17:44:41.715293       1 controller.go:592] CreateVolume timeout: 10s has expired, operation will be retried
ebs-plugin caused by: context deadline exceeded

What you expected to happen?

I expected either csi-provisioner to create a volume or ebs-plugin to return a helpful error message about i.e. the failing API call.

Environment

  • Kubernetes version (use kubectl version): 1.13.3
  • Driver version: latest (0.3.0alpha)
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Feb 14, 2019
@leakingtapan
Copy link
Contributor

From the log:

ebs-csi-controller-0 csi-provisioner I0214 17:43:43.509022       1 controller.go:189] GRPC request: {"capacity_range":{"required_bytes":10737418240},"name":"pvc-e6d74eec-307b-11e9-a224-0274920412e2","parameters":{"type":"gp2"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
ebs-csi-controller-0 ebs-pluginebs-csi-controller-0  E0214 17:43:53.509489       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled

The CreateVolume is timed out after 10s which is the default value for external provisioner.

Could you paste the log for the ebs csi driver in the csi-controller pod?

@frittentheke
Copy link
Contributor Author

frittentheke commented Feb 14, 2019

Thanks for the quick reply @leakingtapan !

Certainly you may get the logs:

E0214 20:23:00.786708       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0214 20:39:40.797478       1 identity.go:37] GetPluginCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0214 20:39:40.799849       1 controller.go:220] ControllerGetCapabilities: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0214 20:39:40.802181       1 identity.go:27] GetPluginInfo: called with args {XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
I0214 20:39:40.805212       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:39:50.803210       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:39:55.808528       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:40:05.805163       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0214 20:40:11.807830       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:40:21.806068       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:40:29.008532       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:40:39.006983       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:40:47.649623       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:40:57.647926       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:41:08.020603       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:41:18.016828       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0214 20:41:30.461074       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:41:40.459776       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:41:55.393744       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:42:05.390616       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled
I0214 20:42:23.309213       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:42:33.307443       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context deadline exceeded
I0214 20:42:54.810461       1 controller.go:49] CreateVolume: called with args {Name:pvc-e6d74eec-307b-11e9-a224-0274920412e2 CapacityRange:required_bytes:10737418240  VolumeCapabilities:[mount:<fs_type:"ext4" > access_mode:<mode:SINGLE_NODE_WRITER > ] Parameters:map[type:gp2] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
E0214 20:43:04.807458       1 driver.go:80] GRPC error: rpc error: code = Internal desc = RequestCanceled: request context canceled
caused by: context canceled

The initially posted logs contain output from all containers in the controller pod (fetched using stern) cluttered together.

@leakingtapan
Copy link
Contributor

leakingtapan commented Feb 15, 2019

Tested v1.13.3 cluster created using kops. I was not able to repo the issue. Is this consistently happen to you? What are the feature gates turned on? Also, could you check on aws console (or using cli) to see what's the status of the EBS volume? Is there any EBS volume got created with tag CSIVolumeName that has the volume name being created eg pvc-e6d74eec-307b-11e9-a224-0274920412e2 ?

@leakingtapan
Copy link
Contributor

NVM about the feature gate. Saw your reply from #213

@frittentheke
Copy link
Contributor Author

@leakingtapan there is no volume created at all, not even without certain tags.
Thats why I was wondering if it's possible to turn on full AWS API request debugging somehow to see if the ebs-plugin is even trying to do anything.

Could you maybe look at the feature gates and other options you have for your kubelet and Kube-API server? Regarding #213 ... am I mistaken about the "wrong" placement of the csi.socket files somehow? But I suppose if the csi-provisioner is picking up the requested PVC and then is able to talk to ebs-plugin in the same pod, it should at least lead to a volume being created with AWS, right?

@frittentheke
Copy link
Contributor Author

@leakingtapan
Luckly or sadly we were able to identify the root cause of this issue to be fully unrelated to the aws-ebs-csi-driver. The ebs-plugin was unable to resolve the IP of the AWS API via DNS. So the plugin initialized just fine (using the metadata service), but then was unable to send out any requests to the AWS API. If I may kindly ask to add more logging of the ebs-plugin when doing API requests? A failed DNS request should be distinguishable from timeout reaching the AWS API.

The only thing left to say is: Thank you.

@leakingtapan
Copy link
Contributor

leakingtapan commented Feb 15, 2019

@frittentheke thx for root cause. We definite want to make improvements on such debugability issue. I added an issue to track this as a feature request.

@shishirkh
Copy link

I think I am facing the same issue - cannot be sure as I am not seeing any DNS error in the logs.

@frittentheke How did you solve it when you faced the DNS issue?

@throrin19
Copy link

Same problem as @shishirkh here. Anybody to help ?

@namgk
Copy link

namgk commented Sep 29, 2022

we've got the same issue running k8s 1.23

failed to provision volume with StorageClass "gp2": rpc error: code = Internal desc = Could not create volume "pvc-c7610d24-aa0c-42ca-bf23-f07b0b06139f": could not create volume in EC2: RequestCanceled: request context canceled caused by: context deadline exceeded

@sharif997
Copy link

we've got the same issue running k8s 1.23

failed to provision volume with StorageClass "gp2": rpc error: code = Internal desc = Could not create volume "pvc-c7610d24-aa0c-42ca-bf23-f07b0b06139f": could not create volume in EC2: RequestCanceled: request context canceled caused by: context deadline exceeded

I am facing the same issue , any luck solving it ?

@Kingshuk-Kubera
Copy link

Kingshuk-Kubera commented Oct 27, 2022

I followed below steps and it worked -
https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html

@edrimon
Copy link

edrimon commented Mar 15, 2023

@leakingtapan Luckly or sadly we were able to identify the root cause of this issue to be fully unrelated to the aws-ebs-csi-driver. The ebs-plugin was unable to resolve the IP of the AWS API via DNS. So the plugin initialized just fine (using the metadata service), but then was unable to send out any requests to the AWS API. If I may kindly ask to add more logging of the ebs-plugin when doing API requests? A failed DNS request should be distinguishable from timeout reaching the AWS API.

The only thing left to say is: Thank you.

Hi, I think I have similar issue. How you managed at the end to perform debug to identify the cause? For example how can I find exactly which FQDN it tries to resolve and it fails?

@ricardorqr
Copy link

I'm also having the same problem. No solution?

failed to provision volume with StorageClass "ebs-sc": rpc error: code = Internal desc = Could not create volume "pvc-8eb69f7e-fa16-45bb-803e-295cb924cf81": could not create volume in EC2: RequestCanceled: request context canceled caused by: context deadline exceeded

@Kingshuk-Kubera
Copy link

I was following this https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html, and I made two mistakes, which caused the issue -

  1. Creating the Amazon EBS CSI driver IAM role for service accounts - (https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html), check role has been created properly. Check "Trust Relationships" of the role, click edit in AWS console and check for any validation errors. For, me, there were validation error.
  2. Creating an IAM OIDC provider for your cluster - Again, this has been created properly. It has at least one thumbprint.

@ricardorqr
Copy link

ricardorqr commented Mar 25, 2023

I was following this https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html, and I made two mistakes, which caused the issue -

  1. Creating the Amazon EBS CSI driver IAM role for service accounts - (https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html), check role has been created properly. Check "Trust Relationships" of the role, click edit in AWS console and check for any validation errors. For, me, there were validation error.
  2. Creating an IAM OIDC provider for your cluster - Again, this has been created properly. It has at least one thumbprint.

Man, have I said that I love you???????? 🤣🤣🤣🤣🤣 You solved my problem.

Yeah, I made a few mistakes too. Let me show you how I fixed it. Mine was a bit similar to yours.

  1. I was following this tutorial: https://repost.aws/knowledge-center/eks-persistent-storage However, I noticed that my cluster didn't have OIDC provider. So I was following the next

  2. Then, I was following this one https://repost.aws/knowledge-center/eks-troubleshoot-oidc-and-irsa to add OIDC provider to my cluster

  3. Lastly, I follow this one https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html

After all, everything was working nicely. Thanks again, brother. 🤜🏼🤛🏼

@fl-max
Copy link

fl-max commented Jul 25, 2023

My issue was an I/O Timeout error while making a request to STS. To debug this issue, you'll want to enable aws-sdk-debug-log ouput (see docs).

NOTE: If using Terraform, you can enable this option with the following configuration:

resource "aws_eks_addon" "example" {
    ....
    configuration_values = jsonencode({
    controller: {
        sdkDebugLog = true
    }
  })
}

Once enabled, and your EBS controller pods have restarted, check the ebs-plugin container logs in the ebs-csi-controller pod. After doing this, I noticed the following in my logs:

DEBUG: Send Request sts/AssumeRoleWithWebIdentity failed, attempt 1/8, error RequestError: send request failed
caused by: Post "https://sts.us-east-1.amazonaws.com/": dial tcp x.x.x.x:443: i/o timeout

To fix, I added the VPC Service Endpoint for com.amazonaws.REGION.sts to the VPC my EKS cluster was deployed in.

@d3bt3ch
Copy link

d3bt3ch commented Sep 1, 2023

I am having the same issue. In our case we are not using EKS and provisioning our clusters with ClusterAPI. Any help?

@bck01215
Copy link

bck01215 commented Sep 15, 2023

@fl-max could you link a guide to creating the VPC service endpoint. I have one created now with the same subnets, but am still getting the same errors you were when debugging is turned on.

EDIT:

It's only failing in two of my subnets

@edrimon
Copy link

edrimon commented Sep 16, 2023 via email

@bck01215
Copy link

@edrimon I ended up having the same issue. Opening port 53/udp in the ACL worked.

@jtmarmon
Copy link

I ran into this issue and the DNS comments above were a helpful tip. In our case, I had recently installed a CNI plugin (Cilium) and forgot to restart the addon pods. Rolling restart of the ebs-csi-controller deployment immediately fixed it for us

@arathore47
Copy link

Rolling restart of the ebs-csi-controller as well as the ebs-csi-node refreshed the DNS cache and resolved the issue for us

@CameronHudson8
Copy link

CameronHudson8 commented Mar 14, 2024

My issue was an I/O Timeout error while making a request to STS. To debug this issue, you'll want to enable aws-sdk-debug-log ouput (see docs).

Thank you so much @fl-max. After adding the flag --aws-sdk-debug-log to my ebs-csi-controller deployment, I found that DNS resolution for sts.ap-northeast-1.amazonaws.com was working, but the ebs-csi-controller was getting I/O timeouts when trying to connect to sts.ap-northeast-1.amazonaws.com:443.

Since I currently have no way to filter egress by hostname, I opened up egress with the following network policy. After that, traffic began flowing from the ebs-csi-controller, and the PV successfully provisioned.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ebs-csi-controller-egress
spec:
  egress:
    - ports:
        - port: 443
          protocol: TCP
  podSelector:
    matchLabels:
      app: ebs-csi-controller
  policyTypes:
    - Egress

@AjayEdupuganti
Copy link

This solved the issue for me.
first i did
nslookup ec2.ap-south-1.amazonaws.com
it gave

Server:         127.0.0.53
Address:        127.0.0.53#53

Non-authoritative answer:
Name:   ec2.ap-south-1.amazonaws.com
Address: 52.95.80.15

Then i edited ebs-csi-controller configuration

kubectl edit deployment -n kube-system ebs-csi-controller
added dnspolicy and hostaliases.

spec:
  template:
    spec:
      dnsPolicy: ClusterFirst
      hostAliases:
      - hostnames:
        - ec2.ap-south-1.amazonaws.com
        ip: 52.95.80.15

thats it. hope it helps someone

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
None yet
Development

No branches or pull requests