[BUG]: Unity CSI node driver reports "invalid memory address or nil pointer dereference" #152

Closed
ErikZandboer opened this issue Jan 11, 2022 · 10 comments
Labels
area/csi-unity Issue pertains to the CSI Driver for Dell EMC Unity type/bug Something isn't working. This is the default label associated with a bug issue.

@ErikZandboer

On a clean single-node vanilla K8s install I am trying to install the Unity CSI driver version 2.1. The controller pod runs OK, but the node pod keeps crashing. When querying the logs for the node pod, the driver shows:
panic: runtime error: invalid memory address or nil pointer dereference

At the same time the registrar reports:
transport: Error while dialing dial unix /var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock: connect: connection refused", restarting registration container.

To Reproduce
Steps to reproduce the behavior:

  1. Install a clean CentOS 7.8 VM
  2. Install docker 1.13, Vanilla K8s 1.22.5. Taint the node so it can run containers. Create the namespace "unity".
  3. Have a Unity VSA running (version 5.1.0.0.5.394)
  4. Fetch the cert from the Unity and create the cert secret, create a secret.yaml and fill out the details, create the creds secret.
  5. Install the snapshotter v4.2, install helm3, install sshpass
  6. Get the Unity CSI driver 2.1 (git clone) and run ./csi-install using a default myvalues.yaml (only changed controllerCount to 1)
  7. See the controller pod go into the running state (5/5) and the node pod entering CrashLoopBackOff
  8. The logs in the driver container of the node pod show "panic: runtime error: invalid memory address or nil pointer dereference"

Expected behavior
The node pod starting successfully and going into the running state (2/2).

Logs (node pod; driver container)
[root@k8s-single ~]# kubectl -n unity logs unity-node-ffnk8 driver
Endpoint /var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock
Removed endpoint /var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock
ls: cannot access '/var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock': No such file or directory
time="2022-01-11T12:25:19Z" level=info runid=start msg="Driver Mode:node" func="github.com/dell/csi-unity/service.(*service).BeforeServe()" file="/go/src/csi-unity/service/service.go:174"
time="2022-01-11T12:25:19Z" level=info runid=start msg="X_CSI_UNITY_NODENAME: k8s-single" func="github.com/dell/csi-unity/service.(*service).BeforeServe()" file="/go/src/csi-unity/service/service.go:182"
time="2022-01-11T12:25:19Z" level=info runid=config-0 msg="Synchronizing driver secret
" func="github.com/dell/csi-unity/service.(*service).syncDriverSecret()" file="/go/src/csi-unity/service/service.go:459"
time="2022-01-11T12:25:19Z" level=info msg="configured csi-unity.dellemc.com" ArrayId=virt21512v5f91 Endpoint="https://192.168.192.202/" IsDefault=true SkipCertificateValidation=true password="
*****" username=admin
time="2022-01-11T12:25:19Z" level=info runid=node-0 msg="Starting goroutine to add Node information to storage array" func="github.com/dell/csi-unity/service.(*service).syncNodeInfoRoutine()" file="/go/src/csi-unity/service/node.go:1829"
csi-unity logger initiated. This should be called only once.
gounity logger initiated. This should be called only once.
time="2022-01-11T12:25:19Z" level=info arrayid=virt21512v5f91 runid=config-1 msg="Dynamic config load goroutine invoked" func="github.com/dell/csi-unity/service.(*service).loadDynamicConfig()" file="/go/src/csi-unity/service/service.go:374"
time="2022-01-11T12:25:19Z" level=info runid=config-0 msg="configured csi-unity.dellemc.com" func="github.com/dell/csi-unity/service.(*service).BeforeServe.func1()" file="/go/src/csi-unity/service/service.go:169"
time="2022-01-11T12:25:19Z" level=info msg="identity service registered"
time="2022-01-11T12:25:19Z" level=info msg="node service registered"
time="2022-01-11T12:25:19Z" level=info runid=RegisterAdditionalServers msg="Registering additional GRPC servers" func="github.com/dell/csi-unity/service.(*service).RegisterAdditionalServers()" file="/go/src/csi-unity/service/service.go:286"
time="2022-01-11T12:25:19Z" level=info msg=serving endpoint="unix:///var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock"
time="2022-01-11T12:25:19Z" level=info arrayid=virt21512v5f91 runid=config-1 msg="Synchronizing driver config
" func="github.com/dell/csi-unity/service.(*service).syncDriverConfig()" file="/go/src/csi-unity/service/service.go:560"
time="2022-01-11T12:25:19Z" level=warning arrayid=virt21512v5f91 runid=config-1 msg="Log level changed to: info" func="github.com/dell/csi-unity/service.(*service).syncDriverConfig()" file="/go/src/csi-unity/service/service.go:578"
time="2022-01-11T12:25:19Z" level=info msg="/csi.v1.Identity/GetPluginInfo: REQ 0001: XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-01-11T12:25:19Z" level=info msg="/csi.v1.Identity/GetPluginInfo: REP 0001: Name=csi-unity.dellemc.com, VendorVersion=2.1.0, Manifest=map[commit:79d94eb4d3eb0521fcf489053c544b290d0595db formed:Fri, 03 Dec 2021 09:51:20 UTC semver:2.1.0 url:http://github.com/dell/csi-unity], XXX_NoUnkeyedLiteral={}, XXX_sizecache=0"
time="2022-01-11T12:25:20Z" level=warning arrayid=virt21512v5f91 runid=node-2 msg="Cannot read directory: /sys/class/fc_host : open /sys/class/fc_host: no such file or directory" func="github.com/dell/csi-unity/service/utils.GetFCInitiators()" file="/go/src/csi-unity/service/utils/emcutils.go:95"
time="2022-01-11T12:25:20Z" level=warning arrayid=virt21512v5f91 runid=node-2 msg="'FC Initiators' cannot be retrieved." func="github.com/dell/csi-unity/service.(*service).addNodeInformationIntoArray()" file="/go/src/csi-unity/service/node.go:1591"
time="2022-01-11T12:25:20Z" level=info msg="URI/api/instances/host/name:k8s-single?fields=id,name,description,fcHostInitiators,iscsiHostInitiators,hostIPPorts?fields" func="github.com/dell/gounity.(*host).FindHostByName()" file="dell/gounity@v1.8.0/host.go:42"
time="2022-01-11T12:25:20Z" level=error msg="failed to invoke Unity REST API server" func="github.com/dell/gounity.(*Client).executeWithRetryAuthenticate()" file="dell/gounity@v1.8.0/unityclient.go:127" error="[{The requested resource does not exist. (Error Code:0x7d13005)}]"
time="2022-01-11T12:25:20Z" level=info msg="URI/api/instances/host/name:k8s-single?fields=id,name,description,fcHostInitiators,iscsiHostInitiators,hostIPPorts?fields" func="github.com/dell/gounity.(*host).FindHostByName()" file="dell/gounity@v1.8.0/host.go:42"
time="2022-01-11T12:25:20Z" level=error msg="failed to invoke Unity REST API server" func="github.com/dell/gounity.(*Client).executeWithRetryAuthenticate()" file="dell/gounity@v1.8.0/unityclient.go:127" error="[{The requested resource does not exist. (Error Code:0x7d13005)}]"
time="2022-01-11T12:25:20Z" level=error msg="failed to invoke Unity REST API server" func="github.com/dell/gounity.(*Client).executeWithRetryAuthenticate()" file="dell/gounity@v1.8.0/unityclient.go:127" error="[{Requested element, property or action does not exist:[tenant]. (Error Code:0x7d13137)}]"
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x166a366]
goroutine 13 [running]:
github.com/dell/csi-unity/service.(*service).addNewNodeToArray(0xc000584000, 0x1bebae0, 0xc00024a960, 0xc00052bbc0, 0xc0003b7b50, 0x1, 0x1, 0xc0003b7a10, 0x1, 0x1, ...)
/go/src/csi-unity/service/node.go:1771 +0x146
github.com/dell/csi-unity/service.(*service).addNodeInformationIntoArray(0xc000584000, 0x1bebae0, 0xc00024a690, 0xc00052bbc0, 0x1bebae0, 0xc00024a690)
/go/src/csi-unity/service/node.go:1620 +0x819
github.com/dell/csi-unity/service.(*service).syncNodeInfo.func2.1(0x1bebae0, 0xc00024a060, 0xc000584000, 0xc00052bbc0)
/go/src/csi-unity/service/node.go:1875 +0xa5
created by github.com/dell/csi-unity/service.(*service).syncNodeInfo.func2
/go/src/csi-unity/service/node.go:1873 +0x85
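
Note on the trace above: the last error logged before the panic is the failed tenant lookup, and the fault address (addr=0x8) is what you would expect when a field is read through a nil struct pointer. The sketch below shows the general Go pattern that produces exactly this panic message. It is not the driver's code: `tenant`, `findTenant`, and `tenantIDUnchecked` are hypothetical names, and whether node.go:1771 follows this pattern is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// tenant is a hypothetical stand-in for an array-side tenant record.
type tenant struct {
	Name string
	ID   string
}

// findTenant is hypothetical. It mimics a lookup that fails and
// returns a nil pointer together with an error.
func findTenant(name string) (*tenant, error) {
	return nil, errors.New("requested element, property or action does not exist: [tenant]")
}

// tenantIDUnchecked logs the lookup error but still dereferences the
// result. The deferred recover only exists so this sketch can report
// the panic text instead of crashing.
func tenantIDUnchecked(name string) (id string, panicMsg string) {
	defer func() {
		if r := recover(); r != nil {
			panicMsg = fmt.Sprint(r)
		}
	}()
	t, err := findTenant(name)
	if err != nil {
		fmt.Println("lookup failed:", err) // logged, but execution continues
	}
	return t.ID, "" // t is nil here, so this read panics
}

func main() {
	_, msg := tenantIDUnchecked("")
	fmt.Println("recovered:", msg)
}
```

Without the `recover`, the process would exit with the same "invalid memory address or nil pointer dereference" message and a goroutine trace like the one above.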

Logs (node pod; registrar container)
[root@k8s-single unity.emc.dell.com]# kubectl -n unity logs unity-node-ffnk8 registrar
I0111 12:25:19.773635 1 main.go:164] Version: v2.3.0
I0111 12:25:19.773661 1 main.go:165] Running node-driver-registrar in mode=registration
I0111 12:25:19.774159 1 main.go:189] Attempting to open a gRPC connection with: "/csi/csi_sock"
I0111 12:25:19.774183 1 connection.go:154] Connecting to unix:///csi/csi_sock
I0111 12:25:19.775517 1 main.go:196] Calling CSI driver to discover driver name
I0111 12:25:19.775527 1 connection.go:183] GRPC call: /csi.v1.Identity/GetPluginInfo
I0111 12:25:19.775529 1 connection.go:184] GRPC request: {}
I0111 12:25:19.777657 1 connection.go:186] GRPC response: {"manifest":{"commit":"79d94eb4d3eb0521fcf489053c544b290d0595db","formed":"Fri, 03 Dec 2021 09:51:20 UTC","semver":"2.1.0","url":"http://github.com/dell/csi-unity"},"name":"csi-unity.dellemc.com","vendor_version":"2.1.0"}
I0111 12:25:19.777696 1 connection.go:187] GRPC error:
I0111 12:25:19.777700 1 main.go:206] CSI driver name: "csi-unity.dellemc.com"
I0111 12:25:19.777735 1 node_register.go:52] Starting Registration Server at: /registration/csi-unity.dellemc.com-reg.sock
I0111 12:25:19.777965 1 node_register.go:61] Registration Server started at: /registration/csi-unity.dellemc.com-reg.sock
I0111 12:25:19.778025 1 node_register.go:91] Skipping healthz server because HTTP endpoint is set to: ""
E0111 12:25:20.470869 1 connection.go:132] Lost connection to unix:///csi/csi_sock.
I0111 12:25:21.076923 1 main.go:100] Received GetInfo call: &InfoRequest{}
I0111 12:25:21.077074 1 main.go:107] "Kubelet registration probe created" path="/var/lib/kubelet/plugins/unity.emc.dell.com/registration"
I0111 12:25:21.080850 1 main.go:118] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:false,Error:RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock: connect: connection refused",}
E0111 12:25:21.080862 1 main.go:120] Registration process failed with error: RegisterPlugin error -- plugin registration failed with err: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/lib/kubelet/plugins/unity.emc.dell.com/csi_sock: connect: connection refused", restarting registration container.
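
Reading the two logs together, the registrar error looks like a consequence of the driver panic rather than a separate problem: the registrar loses its connection at 12:25:20, right when the driver container dies, and the dial to csi_sock one second later is refused. The sketch below reproduces that symptom in isolation, assuming Linux: a Unix socket file that still exists but has no process listening on it yields "connection refused". The helper names and the temporary path are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"syscall"
)

// staleSocketDialErr creates a Unix socket, stops listening while
// leaving the socket file on disk, and returns the error from dialing it.
func staleSocketDialErr() error {
	dir, err := os.MkdirTemp("", "csi-sock")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "csi_sock")
	l, err := net.Listen("unix", path)
	if err != nil {
		return err
	}
	// Keep the socket file when the listener closes, simulating a
	// process that died without cleaning up after itself.
	l.(*net.UnixListener).SetUnlinkOnClose(false)
	l.Close()

	conn, err := net.Dial("unix", path)
	if err == nil {
		conn.Close()
	}
	return err
}

// isRefused reports whether err wraps ECONNREFUSED.
func isRefused(err error) bool {
	return errors.Is(err, syscall.ECONNREFUSED)
}

func main() {
	err := staleSocketDialErr()
	fmt.Println("dial error:", err)
	fmt.Println("connection refused:", isRefused(err))
}
```

If this reading is right, the registrar error should disappear once the driver container stays up.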

System Information (please complete the following information):

  • OS/Version: CentOS 7.8 running as a VM under ESXi 7.0.2 (2 cores, 4GB memory)
  • Kubernetes 1.22.5, single node tainted to be both master and worker
  • Unity VSA 5.1.0.0.5.394 running on same ESXi host

Additional context
Also tried to get this working on a 3 node K8s setup, same issue. Have reinstalled from scratch several times with same outcome. Also reproduced with similar environment but using K8s 1.21.8.

@ErikZandboer ErikZandboer added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Jan 11, 2022
@rajendraindukuri rajendraindukuri added area/csi-unity Issue pertains to the CSI Driver for Dell EMC Unity and removed needs-triage Issue requires triage. labels Jan 11, 2022
@karthikk92

karthikk92 commented Jan 11, 2022

Could you please share the value of the tenantName variable passed in the myvalues.yaml (values.yaml) file when installing the driver?

@ErikZandboer
Author

could you please share what is the value for tenant passed in myvalues.yaml(values.yaml) file while installation of the driver?

The tenant is left at the default value from the myvalues example, so "".

@karthikk92

Could you also please share the full myvalues.yaml and secret.yaml files, and attach the full controller and node logs?

@ErikZandboer
Author

ErikZandboer commented Jan 11, 2022

secret.yaml.txt
myvalues.yaml.txt

Attached are the original myvalues.yaml and secret.yaml. How do I get to the "full controller and node" logs? What output are you looking for? Happy to provide :)

@ErikZandboer
Author

ErikZandboer commented Jan 11, 2022

controller.log.txt
node.log.txt
I dumped the full logs using kubectl -n unity logs --all-containers assuming this is what you need. I included the output into the files attached. If you require anything else let me know!

@karthikk92

karthikk92 commented Jan 12, 2022

The node and controller logs do not contain debug-level entries (only info-level). Could you please set the log level to Debug, install again, and share the logs? We need debug logs to investigate further.

eg:
logLevel: "Debug"

@ErikZandboer
Author

container_debug.log.txt
node_debug.log.txt
secret.yaml.txt
myvalues.yaml.txt

Hi, thanks for your support. The requested logs are attached, level is now debug. For completeness I also added the current secret and myvalues yaml files. Hope you have enough info to debug this.

@ErikZandboer
Author

Minor update: I retried with CSI version 2.0.0. This worked immediately, so I guess that rules out any issues with the Unity VSA.

@karthikk92

I was able to reproduce the issue. It is caused by a change in REST API behavior with respect to tenants in Unity VSA 5.1.0 (the driver worked fine with Unity VSA 5.0 and also with a physical Unity 5.1 host). We will share a new nightly build with the fix for Unity VSA 5.1.0 support. The PR with the fix is dell/csi-unity#55.
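
For anyone hitting a similar crash elsewhere: the sketch below shows one general defensive pattern for an optional lookup like this. It skips the lookup when no tenant name is configured, and it checks both the error and the pointer before reading any field. It does not reproduce the change in the PR. `resolveTenantID`, `lookupFunc`, `tenant`, and `errTenantUnsupported` are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// tenant is a hypothetical stand-in for an array-side tenant record.
type tenant struct{ ID string }

// errTenantUnsupported is a hypothetical error for arrays that do not
// expose a tenant resource.
var errTenantUnsupported = errors.New("tenant resource not available on this array")

// lookupFunc is a hypothetical signature for a tenant lookup call.
type lookupFunc func(name string) (*tenant, error)

// resolveTenantID returns the tenant ID for name, or "" when no
// tenant is configured. It never dereferences a nil result.
func resolveTenantID(name string, lookup lookupFunc) (string, error) {
	if name == "" {
		return "", nil // no tenant requested: skip the lookup entirely
	}
	t, err := lookup(name)
	if err != nil {
		return "", fmt.Errorf("tenant %q lookup failed: %w", name, err)
	}
	if t == nil {
		return "", fmt.Errorf("tenant %q not found", name)
	}
	return t.ID, nil
}

func main() {
	// A lookup that always fails, as on an array without tenant support.
	failing := func(string) (*tenant, error) { return nil, errTenantUnsupported }

	id, err := resolveTenantID("", failing)
	fmt.Printf("empty tenant name: id=%q err=%v\n", id, err)

	_, err = resolveTenantID("finance", failing)
	fmt.Println("named tenant:", err)
}
```

With the default tenant setting of "" reported earlier in this thread, a guard of this kind returns without ever calling the lookup.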

Thank You,
Karthik K

@gallacher gallacher added this to the v1.2.0 milestone Jan 18, 2022
@ErikZandboer
Author

Confirmed this fix works, thanks!
