[BUG]: Not able to run more than one replica of csm-isilon-controller after upgrading to dell-csm-operator-controller-manager 1.4.0 #1099

Closed
N1K68 opened this issue Jan 12, 2024 · 5 comments
Assignees
Labels
area/csi-powerscale Issue pertains to the CSI Driver for Dell EMC PowerScale needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue.

Comments

@N1K68

N1K68 commented Jan 12, 2024

Bug Description

The first controller starts without any problems. For the second controller, five containers (podmon, resizer, attacher, provisioner, snapshotter) always fail with: "Waiting on connection to driver csi.sock: context deadline exceeded".

The CSM Operator was installed manually, without OLM, using "bash scripts/install.sh", since CSM Operator 1.4.0 doesn't seem to be available in Red Hat's OperatorHub at the moment.

This issue has been reproduced in 3 different clusters.
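
For readers unfamiliar with the error: each sidecar tries to dial the driver's Unix socket and gives up once a deadline passes. The sketch below imitates that wait loop in Python so the behavior is easier to picture. It is an illustration only, not the sidecars' actual (Go) code; the function name `wait_for_socket` and its default timings are assumptions made for this example.

```python
import socket
import time


def wait_for_socket(path, deadline_s=30.0, interval_s=1.0):
    """Return True once a Unix socket at `path` accepts a connection,
    or False if `deadline_s` seconds pass first.

    Hypothetical helper for illustration; the timings are not taken
    from the CSI sidecars.
    """
    give_up_at = time.monotonic() + deadline_s
    while True:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(interval_s)
        try:
            sock.connect(path)
            return True
        except OSError:
            # Socket file missing, nothing listening, or the connect timed out.
            pass
        finally:
            sock.close()
        if time.monotonic() >= give_up_at:
            return False
        time.sleep(interval_s)
```

Calling `wait_for_socket("/var/run/csi/csi.sock")` from inside an affected container would return False in the situation described here, which corresponds to the "context deadline exceeded" messages: the driver container never exposes a socket that the sidecars can reach.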

Logs

Defaulted container "podmon" out of: podmon, resizer, attacher, provisioner, snapshotter, csi-metadata-retriever, driver
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="parameter value after config file processing" PODMON_CONTROLLER_LOG_LEVEL=debug
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="parameter value after config file processing" monitor.ArrayConnectivityPollRate=1m0s
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="parameter value after config file processing" monitor.ArrayConnectivityConnectionLossThreshold=3
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="parameter value after config file processing" monitor.PodMonitor.SkipArrayConnectionValidation=false
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="Running in controller mode"
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="CSI Driver for PowerScale"
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="attempting k8sapi connection"
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="Using InClusterConfig()"
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="connected to k8sapi"
time="Fri, 12 Jan 2024 15:27:49 UTC" level=info msg="Attempting driver connection at: unix:/var/run/csi/csi.sock"
time="Fri, 12 Jan 2024 15:27:59 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:27:59 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:28:39 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:28:39 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:29:19 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:29:19 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:29:59 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:29:59 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:30:39 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:30:39 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:31:19 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:31:19 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:31:59 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:31:59 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"
time="Fri, 12 Jan 2024 15:32:39 UTC" level=debug msg="grpc.Dial returned context deadline exceeded"
time="Fri, 12 Jan 2024 15:32:39 UTC" level=error msg="Waiting on connection to driver csi.sock: context deadline exceeded"

resizer
I0112 15:42:15.550926 1 main.go:93] Version : v1.9.2
I0112 15:42:15.551044 1 feature_gate.go:249] feature gates: &{map[]}
I0112 15:42:15.552151 1 connection.go:164] Connecting to unix:///var/run/csi/csi.sock
W0112 15:42:25.552252 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:42:35.552923 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:42:45.553309 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
F0112 15:42:45.553587 1 main.go:134] failed to connect to CSI driver: context deadline exceeded

attacher
I0112 15:42:30.555491 1 main.go:97] Version: v4.4.2
I0112 15:42:30.556813 1 connection.go:164] Connecting to unix:///var/run/csi/csi.sock
W0112 15:42:40.557690 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:42:50.557912 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:43:00.557201 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
E0112 15:43:00.557760 1 main.go:136] context deadline exceeded

provisioner
W0112 15:42:30.710740 1 feature_gate.go:241] Setting GA feature gate Topology=true. It will be removed in a future release.
I0112 15:42:30.710829 1 feature_gate.go:249] feature gates: &{map[Topology:true]}
I0112 15:42:30.710845 1 csi-provisioner.go:154] Version: v3.6.2
I0112 15:42:30.710849 1 csi-provisioner.go:177] Building kube configs for running in cluster...
I0112 15:42:30.711441 1 connection.go:164] Connecting to unix:///var/run/csi/csi.sock
W0112 15:42:40.712179 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:42:50.712476 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:43:00.712464 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
E0112 15:43:00.712581 1 csi-provisioner.go:215] context deadline exceeded

snapshotter
I0112 15:42:30.858708 1 main.go:109] Version: v6.3.2
I0112 15:42:30.859928 1 connection.go:164] Connecting to unix:///var/run/csi/csi.sock
W0112 15:42:40.860373 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:42:50.860734 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
W0112 15:43:00.860639 1 connection.go:183] Still connecting to unix:///var/run/csi/csi.sock
E0112 15:43:00.860672 1 main.go:174] error connecting to CSI driver: context deadline exceeded
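
All five excerpts above end in the same "context deadline exceeded" failure, but they use two log formats: podmon writes `level=...` key/value lines, while the other sidecars use a one-letter severity prefix (I/W/E/F). When comparing many pods, a small filter can help pull out just the timeout lines and their severity. The following is a minimal sketch, assuming only the two formats visible in this report; the function name `timeout_level` is made up for the example.

```python
import re

TIMEOUT = "context deadline exceeded"
# podmon-style lines carry an explicit level=... field.
KEYVALUE_LEVEL = re.compile(r"\blevel=(\w+)")
# Sidecar lines start with a severity letter followed by MMDD.
PREFIX_LEVEL = re.compile(r"^\s*([IWEF])\d{4} ")
PREFIX_NAMES = {"I": "info", "W": "warning", "E": "error", "F": "fatal"}


def timeout_level(line):
    """Return the severity of a timeout line, or None if the line
    does not mention the timeout at all."""
    if TIMEOUT not in line:
        return None
    match = KEYVALUE_LEVEL.search(line)
    if match:
        return match.group(1)
    match = PREFIX_LEVEL.match(line)
    if match:
        return PREFIX_NAMES[match.group(1)]
    return "unknown"
```

Applied to the logs above, this would report `error` for podmon, attacher, provisioner and snapshotter, and `fatal` for resizer; the "Still connecting" warnings are skipped because they do not contain the timeout text.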

Screenshots

image

Additional Environment Information

No response

Steps to Reproduce

Install the CSM Operator 1.4.0

Expected Behavior

Expected to be able to run two replicas of the csm-isilon-controller

CSM Driver(s)

isilon v2.9.0

Installation Type

Manually, without OLM, using: "bash scripts/install.sh"

Container Storage Modules Enabled

resiliency v1.8.0
observability v1.7.0

Container Orchestrator

OpenShift 4.13.27

Operating System

RHEL 9.2

@N1K68 N1K68 added needs-triage Issue requires triage. type/bug Something isn't working. This is the default label associated with a bug issue. labels Jan 12, 2024
@shanmydell shanmydell added the area/csi-powerscale Issue pertains to the CSI Driver for Dell EMC PowerScale label Jan 16, 2024
@shanmydell
Collaborator

@cbartoszDell: Please take a look.

@ybrock

ybrock commented Jan 16, 2024

Hello,

I confirm, we're having the same issue with csi-isilon v2.9.0 and container-storage-modules 1.2.0.

Everything is working fine, except that the second pod for the controller is in "CrashLoopBackOff". There is nothing relevant in the logs.

With version v2.8.0, all six containers in the pod were in the "Running" state.
With version v2.9.0, the "csi-metadata-retriever" and "driver" containers are running; the other four are "Waiting".

Scaling the replicas to 1 is a workaround.

Kind regards
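
For an Operator-based install, the workaround mentioned above (running a single controller replica) amounts to changing one field in the driver's custom resource. The sketch below builds a JSON merge patch for that change. It assumes the replica count lives at `spec.driver.replicas` in the custom resource, which this thread does not confirm; check the manifest actually deployed before relying on it. The function name `replica_patch` is hypothetical.

```python
import json


def replica_patch(replicas):
    """Return a JSON merge patch that sets the controller replica count.

    Assumption: the count is stored at spec.driver.replicas in the
    custom resource; this path is not stated in the issue thread.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return json.dumps({"spec": {"driver": {"replicas": replicas}}})
```

The resulting string could be passed to `kubectl patch` with `--type merge`; the resource kind, name, and namespace depend on the installation and are not given in this thread. Helm-based installs set the controller count through chart values instead, so this patch would not apply there.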

@N1K68
Author

N1K68 commented Jan 17, 2024

I should maybe add that the issue started after the upgrade of the CSM Operator. That is, the problem occurred while we were still running the old 2.7.0 driver (we skipped 2.8.0) and persisted after we upgraded to the 2.9.0 driver. So it seems the CSM Operator is the common denominator.

@adarsh-dell
Contributor

Hi @ybrock @N1K68 ,

This issue is acknowledged, and a workaround is outlined in the release notes of the relevant driver. For details on the workaround, please consult the following documentation:

Link to Documentation

Rest assured, we plan to address and resolve this issue in an upcoming release.

@gallacher
Contributor

This issue is being addressed as part of #1110.
