[BUG]: CSM Operator fails to install CSM Replication on the remote cluster #988

Closed
lukeatdell opened this issue Sep 22, 2023 · 1 comment
Labels: area/csm-operator, area/csm-replication, type/bug


Bug Description

When installing CSM Replication via CSM Operator on two OpenShift clusters, the csm-operator-manager pod cannot resolve the secondary cluster's API endpoint and therefore cannot communicate with that cluster. As a result, the validation pre-checks against the remote (secondary, disaster-recovery) cluster fail, which blocks installation of the CSI driver and CSM Replication via CSM Operator.

Logs

2023-09-22T16:52:17.811Z INFO workspace/main.go:143 Use ConfigDirectory /etc/config/dell-csm-operator {"TraceId": "main"}
1.6954015401672292e+09 INFO controller-runtime.metrics Metrics server is starting to listen {"addr": "127.0.0.1:8080"}
1.695401540168204e+09 INFO setup starting manager
1.6954015401686692e+09 INFO Starting server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
I0922 16:52:20.168696 1 leaderelection.go:248] attempting to acquire leader lease openshift-operators/090cae6a.dell.com...
1.6954015401686754e+09 INFO Starting server {"kind": "health probe", "addr": "[::]:8081"}
I0922 16:52:37.501589 1 leaderelection.go:258] successfully acquired lease openshift-operators/090cae6a.dell.com
1.6954015575017395e+09 DEBUG events Normal {"object": {"kind":"ConfigMap","namespace":"openshift-operators","name":"090cae6a.dell.com","uid":"10a95494-4bdf-4b41-b82a-bbe9b242d736","apiVersion":"v1","resourceVersion":"1042594"}, "reason": "LeaderElection", "message": "dell-csm-operator-controller-manager-57798fdd78-mndc2_1aaeb44f-5e7e-43fd-b14c-28663aca5b63 became leader"}
1.6954015575018597e+09 DEBUG events Normal {"object": {"kind":"Lease","namespace":"openshift-operators","name":"090cae6a.dell.com","uid":"b00994e6-cc9c-4669-89b6-e407116c33c4","apiVersion":"coordination.k8s.io/v1","resourceVersion":"1042595"}, "reason": "LeaderElection", "message": "dell-csm-operator-controller-manager-57798fdd78-mndc2_1aaeb44f-5e7e-43fd-b14c-28663aca5b63 became leader"}
1.6954015575019403e+09 INFO controller.containerstoragemodule Starting EventSource {"reconciler group": "storage.dell.com", "reconciler kind": "ContainerStorageModule", "source": "kind source: *v1.ContainerStorageModule"}
1.6954015575019846e+09 INFO controller.containerstoragemodule Starting Controller {"reconciler group": "storage.dell.com", "reconciler kind": "ContainerStorageModule"}
1.695401557602666e+09 INFO controller.containerstoragemodule Starting workers {"reconciler group": "storage.dell.com", "reconciler kind": "ContainerStorageModule", "worker count": 1}
2023-09-22T16:52:53.655Z INFO controllers/csm_controller.go:203 ################Starting Reconcile############## {"TraceId": "isilon-1"}
2023-09-22T16:52:53.655Z INFO controllers/csm_controller.go:206 reconcile for {"TraceId": "isilon-1", "Namespace": "dell-csm", "Name": "isilon", "Attempt": 1}
2023-09-22T16:52:53.656Z DEBUG drivers/powerscale.go:80 preCheck {"TraceId": "isilon-1", "skipCertValid": true, "certCount": 1, "secrets": 1}
2023-09-22T16:52:54.265Z INFO controllers/csm_controller.go:1221 proceeding with fresh driver install {"TraceId": "isilon-1"}
2023-09-22T16:52:54.276Z INFO controllers/csm_controller.go:1124 Driver not installed yet {"TraceId": "isilon-1"}
2023-09-22T16:52:54.336Z INFO record/event.go:285 Event(v1.ObjectReference{Kind:"ContainerStorageModule", Namespace:"dell-csm", Name:"isilon", UID:"020e0b89-6313-47a7-a048-9faece60dfb2", APIVersion:"storage.dell.com/v1", ResourceVersion:"1042687", FieldPath:""}): type: 'Warning' reason: 'Updated' Failed Prechecks: failed replication validation: Get "https://api.<endpoint>.com:6443/api?timeout=32s": dial tcp: lookup api.<endpoint>.com on 172.30.0.10:53: no such host {"TraceId": "main"}
################End Reconcile##############
2023-09-22T16:52:54.350Z ERROR utils/status.go:345 failed replication validation: Get "https://api.<endpoint>.com:6443/api?timeout=32s": dial tcp: lookup api.<endpoint>.com on 172.30.0.10:53: no such host *************Create/Update isilon failed ******** {"TraceId": "isilon-1"}
1.695401574350591e+09 ERROR controller.containerstoragemodule Reconciler error {"reconciler group": "storage.dell.com", "reconciler kind": "ContainerStorageModule", "name": "isilon", "namespace": "dell-csm", "error": "failed replication validation: Get \"https://api.<endpoint>.com:6443/api?timeout=32s\": dial tcp: lookup api.<endpoint>.com on 172.30.0.10:53: no such host"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
2023-09-22T16:52:54.355Z INFO controllers/csm_controller.go:203 ################Starting Reconcile############## {"TraceId": "isilon-2"}
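
The lookup failure repeated in the log lines above names both the host that could not be resolved and the DNS server that was asked. As a rough illustration, the sketch below pulls those two values out of such a line with a regular expression. The function name `parse_lookup_failure` is hypothetical, and the pattern assumes the `lookup <host> on <server>: no such host` wording seen in this log.

```python
import re
from typing import Optional, Tuple

# Matches the resolver error text seen in the log above, e.g.
# "lookup api.<endpoint>.com on 172.30.0.10:53: no such host".
_LOOKUP_RE = re.compile(r"lookup (\S+) on (\S+): no such host")


def parse_lookup_failure(line: str) -> Optional[Tuple[str, str]]:
    """Return (host, dns_server) from a 'no such host' log line, or None."""
    match = _LOOKUP_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    sample = (
        "failed replication validation: Get "
        '"https://api.<endpoint>.com:6443/api?timeout=32s": dial tcp: '
        "lookup api.<endpoint>.com on 172.30.0.10:53: no such host"
    )
    print(parse_lookup_failure(sample))
```

In this report the server is 172.30.0.10:53, which suggests the pod is querying a DNS service inside the cluster rather than reading a hosts file edited elsewhere.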

Screenshots

From Primary Cluster

[Screenshot not preserved]

Additional Environment Information

No response

Steps to Reproduce

  1. Install CSM Operator.
  2. Map the secondary cluster's IP to the OpenShift API endpoint in the /etc/hosts file.
  3. Configure pre-requisites for CSM Replication.
    a. Clone csm-replication repo and build repctl.
    b. Add cluster info to repctl using repctl cluster add ...
    c. Install replication CRDs.
    d. Inject service account info.
    e. Create storage-classes.
  4. Configure CSI driver pre-requisites.
    a. Create driver namespace.
    b. Create isilon-creds generic secret.
    c. Create an empty secret.
  5. Create a ContainerStorageModule instance in the OpenShift Console: edit the config to enable replication, provide the name of the target cluster under envs.name: TARGET_CLUSTERS_IDS, and click 'Create'.
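
The steps above add an /etc/hosts entry, yet the log shows the operator pod asking 172.30.0.10:53 for the name, so an entry made outside the pod may not be visible to it. A small check like the following can show what a given environment actually resolves. It is a minimal sketch that assumes it is run in the same network context as the operator pod. The function name `resolves`, the injectable `resolver` parameter, and the hostname in the usage section are hypothetical; port 6443 is taken from the URL in the log.

```python
import socket
from typing import Callable, List


def resolves(host: str, port: int = 6443,
             resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return the sorted unique addresses `host` resolves to, or [] on failure.

    `resolver` defaults to the system resolver; it is injectable so the
    check can be exercised without network access.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical FQDN; substitute the remote cluster's API endpoint.
    host = "api.remote-cluster.example.com"
    addresses = resolves(host)
    if addresses:
        print(f"{host} resolves to {', '.join(addresses)}")
    else:
        print(f"{host} does not resolve from this environment")
```

An empty result from inside the pod, alongside a successful result on the machine whose /etc/hosts was edited, would match the failure described in this issue.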

Expected Behavior

The csm-operator-manager pod should query the secondary cluster's API endpoint, complete pre-check validation, and install CSM Replication and the CSI driver on both the primary and secondary clusters.

CSM Driver(s)

CSI PowerFlex v2.7.0
CSI PowerScale v2.7.0

Installation Type

CSM-Operator v1.2.0

Container Storage Modules Enabled

CSM-Replication v1.5.0

Container Orchestrator

OpenShift 4.12

Operating System

RHEL 9.2

lukeatdell added the needs-triage, type/bug, area/csm-replication, and area/csm-operator labels on Sep 22, 2023
harshaatdell added the backlog label and removed the needs-triage label on Sep 23, 2023
harshaatdell self-assigned this on Sep 23, 2023
gallacher added this to the v1.9.0 milestone on Sep 25, 2023

lukeatdell commented Oct 9, 2023

The documentation has been updated to address this issue as part of dell/csm-docs#844.
Note that the documented method is a workaround for multi-cluster environments where DNS is not configured to resolve the FQDN of the remote cluster. It is highly recommended that customers configure DNS to support this name resolution.
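
Where DNS cannot be changed, Kubernetes offers `hostAliases` on a pod spec as one way to add a name-to-IP entry to a pod's own /etc/hosts. Whether this is the exact method documented in dell/csm-docs#844 is an assumption here, not something this issue confirms. The sketch below only builds the nested `spec.template.spec.hostAliases` structure for a Deployment patch; the function name, IP address, and FQDN are hypothetical placeholders.

```python
import json
from typing import Dict, List


def host_aliases_patch(ip: str, hostnames: List[str]) -> Dict:
    """Build a Deployment patch body that adds one hostAliases entry.

    hostAliases is a standard PodSpec field; each entry maps one IP
    to a list of hostnames in the pod's /etc/hosts.
    """
    if not hostnames:
        raise ValueError("at least one hostname is required")
    return {
        "spec": {
            "template": {
                "spec": {
                    "hostAliases": [{"ip": ip, "hostnames": list(hostnames)}]
                }
            }
        }
    }


if __name__ == "__main__":
    # Placeholder values (documentation address range and example domain).
    patch = host_aliases_patch("192.0.2.10", ["api.remote-cluster.example.com"])
    print(json.dumps(patch, indent=2))
```

A Deployment that is managed by another controller may have such edits reverted, so this is a stopgap at best, which is consistent with the recommendation above to configure DNS.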
