Stargate ENV SEED is incorrect when deploying to kubeadm cluster with non-default cluster and domain name #778

Closed
tlb1galaxy opened this issue Nov 23, 2022 · 6 comments · Fixed by #785
Labels: bug (Something isn't working), done (Issues in the state 'done')

@tlb1galaxy

What happened?
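A new K8ssandra deployment via k8ssandra-operator injects an incorrect value for the SEED environment variable into the Stargate pod. The value ends in .cluster.local even though this cluster uses a different cluster domain.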

Did you expect to see something different?
I would expect SEED to contain the actual DNS name 'demo-seed-service.k8ssandra-operator.svc.k8s-clst01.domain01.local'
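
To illustrate the mismatch, here is a minimal Python sketch (not the operator's code). The helper name `seed_service_fqdn` and its `cluster_domain` parameter are hypothetical; the `<cluster>-seed-service` naming pattern is inferred from the names in this report.

```python
def seed_service_fqdn(cluster: str, namespace: str,
                      cluster_domain: str = "cluster.local") -> str:
    """Build a fully qualified seed service name (hypothetical helper).

    The default for cluster_domain mirrors the hardcoded suffix seen in
    the injected SEED value; it is an assumption for illustration only.
    """
    return f"{cluster}-seed-service.{namespace}.svc.{cluster_domain}"


# Value observed in the Stargate pod (domain hardcoded to cluster.local).
injected = seed_service_fqdn("demo", "k8ssandra-operator")

# Value expected on this cluster, whose domain is k8s-clst01.domain01.local.
expected = seed_service_fqdn("demo", "k8ssandra-operator",
                             "k8s-clst01.domain01.local")

print(injected)
print(expected)
```

The two names differ only in the cluster domain suffix, which is the part the deployment cannot currently override.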

How to reproduce it (as minimally and precisely as possible):
A brand-new deployment of k8ssandra-operator on a Kubernetes cluster whose cluster name and domain are not the default cluster.local:

  1. kubectl create ns k8ssandra-operator
  2. helm install k8ssandra-operator k8ssandra/k8ssandra-operator --version 0.38.2 -n k8ssandra-operator
    a. v0.38.5 has issues, so using 0.38.2
  3. kubectl apply -n k8ssandra-operator -f 30-k8ssandra-k8ssandracluster.yaml

Environment

  • K8ssandra Operator version:
    docker.io/k8ssandra/k8ssandra-operator:v1.2.1

  • Kubernetes version information:

    kubectl version

clientVersion:
  buildDate: "2022-06-15T14:22:29Z"
  compiler: gc
  gitCommit: f66044f4361b9f1f96f0053dd46cb7dce5e990a8
  gitTreeState: clean
  gitVersion: v1.24.2
  goVersion: go1.18.3
  major: "1"
  minor: "24"
  platform: linux/amd64
kustomizeVersion: v4.5.4
serverVersion:
  buildDate: "2022-06-15T14:15:38Z"
  compiler: gc
  gitCommit: f66044f4361b9f1f96f0053dd46cb7dce5e990a8
  gitTreeState: clean
  gitVersion: v1.24.2
  goVersion: go1.18.3
  major: "1"
  minor: "24"
  platform: linux/amd64
  • CNI
    calico v3.24.0

  • CRI
    containerd v1.5.9

  • DNS
    coreDNS

  • Kubernetes cluster kind:

    kubeadm

  • Manifests:
    30-k8ssandra-k8ssandracluster.yaml

---
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  auth: true
  cassandra:
    serverVersion: "4.0.1"
    softPodAntiAffinity: true
    datacenters:
      - metadata:
          name: test-dc1
        racks:
          - name: rack1
        size: 4
        resources:
          limits:
            cpu: "500m"
            memory: 4Gi
          requests:
            cpu: "500m"
            memory: 4Gi
        config:
          jvmOptions:
            heap_initial_size: 1G
            heap_max_size: 2G
        storageConfig:
          cassandraDataVolumeClaimSpec:
            storageClassName: rook-ceph-block-hdd10k
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
        stargate:
          size: 1
          resources:
            limits:
              cpu: "250m"
              memory: 512Mi
            requests:
              cpu: "250m"
              memory: 512Mi
          heapSize: 256Mi
          allowStargateOnDataNodes: true
          affinity:
            podAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
...
  • Stargate pod manifest:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 1b3d48461cce4e7f6288999b43126588a0ef02e177d7a84a9c7d6911f95c256d
    cni.projectcalico.org/podIP: 192.168.43.197/32
    cni.projectcalico.org/podIPs: 192.168.43.197/32
  creationTimestamp: "2022-11-23T19:08:52Z"
  generateName: demo-test-dc1-rack1-stargate-deployment-5959bf7c8d-
  labels:
    app.kubernetes.io/component: stargate
    app.kubernetes.io/created-by: stargate-controller
    app.kubernetes.io/name: k8ssandra-operator
    app.kubernetes.io/part-of: k8ssandra
    k8ssandra.io/cluster-name: demo
    k8ssandra.io/cluster-namespace: k8ssandra-operator
    k8ssandra.io/stargate: demo-test-dc1-stargate
    k8ssandra.io/stargate-deployment: demo-test-dc1-rack1-stargate-deployment
    pod-template-hash: 5959bf7c8d
  name: demo-test-dc1-rack1-stargate-deployment-5959bf7c8d-dlpg6
  namespace: k8ssandra-operator
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: demo-test-dc1-rack1-stargate-deployment-5959bf7c8d
    uid: 25b59f39-b2a6-4022-ba57-fd1b7c450700
  resourceVersion: "8567777"
  uid: 0979636f-2664-443a-92a3-2f44bfda9bd8
spec:
  affinity:
    podAffinity: {}
  containers:
  - env:
    - name: LISTEN
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: status.podIP
    - name: JAVA_OPTS
      value: -XX:+CrashOnOutOfMemoryError -Xms268435456 -Xmx268435456
    - name: CLUSTER_NAME
      value: demo
    - name: CLUSTER_VERSION
      value: "4.0"
    - name: SEED
      value: demo-seed-service.k8ssandra-operator.svc.cluster.local
    - name: DATACENTER_NAME
      value: test-dc1
    - name: RACK_NAME
      value: rack1
    - name: DISABLE_BUNDLES_WATCH
      value: "true"
    - name: ENABLE_AUTH
      value: "true"
    image: docker.io/stargateio/stargate-4_0:v1.0.67
  • K8ssandra Operator Logs:

No errors appear in the operator log.

  • Stargate pod logs
Using environment for config
Running java -server -XX:+CrashOnOutOfMemoryError -Xms268435456 -Xmx268435456 -Dstargate.libdir=./stargate-lib -Djava.awt.headless=true -jar ./stargate-lib/stargate-starter-1.0.67.jar --cluster-name demo --cluster-version 4.0 --cluster-seed demo-seed-service.k8ssandra-operator.svc.cluster.local --listen 192.168.43.254 --dc test-dc1 --rack rack1 --enable-auth --disable-bundles-watch
Unable to resolve seed node address demo-seed-service.k8ssandra-operator.svc.cluster.local
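
As a way to confirm which name is resolvable from inside the cluster, a small Python sketch along these lines could be used. `first_resolvable` is a hypothetical helper; the stub resolver below only simulates this cluster's DNS (it is an assumption, not captured output), so the example runs without network access.

```python
import socket
from typing import Callable, Iterable, Optional


def first_resolvable(names: Iterable[str],
                     resolve: Callable[[str], object] = socket.gethostbyname
                     ) -> Optional[str]:
    """Return the first name that resolves, or None if none do."""
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return name
    return None


def stub_resolver(name: str) -> str:
    """Simulated DNS: only names under the real cluster domain resolve."""
    if name.endswith(".svc.k8s-clst01.domain01.local"):
        return "10.0.0.1"  # placeholder address, assumption
    raise socket.gaierror(f"cannot resolve {name}")


candidates = [
    "demo-seed-service.k8ssandra-operator.svc.cluster.local",
    "demo-seed-service.k8ssandra-operator.svc.k8s-clst01.domain01.local",
]
print(first_resolvable(candidates, stub_resolver))
```

Run inside a pod with the default `resolve` argument, the same function would query the cluster's real DNS instead of the stub.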
@tlb1galaxy tlb1galaxy added the bug Something isn't working label Nov 23, 2022
@adejanovski
Contributor

Hi @tlb1galaxy,

It looks like we are indeed missing something in the CRD to indicate the cluster domain: https://github.com/k8ssandra/k8ssandra-operator/blob/main/pkg/stargate/deployments.go#L29-L31

@burmanm, should we totally remove the cluster domain part or add the cluster domain in the CRD?

@tlb1galaxy
Author

@adejanovski
A side note, just so you know; I don't think it affects the issue or its resolution.
I also tried a few different scenarios to see whether the procedure or a possible DNS update timing issue was to blame.

Move Stargate config to spec.stargate:

same result

Deploy with no Stargate initially and then patch with the Stargate config (nested or top-level):

same result

@burmanm
Contributor

burmanm commented Nov 24, 2022

> @burmanm, should we totally remove the cluster domain part or add the cluster domain in the CRD?

Remove, if we're accessing the local cluster (and not a remote one). We don't need that info and traditionally the Kubernetes cluster itself hasn't known its own cluster name.
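
For illustration, a minimal Python sketch of why a name without the cluster domain can still resolve from inside a pod: the resolver appends the search domains from /etc/resolv.conf. The resolv.conf contents below (search list, nameserver address, ndots:5) are assumed typical values for a cluster with this domain, not taken from the reporter's pod, and `candidate_names` is a hypothetical helper that only mimics the lookup order described in resolv.conf(5).

```python
from typing import List

# Assumed pod resolv.conf for namespace k8ssandra-operator on this cluster.
RESOLV_CONF = """\
search k8ssandra-operator.svc.k8s-clst01.domain01.local svc.k8s-clst01.domain01.local k8s-clst01.domain01.local
nameserver 10.96.0.10
options ndots:5
"""


def candidate_names(name: str, resolv_conf: str) -> List[str]:
    """List the names a resolver would try, in order, for `name`."""
    search: List[str] = []
    ndots = 1  # resolver default when no ndots option is present
    for line in resolv_conf.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "search":
            search = parts[1:]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    if name.endswith("."):
        return [name]  # already absolute, search list is skipped
    expanded = [f"{name}.{domain}" for domain in search]
    # Names with at least ndots dots are tried as-is first, otherwise last.
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


short_name = "demo-seed-service.k8ssandra-operator.svc"
for candidate in candidate_names(short_name, RESOLV_CONF):
    print(candidate)
```

Under these assumptions, the third candidate is demo-seed-service.k8ssandra-operator.svc.k8s-clst01.domain01.local, the name the reporter expected, while no expansion of the hardcoded .cluster.local name produces it.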

@adejanovski
Contributor

Sounds good, let's remove all occurrences of .cluster.local then.

@adejanovski adejanovski added the ready Issues in the state 'ready' label Nov 25, 2022
@hanhduynguyen

Same issue.
I created a non-default cluster because of an existing bug. After creating the cluster, my pods are stuck in the Pending state.

helm install k8ssandra-cluster k8ssandra/k8ssandra-operator -n k8ssandra-operator --set global.clusterScoped=true --create-namespace

Waiting for a solution.

@adejanovski adejanovski added in-progress Issues in the state 'in-progress' and removed ready Issues in the state 'ready' labels Nov 29, 2022
@adejanovski adejanovski self-assigned this Nov 29, 2022
@adejanovski
Contributor

What about this, @burmanm?
Do we need to keep the cluster.local variant in the DNS names for the Certificate, or could it create some problems?

@adejanovski adejanovski added ready-for-review Issues in the state 'ready-for-review' and removed in-progress Issues in the state 'in-progress' labels Nov 30, 2022
@adejanovski adejanovski added review Issues in the state 'review' and removed ready-for-review Issues in the state 'ready-for-review' labels Dec 8, 2022
@adejanovski adejanovski added done Issues in the state 'done' and removed review Issues in the state 'review' labels Dec 14, 2022