K8SSAND-1563 ⁃ error "Exception encountered during startup: The seed provider lists no seeds" when recreating pods #567
Comments
Additional information
As a result the pod didn't have …
This doesn't answer the question about what's happening, but I want to ask: what is the use case for deleting the pods? Have you considered using the …
This should be investigated. At any point, for various reasons, Kubernetes can delete and reschedule pods. k8ssandra-operator and cass-operator need to handle that gracefully. It's also worth noting that cass-operator creates a PodDisruptionBudget for each CassandraDatacenter. It only sets the …
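For reference, here is a rough sketch of what such a PodDisruptionBudget could look like. This is not the actual cass-operator code; the field it sets is cut off above, so the sketch assumes a `minAvailable` of size - 1 and a hypothetical datacenter label selector.

```go
// Hypothetical sketch only, not cass-operator's implementation.
package sketch

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// newPDB builds a PodDisruptionBudget that tolerates losing one pod of the
// datacenter at a time during voluntary disruptions (assumed behaviour).
func newPDB(dcName, namespace string, size int32) *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(int(size - 1))
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      dcName + "-pdb",
			Namespace: namespace,
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			Selector: &metav1.LabelSelector{
				// Assumed label; the real selector may differ.
				MatchLabels: map[string]string{"cassandra.datastax.com/datacenter": dcName},
			},
			MinAvailable: &minAvailable,
		},
	}
}
```

Note that a PDB only limits voluntary evictions such as node drains; it does not prevent an explicit kubectl delete of the pods.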
In my case it was a test cluster and I used preemptible workers, which are automatically deleted.
The problem seems to be that … The reason there are still IPs in the seed endpoints when all the pods have been deleted is the …
This is incorrect. Here is the snippet from the method:
if labelSeedBeforeStart {
patch := client.MergeFrom(pod.DeepCopy())
pod.Labels[api.SeedNodeLabel] = "true"
if err := rc.Client.Patch(rc.Ctx, pod, patch); err != nil {
return "", err
}
rc.Recorder.Eventf(rc.Datacenter, corev1.EventTypeNormal, events.LabeledPodAsSeed,
"Labeled pod a seed node %s", pod.Name)
// sleeping five seconds for DNS paranoia
time.Sleep(5 * time.Second)
}

You are correct that for a single-DC cluster the seed-service and additional-seeds-service will have the same endpoints. When the pods get deleted, those endpoints will get removed from the seed-service. The next time k8ssandra-operator performs reconciliation after the endpoints are removed from the seed-service, it will delete the endpoints for the additional-seeds-service. At that point cass-operator should proceed with starting a node in each rack. There is obviously some sort of timing issue, since @Tom910 mentioned this didn't happen all the time and deleting the pods a second time got things back to a healthy state. For what it's worth, I've had trouble with preemptible worker nodes and cass-operator in the past and avoided them because of it. This is going to require some more investigation.
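One way to observe that behaviour is to watch the Endpoints objects behind the two services while the pods are being recreated. The sketch below is a diagnostic only, not operator code; the service names are guesses based on the naming cass-operator typically uses and should be adjusted to the actual cluster.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumed kubeconfig location and object names; adjust for your cluster.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	namespace := "k8s-cluster-test" // namespace from the report below
	for _, name := range []string{"test-seed-service", "test-dc1-additional-seed-service"} {
		ep, err := client.CoreV1().Endpoints(namespace).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		total := 0
		for _, subset := range ep.Subsets {
			total += len(subset.Addresses)
		}
		// A non-zero count here after all pods were deleted means stale seed IPs.
		fmt.Printf("%s: %d addresses\n", name, total)
	}
}
```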
@jsanda You are right that the pod gets labeled as a seed before it is started. The problem is in how labelSeedBeforeStart gets computed:
externalSeedPoints := 0
if existingEndpoints, err := rc.GetAdditionalSeedEndpoint(); err == nil {
externalSeedPoints = len(existingEndpoints.Subsets[0].Addresses)
}
labelSeedBeforeStart := readySeeds == 0 && len(rc.Datacenter.Spec.AdditionalSeeds) == 0 && externalSeedPoints == 0
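To make the failure mode concrete, here is a small self-contained illustration, with made-up numbers rather than operator code, of how stale additional-seeds endpoints keep that expression false when every pod has been deleted at once:

```go
package main

import "fmt"

func main() {
	// Hypothetical values after `kubectl delete --all pods`: no seed is ready,
	// but the additional-seeds Endpoints object still lists the old pod IPs.
	readySeeds := 0                 // every pod is gone
	userDefinedAdditionalSeeds := 0 // spec.additionalSeeds is empty
	externalSeedPoints := 3         // stale IPs left behind in the Endpoints

	labelSeedBeforeStart := readySeeds == 0 &&
		userDefinedAdditionalSeeds == 0 &&
		externalSeedPoints == 0

	// Prints "false": no pod gets the seed label, so the first node starts
	// with "The seed provider lists no seeds".
	fmt.Println(labelSeedBeforeStart)
}
```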
One of the fixes I see here is to exclude endpoints gathered from the DC itself when refreshing its additional seeds endpoints:
for _, seed := range seeds {
// When building endpoints for `dc`, exclude pods gathered from the `dc` itself.
if labels.GetLabel(&seed, cassdcapi.DatacenterLabel) != dc.Name {
addresses = append(addresses, corev1.EndpointAddress{
IP: seed.Status.PodIP,
})
}
}

With the fix the sequence becomes the following: … Additional changes are required to avoid clusters becoming stuck at start-up without seeds in the more general case. One way to do that would be to change …
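To show the intent of that filter in isolation, here is a self-contained sketch with illustrative names, not the actual k8ssandra-operator function, that keeps only seed IPs from other datacenters:

```go
package main

import "fmt"

type seedPod struct {
	dcLabel string // value of the datacenter label on the pod (assumed)
	podIP   string
}

// filterSeedIPs drops seeds that belong to the datacenter being reconciled,
// mirroring the proposed check on cassdcapi.DatacenterLabel above.
func filterSeedIPs(dcName string, seeds []seedPod) []string {
	var ips []string
	for _, s := range seeds {
		if s.dcLabel != dcName {
			ips = append(ips, s.podIP)
		}
	}
	return ips
}

func main() {
	seeds := []seedPod{
		{dcLabel: "dc1", podIP: "10.0.0.1"}, // excluded: endpoint gathered from dc1 itself
		{dcLabel: "dc2", podIP: "10.0.0.2"}, // kept: seed from another datacenter
	}
	fmt.Println(filterSeedIPs("dc1", seeds)) // prints [10.0.0.2]
}
```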
As I mentioned in my previous comment, I think there may either be a timing issue or something that sometimes fails to trigger reconciliation in k8ssandra-operator. I say that because the error doesn't happen all the time. I have tried to reproduce it several times locally with a kind cluster and have been unable to do so. I deployed a 3-node Cassandra cluster. Every time, the additional seeds service endpoints eventually get deleted, and the pods get recreated and return to the ready state. We could change the logic in seeds.go as you described, but I would still like to determine the root cause. @hvintus how are you reproducing this?
@jsanda I reproduce it just by deleting the pods.
In my most recent run the additional-seeds endpoint was deleted 24 seconds after the seed service got cleared, which was less than a second after the first pod transitioned into the Starting state (without a seed label).
What happened?
When I have many Cassandra clusters and run
kubectl delete --all pods --namespace k8s-cluster-test
after this command 70% of the clusters would recover, but 30% of the clusters would fail with the error "Exception encountered during startup: The seed provider lists no seeds".
Cluster logs
DNS check on the Cassandra nodes
At the same time, the cluster recovers if you delete the pods again.
Did you expect to see something different?
I want to see 100% of the clusters recover.
How to reproduce it (as minimally and precisely as possible):
Environment
* K8ssandra Operator version: github.com/k8ssandra/k8ssandra-operator/config/deployments/control-plane/cluster-scope?ref=v1.1.1
* Kubernetes version information: v1.21.5
* Kubernetes cluster kind:
* coredns = 1.6.3

I have 40 similar clusters like:
Anything else we need to know?:
I found a similar issue, kubernetes/kubernetes#92559, about the DNS cache, but I don't have the opportunity to change these parameters right now.