K8SSAND-1563 ⁃ error "Exception encountered during startup: The seed provider lists no seeds" when recreating pods #567

Closed
Tom910 opened this issue Jun 14, 2022 · 10 comments · Fixed by #671

Tom910 commented Jun 14, 2022

What happened?

When I have many Cassandra clusters and run kubectl delete --all pods --namespace k8s-cluster-test, about 70% of the clusters recover, but about 30% fail with the error Exception encountered during startup: The seed provider lists no seeds.

cluster logs

WARN  [main] 2022-06-14 11:52:25,312 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-clusterr-37-seed-service
WARN  [main] 2022-06-14 11:52:25,338 K8SeedProvider4x.java:58 - Seed provider couldn't lookup host test-clusterr-37-dctestr-37-additional-seed-service
ERROR [main] 2022-06-14 11:52:25,340 CassandraDaemon.java:909 - Exception encountered during startup: The seed provider lists no seeds.

dns check on cass nodes

❯ kubectl exec --stdin --tty -n k8s-cluster-test test-clusterr-37-dctestr-37-default-sts-0 -- /bin/bash

cassandra@test-clusterr-37-dctestr-37-default-sts-0:/$ curl test-clusterr-37-seed-service
curl: (6) Could not resolve host: test-clusterr-37-seed-service

cassandra@test-clusterr-37-dctestr-37-default-sts-0:/$ curl test-clusterr-37-dctestr-37-all-pods-service
curl: (7) Failed to connect to test-clusterr-37-dctestr-37-all-pods-service port 80: Connection refused

At the same time, the cluster recovers if the pods are deleted again.

Did you expect to see something different?

I expect 100% of the clusters to recover.

How to reproduce it (as minimally and precisely as possible):

Environment

  • K8ssandra Operator version:

    github.com/k8ssandra/k8ssandra-operator/config/deployments/control-plane/cluster-scope?ref=v1.1.1

  • Kubernetes version information:

    v1.21.5, coredns 1.6.3

  • Kubernetes cluster kind:

    managed

  • Manifests:

I have 40 similar clusters, each defined like this:

apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: test-clusterr-37
  namespace: k8s-cluster-test
spec:
  cassandra:
    serverVersion: "4.0.1"
    softPodAntiAffinity: true
    resources:
      requests:
        cpu: "500m"
        memory: "1Gi"
      limits:
        cpu: "1900m"
        memory: "2Gi"
    datacenters:
      - metadata:
          name: dctestr-37
        size: 2
        storageConfig:
          cassandraDataVolumeClaimSpec:
            resources:
              requests:
                storage: 2Gi
        config:
          jvmOptions:
            heapSize: 256M

  • K8ssandra Operator Logs:
1.655210188848521e+09   INFO    controller.k8ssandracluster     Reconciling Medusa user secrets {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.6552101888485692e+09  INFO    controller.k8ssandracluster     Medusa user secrets successfully reconciled     {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.655210188848572e+09   INFO    controller.k8ssandracluster     Reconciling replicated secrets  {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.655210188848788e+09   INFO    controller.k8ssandracluster     Medusa reconcile for dctestr-37 on namespace k8s-cluster-test   {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.655210188848799e+09   INFO    controller.k8ssandracluster     Medusa is not enabled   {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.6552101888513434e+09  INFO    controller.k8ssandracluster     Reconciling seeds       {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": ""}
1.6552101888514278e+09  INFO    controller.k8ssandracluster     The datacenter is reconciled    {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": ""}
1.6552101888591702e+09  DEBUG   controller-runtime.webhook.webhooks     received request        {"webhook": "/validate-k8ssandra-io-v1alpha1-k8ssandracluster", "UID": "bd0f9a93-61da-44ff-88fb-3f19a6955a6c", "kind": "k8ssandra.io/v1alpha1, Kind=K8ssandraCluster", "resource": {"group":"k8ssandra.io","version":"v1alpha1","resource":"k8ssandraclusters"}}
1.6552101888596108e+09  INFO    k8ssandracluster-webhook        validate K8ssandraCluster update        {"K8ssandraCluster": "test-clusterr-37"}
1.6552101888596413e+09  DEBUG   controller-runtime.webhook.webhooks     wrote response  {"webhook": "/validate-k8ssandra-io-v1alpha1-k8ssandracluster", "code": 200, "reason": "", "UID": "bd0f9a93-61da-44ff-88fb-3f19a6955a6c", "allowed": true}
1.6552101888640745e+09  INFO    controller.k8ssandracluster     Preparing to update replication for system keyspaces    {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": "", "replication": {"dctestr-37":2}}
1.6552101888641243e+09  INFO    controller.k8ssandracluster     Ensuring that keyspace system_traces exists in cluster test-clusterr-37...      {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": ""}
1.6552101888643324e+09  ERROR   controller.k8ssandracluster     Failed to fetch datacenter pods {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": "", "error": "no pods in READY state found in datacenter dctestr-37"}
github.com/k8ssandra/k8ssandra-operator/pkg/cassandra.(*defaultManagementApiFacade).EnsureKeyspaceReplication
        /workspace/pkg/cassandra/management.go:289
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).updateReplicationOfSystemKeyspaces
        /workspace/controllers/k8ssandra/schemas.go:183
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).checkSchemas
        /workspace/controllers/k8ssandra/schemas.go:43
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcileDatacenters
        /workspace/controllers/k8ssandra/datacenters.go:168
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:133
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).Reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:87
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
1.6552101888643928e+09  ERROR   controller.k8ssandracluster     Failed to update replication    {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37", "CassandraDatacenter": "k8s-cluster-test/dctestr-37", "K8SContext": "", "keyspace": "system_traces", "error": "no pods in READY state found in datacenter dctestr-37"}
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).checkSchemas
        /workspace/controllers/k8ssandra/schemas.go:43
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcileDatacenters
        /workspace/controllers/k8ssandra/datacenters.go:168
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:133
github.com/k8ssandra/k8ssandra-operator/controllers/k8ssandra.(*K8ssandraClusterReconciler).Reconcile
        /workspace/controllers/k8ssandra/k8ssandracluster_controller.go:87
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:114
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:311
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227
1.6552101888729985e+09  INFO    controller.k8ssandracluster     updated k8ssandracluster status {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "K8ssandraCluster": "k8s-cluster-test/test-clusterr-37"}
1.6552101888730514e+09  ERROR   controller.k8ssandracluster     Reconciler error        {"reconciler group": "k8ssandra.io", "reconciler kind": "K8ssandraCluster", "name": "test-clusterr-37", "namespace": "k8s-cluster-test", "error": "no pods in READY state found in datacenter dctestr-37"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2
        /go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227

Anything else we need to know?:

I found a similar issue, kubernetes/kubernetes#92559, about DNS caching, but I don't have the opportunity to change those parameters right now.

Tom910 commented Jun 14, 2022

Additional information

❯ kubectl get pods test-clusterr-37-dctestr-37-default-sts-0 -n k8s-cluster-test -o yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-06-14T11:21:21Z"
  generateName: test-clusterr-37-dctestr-37-default-sts-
  labels:
    app.kubernetes.io/instance: cassandra-test-clusterr-37
    app.kubernetes.io/managed-by: cass-operator
    app.kubernetes.io/name: cassandra
    app.kubernetes.io/version: 4.0.1
    cassandra.datastax.com/cluster: test-clusterr-37
    cassandra.datastax.com/datacenter: dctestr-37
    cassandra.datastax.com/node-state: Starting
    cassandra.datastax.com/rack: default
    controller-revision-hash: test-clusterr-37-dctestr-37-default-sts-78698c559b
    statefulset.kubernetes.io/pod-name: test-clusterr-37-dctestr-37-default-sts-0
  name: test-clusterr-37-dctestr-37-default-sts-0
❯ kubectl get svc test-clusterr-37-seed-service -n k8s-cluster-test -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    cassandra.datastax.com/resource-hash: n6slxhCbAHOHCzjD4a6VkgBOeGLkAVQ+cLrpmZHxNKI=
  creationTimestamp: "2022-06-10T11:55:32Z"
  labels:
    app.kubernetes.io/instance: cassandra-test-clusterr-37
    app.kubernetes.io/managed-by: cass-operator
    app.kubernetes.io/name: cassandra
    app.kubernetes.io/version: 4.0.1
    cassandra.datastax.com/cluster: test-clusterr-37
  name: test-clusterr-37-seed-service
  namespace: k8s-cluster-test
  ownerReferences:
  - apiVersion: cassandra.datastax.com/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: CassandraDatacenter
    name: dctestr-37
    uid: ed346db6-6df8-47e3-b555-0a55139f92c2
  resourceVersion: "397321"
  uid: 0ca4f1a1-0f22-47e5-a892-42d7304af523
spec:
  clusterIP: None
  clusterIPs:
  - None
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  publishNotReadyAddresses: true
  selector:
    cassandra.datastax.com/cluster: test-clusterr-37
    cassandra.datastax.com/seed-node: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

As a result, the pod doesn't have the cassandra.datastax.com/seed-node: "true" label, so the seed service doesn't work correctly.

jsanda commented Jun 14, 2022

This doesn't answer the question of what's happening, but I want to ask: what is the use case for deleting the pods? Have you considered using the stopped property to shut down DCs? That will scale the underlying StatefulSets down to zero but keep the persistent volumes intact.
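Roughly, that amounts to patching spec.stopped on the CassandraDatacenter. Here is an untested sketch using controller-runtime (if the DC is managed through a K8ssandraCluster you would set the equivalent flag on its datacenter template instead, so the operator doesn't revert it; k8sClient and ctx are placeholders for whatever client and context you already have):

	// Untested sketch: stop the DC instead of deleting its pods. This scales the
	// underlying StatefulSets to zero while keeping the PVCs intact.
	// Assumed imports: cassdcapi "github.com/k8ssandra/cass-operator/apis/cassandra/v1beta1",
	// "k8s.io/apimachinery/pkg/types", "sigs.k8s.io/controller-runtime/pkg/client".
	dc := &cassdcapi.CassandraDatacenter{}
	key := types.NamespacedName{Namespace: "k8s-cluster-test", Name: "dctestr-37"}
	if err := k8sClient.Get(ctx, key, dc); err != nil {
		return err
	}
	patch := client.MergeFrom(dc.DeepCopy())
	dc.Spec.Stopped = true
	if err := k8sClient.Patch(ctx, dc, patch); err != nil {
		return err
	}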

jsanda commented Jun 14, 2022

This should be investigated. At any point for various reasons, Kubernetes can delete and reschedule pods. k8ssandra-operator and cass-operator need to handle that gracefully.

It's also worth noting that cass-operator creates a PodDisruptionBudget for each CassandraDatacenter. It sets the minAvailable property to the datacenter size minus one.

Tom910 commented Jun 14, 2022

This should be investigated. At any point for various reasons, Kubernetes can delete and reschedule pods. k8ssandra-operator and cass-operator need to handle that gracefully.

In my case it was a test cluster and I used preemptible workers, which are automatically deleted.

hvintus commented Jun 21, 2022

The problem seems to be that startOneNodePerRack starts the first node without the seed label (since it assumes there still are other seeds), but by the time the Cassandra daemon is started, the seed services have no endpoints anymore. cass-operator doesn't handle that case in further reconciliation attempts, getting stuck at findStartingNodes.

The reason there still are IPs in the seed endpoints when all the pods have been deleted is the k8ssandra-operator behaviour. It gathers all the pods with seed labels from all the clusters and posts them in the -additional-seed-service. In the case of a single-cluster deployment it effectively copies IPs from the -seed-service to the -additional-seed-service, just with a delay caused by its reconciliation loop. Since cass-operator considers both services when deciding whether to assign the seed label to the starting node, that pod ends up without the label.
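For context, the current loop in k8ssandra-operator/controllers/k8ssandra/seeds.go:newEndpoints (reconstructed here, not quoted verbatim; see the proposed change below) copies every seed pod's IP into the additional-seed-service endpoints, with no filtering on which DC the pod belongs to:

	// Reconstruction of the current behaviour: every discovered seed pod is added,
	// including pods from the datacenter currently being reconciled.
	for _, seed := range seeds {
		addresses = append(addresses, corev1.EndpointAddress{
			IP: seed.Status.PodIP,
		})
	}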

jsanda commented Jun 23, 2022

The problem seems to be that startOneNodePerRack starts the first node without seed label (since it assumes there still are other seeds)

This is incorrect. Here is the snippet from the method:

	if labelSeedBeforeStart {
		patch := client.MergeFrom(pod.DeepCopy())
		pod.Labels[api.SeedNodeLabel] = "true"
		if err := rc.Client.Patch(rc.Ctx, pod, patch); err != nil {
			return "", err
		}

		rc.Recorder.Eventf(rc.Datacenter, corev1.EventTypeNormal, events.LabeledPodAsSeed,
			"Labeled pod a seed node %s", pod.Name)

		// sleeping five seconds for DNS paranoia
		time.Sleep(5 * time.Second)
	}

You are correct that for a single-DC cluster the seed-service and additional-seeds-service will have the same endpoints. When the pods get deleted, those endpoints will get removed from the seed-service. The next time k8ssandra-operator performs reconciliation after the endpoints are removed from the seed-service, it will delete the endpoints for the additional-seeds-service. At this point cass-operator should proceed with starting a node in each rack.

There is obviously some sort of timing issue since @Tom910 mentioned this didn't happen all the time and deleting the pods a second time got things back to a healthy state.

For what it's worth, I've had trouble with preemptible worker nodes and cass-operator in the past and avoided them because of it.

This is going to require some more investigation.

jsanda commented Jun 23, 2022

Please add your planning poker estimate with ZenHub @burmanm

hvintus commented Jun 23, 2022

@jsanda You are right that startOneNodePerRack could label the pod as seed, but the problem is that it won't in that particular case:

	externalSeedPoints := 0
	if existingEndpoints, err := rc.GetAdditionalSeedEndpoint(); err == nil {
		externalSeedPoints = len(existingEndpoints.Subsets[0].Addresses)
	}

	labelSeedBeforeStart := readySeeds == 0 && len(rc.Datacenter.Spec.AdditionalSeeds) == 0 && externalSeedPoints == 0

externalSeedPoints is not 0, since there still are IP addresses in the -additional-seed-service endpoints. The sequence of events I observe when I delete all the worker pods is:

  1. Before termination:
    • pod1: Started, Seed
    • pod2: Started, Seed
    • pod3: Started, Seed
    • seed-service: [Pod1, Pod2, Pod3]
    • additional-seed-service: [Pod1, Pod2, Pod3]
  2. Pods terminated:
    • seed-service: []
    • additional-seed-service: [Pod1, Pod2, Pod3]
  3. Pods recreated:
    • pod1: Ready-to-Start
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: []
    • additional-seed-service: [Pod1, Pod2, Pod3]
  4. First pod is being started:
    • pod1: Starting
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: []
    • additional-seed-service: [Pod1, Pod2, Pod3]
  5. Additional seed service refreshed:
    • pod1: Starting
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: []
    • additional-seed-service: []

One of the fixes I see here is to exclude endpoints gathered from the DC when refreshing its -additional-seed-service. That way they would be truly additional.

k8ssandra-operator/controllers/k8ssandra/seeds.go:newEndpoints:

	for _, seed := range seeds {
		// When building endpoints for `dc`, exclude pods gathered from the `dc` itself.
		if labels.GetLabel(&seed, cassdcapi.DatacenterLabel) != dc.Name {
			addresses = append(addresses, corev1.EndpointAddress{
				IP: seed.Status.PodIP,
			})
		}
	}

With the fix, the sequence becomes:

  3. Pods recreated:
    • pod1: Ready-to-Start
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: []
    • additional-seed-service: []
  4. First pod is being started:
    • pod1: Starting, Seed
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: [Pod1]
    • additional-seed-service: []
  5. Additional seed service refreshed:
    • pod1: Starting, Seed
    • pod2: Ready-to-Start
    • pod3: Ready-to-Start
    • seed-service: [Pod1]
    • additional-seed-service: []

Additional changes are required to keep clusters from getting stuck at start-up without seeds in the more general case. One way to do that would be to change findStartingNodes to restore the Seed label on the starting pod if no other seeds are discovered at that phase.
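A rough, untested sketch of what that fallback in findStartingNodes might look like, reusing the labelling code quoted above (isStillStarting and visibleSeedCount are hypothetical helpers standing in for however the method would determine the pod state and the number of currently known seeds, and the return values are only illustrative):

	if isStillStarting(pod) && visibleSeedCount() == 0 {
		// No seeds visible anywhere (seed-service, spec.AdditionalSeeds,
		// additional-seed-service): re-apply the seed label so this node can bootstrap.
		patch := client.MergeFrom(pod.DeepCopy())
		pod.Labels[api.SeedNodeLabel] = "true"
		if err := rc.Client.Patch(rc.Ctx, pod, patch); err != nil {
			return false, err
		}
		rc.Recorder.Eventf(rc.Datacenter, corev1.EventTypeNormal, events.LabeledPodAsSeed,
			"Restored seed label on starting pod %s", pod.Name)
	}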

jsanda commented Jun 23, 2022

As I mentioned in my previous comment, I think there may either be a timing issue or something that sometimes fails to trigger reconciliation in k8ssandra-operator. I say this because the error doesn't happen all the time.

I have tried to reproduce this several times locally with a kind cluster and have been unable to do so. I deployed a 3-node Cassandra cluster. Every time, the additional seeds service endpoints eventually get deleted, and the pods get recreated and return to the ready state.

We could change the logic in seeds.go as you described, but I would still like to determine the root cause.

@hvintus how are you reproducing this?

hvintus commented Jun 30, 2022

@jsanda I reproduce it just by deleting the pods. There is indeed a timing issue: the problem occurs if a pod is started up before the additional seeds endpoint is deleted.

% kubectl get endpoints -w --output-watch-events | grep seed --line-buffered | ts '[%Y-%m-%d %H:%M:%.S]' &
% kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -L cassandra.datastax.com/node-state,cassandra.datastax.com/seed-node -w &| ts '[%Y-%m-%d %H:%M:%.S]' &
% kubectl delete pods -l cassandra.datastax.com/datacenter=dc1
[2022-06-30 20:40:56.736898] demo-dc1-default-sts-0   2/2     Terminating       Started          true
[2022-06-30 20:40:56.799916] demo-dc1-default-sts-1   2/2     Terminating       Started          true
[2022-06-30 20:40:56.874857] demo-dc1-default-sts-2   2/2     Terminating       Started          true
[2022-06-30 20:40:57.221728] MODIFIED   demo-seed-service                    172.17.0.11,172.17.0.2,172.17.0.7
[2022-06-30 20:40:57.423070] MODIFIED   demo-seed-service                    172.17.0.11,172.17.0.2,172.17.0.7
[2022-06-30 20:41:04.055432] demo-dc1-default-sts-2   0/2     Terminating       Started          true
[2022-06-30 20:41:04.094332] demo-dc1-default-sts-2   0/2     Terminating       Started          true
[2022-06-30 20:41:04.134647] MODIFIED   demo-seed-service                    172.17.0.11,172.17.0.2,172.17.0.7
[2022-06-30 20:41:04.142609] demo-dc1-default-sts-2   0/2     Terminating       Started          true
[2022-06-30 20:41:04.274507] MODIFIED   demo-seed-service                    172.17.0.2,172.17.0.7
[2022-06-30 20:41:04.308150] demo-dc1-default-sts-0   0/2     Terminating       Started          true
[2022-06-30 20:41:04.381720] demo-dc1-default-sts-2   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:04.409772] demo-dc1-default-sts-0   0/2     Terminating       Started          true
[2022-06-30 20:41:04.425808] demo-dc1-default-sts-0   0/2     Terminating       Started          true
[2022-06-30 20:41:04.435312] MODIFIED   demo-seed-service                    172.17.0.2
[2022-06-30 20:41:04.509333] demo-dc1-default-sts-2   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:04.572524] demo-dc1-default-sts-0   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:04.646595] demo-dc1-default-sts-0   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:04.760963] demo-dc1-default-sts-1   0/2     Terminating       Started          true
[2022-06-30 20:41:04.788673] MODIFIED   demo-seed-service                    172.17.0.2
[2022-06-30 20:41:04.958007] demo-dc1-default-sts-1   0/2     Terminating       Started          true
[2022-06-30 20:41:04.972290] demo-dc1-default-sts-1   0/2     Terminating       Started          true
[2022-06-30 20:41:05.000227] MODIFIED   demo-seed-service                    172.17.0.2
[2022-06-30 20:41:05.125158] demo-dc1-default-sts-1   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:05.132116] MODIFIED   demo-seed-service                    <none>
[2022-06-30 20:41:05.258016] demo-dc1-default-sts-1   0/2     Pending           Ready-to-Start
[2022-06-30 20:41:06.554753] demo-dc1-default-sts-2   0/2     Init:0/2          Ready-to-Start
[2022-06-30 20:41:07.354699] demo-dc1-default-sts-0   0/2     Init:0/2          Ready-to-Start
[2022-06-30 20:41:08.155195] demo-dc1-default-sts-1   0/2     Init:0/2          Ready-to-Start
[2022-06-30 20:41:08.988027] demo-dc1-default-sts-0   0/2     Init:1/2          Ready-to-Start
[2022-06-30 20:41:09.831673] demo-dc1-default-sts-2   0/2     Init:1/2          Ready-to-Start
[2022-06-30 20:41:10.578270] demo-dc1-default-sts-1   0/2     Init:1/2          Ready-to-Start
[2022-06-30 20:41:11.352809] demo-dc1-default-sts-2   0/2     Init:1/2          Ready-to-Start
[2022-06-30 20:41:12.171996] demo-dc1-default-sts-0   0/2     Init:1/2          Ready-to-Start
[2022-06-30 20:41:16.868549] demo-dc1-default-sts-2   0/2     PodInitializing   Ready-to-Start
[2022-06-30 20:41:16.971890] demo-dc1-default-sts-0   0/2     PodInitializing   Ready-to-Start
[2022-06-30 20:41:17.968239] demo-dc1-default-sts-2   1/2     Running           Ready-to-Start
[2022-06-30 20:41:18.076346] demo-dc1-default-sts-0   1/2     Running           Ready-to-Start
[2022-06-30 20:41:18.860728] demo-dc1-default-sts-1   0/2     PodInitializing   Ready-to-Start
[2022-06-30 20:41:19.252626] demo-dc1-default-sts-1   1/2     Running           Ready-to-Start
[2022-06-30 20:41:29.471696] demo-dc1-default-sts-1   1/2     Running           Starting
[2022-06-30 20:41:29.510180] DELETED    demo-dc1-additional-seed-service     172.17.0.2,172.17.0.7,172.17.0.11

In my most recent run, the additional seeds endpoint was deleted 24 seconds after the seed service got cleared, which is less than a second after the first pod transitioned into the Starting state (without a seed label).
