Feat: coordinator ha #842

Draft · jvidalg wants to merge 4 commits into main

Conversation

@jvidalg commented Aug 18, 2024

Enable High Availability (HA) in Kubernetes for Coordinator

This PR introduces changes to enable High Availability (HA) in Kubernetes by leveraging a StatefulSet and a headless service. The key goal is to make individual replicas identifiable and give each replica its own stable network identity, which is achieved by replacing the existing Deployment with a StatefulSet.

Key Changes:

  • Convert Deployment to StatefulSet: The Kubernetes Deployment for the coordinator has been converted to a StatefulSet to support stable network identities.
  • Dynamic ConfigMap Generation: ConfigMaps are dynamically generated to assign a unique node_id to each replica. The first replica (coordinator-0) is designated as the seed_node (see the sketch after this list).
  • StatefulSet ConfigMap Mounting: The StatefulSet dynamically mounts the corresponding ConfigMap for each replica, ensuring that each pod has its own configuration.
  • Headless Service: The Service is now configured as a headless service (clusterIP: None), allowing direct communication with individual pods.
  • Service Selector: The Service selector has been updated to match the app: label, ensuring proper routing to the correct pods.
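
A minimal sketch of how the per-replica ConfigMap templating and the headless Service might look, assuming a hypothetical .Values.coordinator.replicas value and the $statefulsetName variable seen in the review snippets below (the actual templates in this PR may differ):

{{- /* Sketch only: one ConfigMap per replica, node_id derived from the loop index. */ -}}
{{- $statefulsetName := "coordinator" -}}
{{- range $i := until (int .Values.coordinator.replicas) }}
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ $statefulsetName }}-{{ $i }}
  namespace: {{ $.Release.Namespace }}
data:
  config.yaml: |-
    coordinator_addr: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8950
    raft_port: 8970
    seed_node: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8970
    node_id: {{ $i }}
{{- end }}
---
# Headless Service: clusterIP: None gives each pod a stable DNS name such as
# coordinator-0.coordinator.<namespace>.svc.cluster.local.
apiVersion: v1
kind: Service
metadata:
  name: {{ $statefulsetName }}
  namespace: {{ $.Release.Namespace }}
spec:
  clusterIP: None
  selector:
    app: coordinator
  ports:
    - name: grpc
      port: 8950
    - name: raft
      port: 8970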

Local Testing

The changes were tested on a local Kubernetes cluster with the following results:

  • Coordinator StatefulSet: The StatefulSet for the coordinator successfully deployed multiple replicas. The logs confirm that the coordinator pods are running and coordinating as expected.
➜  TENSORLAKE_WORKSPACE kubectl get pods -n indexify -l app=coordinator
NAME            READY   STATUS    RESTARTS   AGE
coordinator-0   1/1     Running   0          18m
coordinator-1   1/1     Running   0          18m
  • API Deployment: The API deployment was also confirmed to be running successfully.
➜  TENSORLAKE_WORKSPACE kubectl get pods -n indexify -l app.kubernetes.io/component=api
NAME                   READY   STATUS    RESTARTS   AGE
api-6f7d65c588-8q4dd   1/1     Running   0          17m
  • Coordinator Logs: The logs from the coordinator StatefulSet show that the coordinator service is running and the leader has been established.
➜  TENSORLAKE_WORKSPACE kubectl logs statefulset/coordinator -n indexify
Found 2 pods, using pod/coordinator-1
Running with tracing filter info
2024-08-18T22:08:45.246398Z  INFO indexify::cmd::coordinator: starting indexify coordinator, version: git branch: main - sha:2608f783cc6f10e9d0e725bd4e0289781252b4f1
...
2024-08-18T22:08:45.401516Z  INFO indexify::coordinator_service: leader change detected: true
Seed node configuration inside each replica:
root@coordinator-0:/indexify# grep seed /etc/coordinator/config.yaml
seed_node: coordinator-0.coordinator.indexify.svc.cluster.local:8970
root@coordinator-1:/indexify# grep seed /etc/coordinator/config.yaml
seed_node: coordinator-0.coordinator.indexify.svc.cluster.local:8970
  • API Logs: The logs from the api deployment indicate that the API server is running and successfully connecting to the coordinator service.
➜  TENSORLAKE_WORKSPACE kubectl logs deployment/api -n indexify 
Running with tracing filter info
2024-08-18T22:10:20.449254Z  INFO indexify::cmd::server: starting indexify server, version: git branch: main - sha:2608f783cc6f10e9d0e725bd4e0289781252b4f1
...
2024-08-18T22:10:20.453345Z  INFO indexify::coordinator_client: connecting without TLS to coordinator service, address: coordinator:8950

NOTE: This was deployed as a new Helm release; I removed the previous release that was using the Deployment and a static ConfigMap.

@grampelberg (Contributor) left a comment

I think you can keep using a normal service + deployment for this. The coordinator only cares that it can contact some running coordinator. If we change the seed node to the coordinator service instead of localhost and add some liveness/readiness/startup probes, it should be good enough.

The biggest open question for me is how bootstrap will work - what happens for the first seed node to start?
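
For reference, a sketch of that alternative: keep a regular Deployment and ClusterIP Service, and point the config at the Service DNS name instead of localhost. The snippet below is hypothetical and not part of this PR; the ports are taken from the PR's config.

# Shared config for every replica; the ClusterIP Service "coordinator"
# routes to whichever coordinator pods are ready.
coordinator_addr: coordinator.{{ .Release.Namespace }}.svc.cluster.local:8950
raft_port: 8970
seed_node: coordinator.{{ .Release.Namespace }}.svc.cluster.local:8970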

@@ -30,9 +30,13 @@ api:

coordinator:
  enabled: true

  name: coordinator

Contributor:

I would very strongly discourage this. It is a safe assumption that software like this is being added to its own namespace and threading the name through everything results in crazy amount of complexity. If you really want to, make sure to create a helper function to handle the name.
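
A sketch of the suggested helper, assuming a hypothetical named template in the chart's _helpers.tpl (names are illustrative):

{{/* Resolve the coordinator name in one place instead of threading it through every template. */}}
{{- define "indexify.coordinatorName" -}}
{{- default "coordinator" .Values.coordinator.name -}}
{{- end -}}

Templates would then use {{ include "indexify.coordinatorName" . }} rather than reading .Values.coordinator.name directly.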

config.yaml: |-
  coordinator_addr: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8950
  raft_port: 8970
  seed_node: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8970

Contributor:

We called this HA, but won't it fail in any scenario where 0 isn't running?

coordinator_addr: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8950
raft_port: 8970
seed_node: {{ $statefulsetName }}-0.{{ $statefulsetName }}.{{ $.Release.Namespace }}.svc.cluster.local:8970
node_id: {{ $i }}

Contributor:

What do you think about adding --node-id to the args for serve? That way, we can have a single config map that gets updated/reloaded and doesn't require any generation like this.
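
A sketch of what that could look like if such a flag existed (the --node-id flag is the reviewer's proposal and the command line is illustrative): the StatefulSet derives the id from the pod ordinal via the downward API, so a single shared ConfigMap is enough.

containers:
  - name: coordinator
    # image, ports, and config mount omitted for brevity
    env:
      - name: POD_NAME
        valueFrom:
          fieldRef:
            fieldPath: metadata.name
    command:
      - /bin/sh
      - -c
      - |
        # Pod name is <statefulset>-<ordinal>; strip up to the last '-' to get the ordinal.
        # The exact binary invocation and flag names are hypothetical.
        exec indexify coordinator --node-id "${POD_NAME##*-}"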

@jvidalg (Author) commented Aug 19, 2024

> I think you can keep using a normal service + deployment for this. The coordinator only cares that it can contact some running coordinator. If we change the seed node to the coordinator service instead of localhost and add some liveness/readiness/startup probes, it should be good enough.
>
> The biggest open question for me is how bootstrap will work - what happens for the first seed node to start?

Yes, I am planning to modify the chart to include readiness and liveness probes, along with several other customizations we are already using in our clusters. Readiness and liveness probes provide a degree of "HA" in the sense that an unhealthy or missing coordinator gets replaced, but that still is not true HA, which is why we proposed this alternative. That said, I agree probes should be fine for now.
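
For reference, a minimal sketch of the probes under discussion, using TCP checks against the coordinator's gRPC port since no HTTP health endpoint is assumed here (values are illustrative):

livenessProbe:
  tcpSocket:
    port: 8950
  initialDelaySeconds: 10
  periodSeconds: 10
readinessProbe:
  tcpSocket:
    port: 8950
  periodSeconds: 5
startupProbe:
  tcpSocket:
    port: 8950
  failureThreshold: 30
  periodSeconds: 2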
