orchestrator-raft: are IP addresses really required? #253

bbeaudreault · 2017-08-07T17:17:01Z

The docs mention that RaftBind and RaftNodes should be IP addresses. Is this really true, and if so is there a reason?

For those of us running in kubernetes, this is hard. Ideally we can use the stable hostname of the pods in the replica set. It's possible to set up 3-5 distinct deployments each with their own service with a stable IP address, then use that. But that's a bit more work and more cumbersome long term than to just use the built in primitives.

shlomi-noach · 2017-08-07T18:28:21Z

@bbeaudreault the hashicorp raft library, which I use, explicitly resolved IP addresses of whatever you provide, and then compares anything to that. I've done two rounds trying to go around this (I don't like it either) but insofar without success.

bbeaudreault · 2017-08-07T18:42:18Z

Ok thanks for the info. If you want, we could close this issue and I can follow the next steps issue instead. Or we can keep this open. Up to you. Will follow the other one either way. Thanks!

shlomi-noach · 2017-08-07T18:50:29Z

Let's keep this one open.

shlomi-noach · 2017-08-08T06:28:29Z

@bbeaudreault if instead of arguing with the hashicorp/raft library, I let you specify FQDN and resolve that to an IP upon startup (and from there on only use those IP addresses) -- does that solve your problem?

bbeaudreault · 2017-08-08T10:48:47Z

I appreciate the effort, but unfortunately does not solve the problem. In kubernetes the pods can move around which cause the IPs to change. However, I say let's keep the change for now but not resolve -- it does make it a little easier to be able to use FQDN at startup rather than hard coding the IPs.

shlomi-noach · 2017-08-08T11:37:46Z

That's unfortunate. It would require some rewrite of the hashicorp/raft module. Would you like to put a question on their repo? https://github.com/hashicorp/raft/issues

shlomi-noach · 2017-10-26T10:54:27Z

If I understand correctly, support for this has recently been added in 1.0.0:

I have a similar interest in making this work. Will look into the recent hashicorp/raft changes.

hmcgonig · 2017-11-07T19:26:21Z

The above would be fantastic for kubernetes deployments. Currently, the workaround for this problem is assigning each node a service, then using the static assigned cluster IPs which don't change.

shlomi-noach · 2017-11-07T19:43:45Z

Currently, the workaround for this problem is assigning each node a service, then using the static assigned cluster IPs which don't change.

This is actually the solution I'm looking at right now and which in fact seems to satisfy my requirements: I'm looking at having a single orchestrator node per-DC (as we do today), which for us means a single orchestrator node in a Kubernetes cluster. Putting it behind a service with LoadBalancer makes a lot of sense.

This is still WIP.

The Consul 1.0.0 also introduces major changes to the initial raft setup, which I'm unhappy with, so I will try to push that till a later stage.

hkotka · 2020-04-16T12:32:37Z

I ran into this issue with Docker Swarm setup and there is unfortunately no way to set static ip-addresses to Swarm services, so I have to re-think the architecture a bit. It would be great if Orchestrator could periodically re-resolve RaftNodes FQDN:s if possible :)

sbrattla · 2020-06-04T07:33:23Z

@bbeaudreault and @hmcgonig would you mind providing a little more details on how you went about with orchestrator on Kubernetes. My understanding is that you deploy, say, 3 services where each services represents a single instance (peer) or orchestrator. Each orchestrator instance is then reachable via it's service hostname. Is this understanding correct?

I'm using Docker Swarm; but the concepts should be quite similar to Kubernetes. However, I cannot use the IP associated with a service in RaftBind as orchestrator then logs an error which says it cannot bind to that address (which makes sense, since it's essentially a load balancer IP). A service replica should not be able to bind to the service IP. Subsequently, since the IP in RaftBind must also appear in RaftNodes, I hit the wall.

shlomi-noach · 2020-07-05T06:48:14Z

I tried giving this another go; with the time frame I had, I wasn't able to adapt to the new hashicorp/raft library; there's been a lot of architectural changes and adapting orchestrator to these changes is non trivial.

shlomi-noach · 2020-07-08T06:53:17Z

See #1208 for potential alternate approach

sbrattla · 2020-08-05T06:19:30Z

Thanks for giving it a try @shlomi-noach! I may give #1208 a try, or alternatively just set up three small servers to run the software in a "classic" setup.

jianhaiqing · 2020-08-08T03:35:45Z

The above would be fantastic for kubernetes deployments. Currently, the workaround for this problem is assigning each node a service, then using the static assigned cluster IPs which don't change.

"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
 "RaftBind": "orch-node1-0:10008",
 "RaftAdvertise": "orch-node1.raft:10008",
 "DefaultRaftPort": 10008,
 "RaftNodes": [
      "orch-node1.raft:10008",
      "orch-node2.raft:10008",
      "orch-node3.raft:10008",
      "orch-node4.raft:10008",
      "orch-node5.raft:10008"
 ]

redstonemercury · 2020-09-04T16:54:05Z

I'm having issues getting 3 node raft orchestrator deployed on k8s and it seems to be related to this, since once a pod goes down in the statefulSet, the remaining pods are resolving the hostname to the old IP and only re-resolve once they come up new (at which point they have a different IP and trigger the other pods to need a stop/start). Feels like the underlying raft protocol should re-resolve the underlying hostnames, but I also understand why this change isn't easy.

If the workaround is to deploy 3 separate services with static IPs, how are y'all getting this working to ensure you don't have multiple pods go down at the same time? Or are you just taking the risk that all the pods could move around simultaneously creating orchestrator downtime? Or am I missing something about the implementation for 3 separate services within the same deployment?

derekperkins · 2020-09-04T17:44:15Z

Since you're in k8s, you would just use a pod disruption budget to keep multiple pods from going down simultaneously.

sudo-ankitmishra · 2020-10-02T05:36:34Z

I am facing issues while deploying 3 node raft setup on k8s, when I checked the logs for one of the orchestrator node it says that "failed to open raft store, unknown port." I have 3 separate deployment for orchestrator and I am using the IP of orchestrator service in my config file:
sample raft config:
"RaftEnabled": true,
"RaftDataDir": "/var/lib/orchestrator",
"DefaultRaftPort": 10008,
"RaftBind": "http://192.168.50.175:10008",
"RaftNodes": ["http://192.168.50.175:10008",
"http://192.168.50.176:10008",
"http://192.168.50.177:10008"]

Any help will be appriciated.

shlomi-noach · 2020-10-02T05:43:21Z

@sudo-ankitmishra remove the "http://.
In the future, please make sure to open a new issue, so we can better track questions. Thank you!

sudo-ankitmishra · 2020-10-02T15:27:34Z

Hey @shlomi-noach thanks for reply, But I am still not able to get the orchestrator up and running, this is the error I am receiving right now,
2020-10-02 15:18:06 DEBUG Connected to orchestrator backend: orchestrator:?@tcp(10.32.0.88:3306)/orchestrator?timeout=2s 2020-10-02 15:18:06 DEBUG Orchestrator pool SetMaxOpenConns: 128 2020-10-02 15:18:06 DEBUG Initializing orchestrator 2020-10-02 15:18:06 INFO Connecting to backend 10.32.0.88:3306: maxConnections: 128, maxIdleConns: 32 2020-10-02 15:18:06 INFO Starting Discovery 2020-10-02 15:18:06 INFO Registering endpoints 2020-10-02 15:18:06 INFO continuous discovery: setting up 2020-10-02 15:18:06 DEBUG Setting up raft 2020-10-02 15:18:06 DEBUG raft: advertise=192.168.50.177:10008 2020-10-02 15:18:06 ERROR failed to open raft store: listen tcp 192.168.50.177:10008: bind: cannot assign requested address 2020-10-02 15:18:06 FATAL 2020-10-02 15:18:06 ERROR failed to open raft store: listen tcp 192.168.50.177:10008: bind: cannot assign requested address

shlomi-noach · 2020-10-02T17:06:30Z

In RaftNodes, pplease also remove the :port, keep just IPs. And please, open a new issue for any further discussion, thank you.

redstonemercury · 2020-11-17T18:44:03Z

I just want to follow up here to add that a PodDisruptionBudget does not appear to work around this. (I can start a new issue if requested, but this problem is a direct result of the inability to use a StatefulSet, so thought it made sense to continue this discussion here.)

From the documentation here https://kubernetes.io/docs/concepts/workloads/pods/disruptions/#how-disruption-budgets-work:

"Pods which are deleted or unavailable due to a rolling upgrade to an application do count against the disruption budget, but workload resources (such as Deployment and StatefulSet) are not limited by PDBs when doing rolling upgrades. Instead, the handling of failures during application updates is configured in the spec for the specific workload resource."

So in this case, having 3 separate Deployment resources each managing one pod, you cannot appear to use a PodDisruptionBudget to prevent all 3 pods from going down when you (for example) update a config setting in a configmap used by that pod. The PodDisruptionBudget sees that you're over the disruptions properly, but does not prevent a restart of each independent Deployment's pod simultaneously, since it assumes you're managing that within the Deployment resource.

This is my setup, and testing multiple times, updating a configmap means that all 3 containers (each in their own Deployment) restart at the same time.

Pods example:

Name:         orchestrator-0-xxxxxxx-xxxx
Namespace:    kube-monitoring
Priority:     0
Node:         ip-x-x-x-x.us-west-2.compute.internal/x.x.x.x
Start Time:   Tue, 17 Nov 2020 03:40:41 -0500
Labels:       app.kubernetes.io/app=orchestrator
              app.kubernetes.io/instance=orchestrator
              app.kubernetes.io/name=orchestrator-0
              pod-template-hash=xxxxxxx
...
Name:         orchestrator-1-xxxxxxx-xxxx
Namespace:    kube-monitoring
Priority:     0
Node:         ip-x-x-x-x.us-west-2.compute.internal/x.x.x.x
Start Time:   Fri, 13 Nov 2020 11:15:41 -0500
Labels:       app.kubernetes.io/app=orchestrator
              app.kubernetes.io/instance=orchestrator
              app.kubernetes.io/name=orchestrator-1
              pod-template-hash=-xxxxxxx
...
Name:         orchestrator-2-xxxxxxx-xxxx
Namespace:    kube-monitoring
Priority:     0
Node:         ip-x-x-x-x.us-west-2.compute.internal/x.x.x.x
Start Time:   Mon, 16 Nov 2020 08:59:21 -0500
Labels:       app.kubernetes.io/app=orchestrator
              app.kubernetes.io/instance=orchestrator
              app.kubernetes.io/name=orchestrator-1
              pod-template-hash=-xxxxxxx

PodDisruptionBudget details:

Name:           orchestrator
Namespace:      kube-monitoring
Min available:  2
Selector:       app.kubernetes.io/app=orchestrator
Status:
    Allowed disruptions:  1
    Current:              3
    Desired:              2
    Total:                3
Events:                   <none>

I can see them restart simultaneously when I update a config, creating a downtime:

orchestrator-0-xxxxxx-xxxx                  0/1     ContainerCreating   0          17s
orchestrator-1-xxxxxx-xxxx                  0/1     ContainerCreating   0          16s
orchestrator-2-xxxxxx-xxxx                  0/1     ContainerCreating   0          14s

The PodDisruptionBudget sees this happening, but does not seem to prevent it:

Name:           orchestrator
Namespace:      kube-monitoring
Min available:  2
Selector:       app.kubernetes.io/app=orchestrator
Status:
    Allowed disruptions:  0
    Current:              0
    Desired:              2
    Total:                3
Events:                   <none>

Am I doing something wrong in my setup above? Or am I properly understanding from Kubernetes documentation that a PodDisruptionBudget will not manage restart behavior across Deployments? If the latter, how do folks have a highly available application setup in Kubernetes while using separate Deployment resources?

shlomi-noach self-assigned this Aug 7, 2017

shlomi-noach mentioned this issue Aug 7, 2017

orchestrator-raft: next steps. #246

Open

shlomi-noach mentioned this issue Aug 8, 2017

raft hostnames resolving to IP #254

Merged

bbeaudreault mentioned this issue Aug 8, 2017

IP address requirement in raft config hashicorp/raft#236

Closed

derekperkins mentioned this issue May 22, 2018

helm: Allow non-leader Orchestrator instances to accept requests vitessio/vitess#3665

Merged

shlomi-noach mentioned this issue May 30, 2020

Is Docker Swarm a viable way to achieve HA for orchestrator? #1177

Open

shlomi-noach mentioned this issue Jul 8, 2020

support for api/ raft-add-peer and raft-remove-peer #1208

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

orchestrator-raft: are IP addresses really required? #253

orchestrator-raft: are IP addresses really required? #253

bbeaudreault commented Aug 7, 2017

shlomi-noach commented Aug 7, 2017

bbeaudreault commented Aug 7, 2017

shlomi-noach commented Aug 7, 2017

shlomi-noach commented Aug 8, 2017

bbeaudreault commented Aug 8, 2017

shlomi-noach commented Aug 8, 2017 •

edited

Loading

shlomi-noach commented Oct 26, 2017

hmcgonig commented Nov 7, 2017

shlomi-noach commented Nov 7, 2017

hkotka commented Apr 16, 2020

sbrattla commented Jun 4, 2020 •

edited

Loading

shlomi-noach commented Jul 5, 2020

shlomi-noach commented Jul 8, 2020

sbrattla commented Aug 5, 2020

jianhaiqing commented Aug 8, 2020

redstonemercury commented Sep 4, 2020

derekperkins commented Sep 4, 2020

sudo-ankitmishra commented Oct 2, 2020

shlomi-noach commented Oct 2, 2020

sudo-ankitmishra commented Oct 2, 2020

shlomi-noach commented Oct 2, 2020

redstonemercury commented Nov 17, 2020 •

edited

Loading

orchestrator-raft: are IP addresses really required? #253

orchestrator-raft: are IP addresses really required? #253

Comments

bbeaudreault commented Aug 7, 2017

shlomi-noach commented Aug 7, 2017

bbeaudreault commented Aug 7, 2017

shlomi-noach commented Aug 7, 2017

shlomi-noach commented Aug 8, 2017

bbeaudreault commented Aug 8, 2017

shlomi-noach commented Aug 8, 2017 • edited Loading

shlomi-noach commented Oct 26, 2017

hmcgonig commented Nov 7, 2017

shlomi-noach commented Nov 7, 2017

hkotka commented Apr 16, 2020

sbrattla commented Jun 4, 2020 • edited Loading

shlomi-noach commented Jul 5, 2020

shlomi-noach commented Jul 8, 2020

sbrattla commented Aug 5, 2020

jianhaiqing commented Aug 8, 2020

redstonemercury commented Sep 4, 2020

derekperkins commented Sep 4, 2020

sudo-ankitmishra commented Oct 2, 2020

shlomi-noach commented Oct 2, 2020

sudo-ankitmishra commented Oct 2, 2020

shlomi-noach commented Oct 2, 2020

redstonemercury commented Nov 17, 2020 • edited Loading

shlomi-noach commented Aug 8, 2017 •

edited

Loading

sbrattla commented Jun 4, 2020 •

edited

Loading

redstonemercury commented Nov 17, 2020 •

edited

Loading