This repository has been archived by the owner on Jan 24, 2024. It is now read-only.

SEP: Master cluster #72

Open · wants to merge 4 commits into base: master
80 changes: 80 additions & 0 deletions 0000-master-cluster.md

Reviewer:

Can you go into more detail on the future of current HA methods with this in place, as well as the future of Syndic? Also, are there any potential pitfalls to look at with things such as network latency? What kind of throughput will this require? What about split-brain handling?

Contributor Author:

This work is not deprecating any of the current HA functionality, nor is it deprecating Syndic.

The network will need to be reliable, and this is called out in the docs. If there is a split-brain problem, the network is not reliable.

@OrangeDog (Sep 11, 2023):

By that definition, no network is reliable. That's why we need HA solutions in the first place.
We at least need to know which way it's going to fail during a network partition and not do something unsafe.

Contributor Author:

As far as consistency and reliability go, there is a huge difference between local networks and WAN networks. With this design, if a master goes offline for some reason, there is no failure. Any minion connections will be routed to a different master by the load balancer. The other masters will still try to forward events, and you will see timeouts in the logs.

Reviewer:

It isn't just about consistency and reliability. If the communication between masters can be broken while they do not show as offline, it will happen. At the very least it needs to be documented what it looks like when that happens. I honestly don't think it will break much, as we don't do total bidirectional control, but it needs to be documented.

I can see this happening with the kind of engineer that loves segregating network traffic onto separate LANs: one network for minion communication, one network for storage, one network for master communication. Then all of a sudden the network admin has a spanning tree go haywire in the master-communication network. Both masters will appear up to the minions, and storage still works.

Contributor Author @dwoz (Sep 13, 2023):

Both masters would not appear up to a minion, because minions connect to the load balancer. I have not been able to break anything by taking masters offline. If you'd like to take the work for a spin and try to cause breakage, please feel free.

Reviewer:

Both masters would appear up to the load balancer too. The only connection that is broken in this scenario is master-master.

Contributor Author:

In the scenario described here, your salt CLI would fail to receive events because they are not being forwarded from the disconnected master. There will be errors in the logs on the disconnected master saying that it is not able to forward its events to the other master. The job would still finish correctly, and the job cache would contain the correct results of the job.

@@ -0,0 +1,80 @@
- Feature Name: Master Cluster
- Start Date: 2023-08-09
- SEP Status: Draft
- SEP PR: (leave this empty)
- Salt Issue: (leave this empty)

# Summary
[summary]: #summary

Add the ability to create a cluster of Masters that run behind a load balancer.

# Motivation
[motivation]: #motivation

The current [high availability features](https://docs.saltproject.io/en/latest/topics/highavailability/index.html) in the Salt ecosystem allow minions to have backup masters. There are two flavors of multi-master for which minions can be configured.

Minions can connect to [multiple masters simultaneously](https://docs.saltproject.io/en/latest/topics/tutorials/multimaster.html).

<img src='/diagrams/000-multi-master.png' width='400px'>

Minions can also be configured to use one master at a time [using failover](https://docs.saltproject.io/en/latest/topics/tutorials/multimaster_pki.html#multiple-masters-for-a-minion). Minimal configuration sketches for both flavors follow below.

<img src='/diagrams/000-multi-master-failover.png' width='400px'>
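
Both flavors are driven entirely by minion configuration. As a minimal sketch (the hostnames are placeholders), connecting to all masters simultaneously looks like this:

```yaml
# /etc/salt/minion -- multi-master: connect to every master at once
master:
  - master1.example.com
  - master2.example.com
```

and connecting to one master at a time with failover looks like this:

```yaml
# /etc/salt/minion -- failover: connect to one master, fail over if it goes down
master:
  - master1.example.com
  - master2.example.com
master_type: failover
master_alive_interval: 30  # seconds between checks that the active master is still up
```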

In either setup, jobs targeting lots of minions end up pinned to a single master. Another drawback of the current HA implementation is that minions need to be reconfigured to add or remove masters.


<img src='/diagrams/000-mm-large-job.png' width='400px'>

It would be much better if jobs could scale across multiple masters.


<img src='/diagrams/000-mc-large-job.png' width='400px'>

# Design
[design]: #detailed-design

To accomplish this, we will need to change the way jobs execute. Currently, new jobs get sent directly to the publish server from the request server.

<img src='/diagrams/000-current-job-pub.png' width='400px'>

If we forward IPC events between masters, we can get the return flow to be shared, as shown below.


<img src='/diagrams/000-cluster-job-pub.png' width='400px'>

To get job publishes to work, we need to make sure publishes also travel over the IPC event bus.


<img src='/diagrams/000-cluster-fwd.png' width='400px'>

Jobs can come and go through any of the masters in our master pool. From a minion's perspective, all of the masters in the pool are exactly the same. We can remove the need for minions to know about multiple masters by putting the pool behind a load balancer. Minions will not need to be reconfigured to add master resources.


<img src='/diagrams/000-cluster-arch.png' width='400px'>
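
With the pool fronted by a load balancer, each minion only needs to know the balancer's address. A minimal sketch (the hostname is a placeholder, and the load balancer itself sits outside of Salt):

```yaml
# /etc/salt/minion
# The minion talks only to the load balancer in front of the master pool,
# so adding or removing masters requires no minion changes.
master: salt-cluster.example.com
```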

> [!IMPORTANT]
> The current work for this SEP can be found [here](https://github.com/saltstack/salt/pull/64936)
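
As a rough sketch of the direction the referenced work takes, each master in the pool shares a PKI directory and learns about its peers through new master options. The option names below reflect the linked PR at the time of writing and may change; the addresses and paths are placeholders.

```yaml
# /etc/salt/master on every member of the pool
cluster_id: master_pool        # shared identifier for the cluster of masters
cluster_peers:                 # the other masters in the pool
  - 10.0.0.2
  - 10.0.0.3
cluster_pki_dir: /shared/pki   # PKI dir on shared storage so all masters use the same keys
```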


## Alternatives
[alternatives]: #alternatives

We currently have two alternatives to achieve "high availability". This is a
third, more robust approach that alleviates the issues with the current options.


## Unresolved questions
[unresolved]: #unresolved-questions

None at this time.

# Drawbacks
[drawbacks]: #drawbacks

The biggest drawback is that we'll need to maintain three ways of doing HA.
This adds complexity; however, if successful, we can potentially deprecate
some or all of the existing HA functionality.
Binary file added diagrams/000-cluster-arch.png
Binary file added diagrams/000-cluster-fwd.png
Binary file added diagrams/000-cluster-job-pub.png
Binary file added diagrams/000-current-job-pub.png
Binary file added diagrams/000-mc-large-job.png
Binary file added diagrams/000-mm-large-job.png
Binary file added diagrams/000-multi-master-failover.png
Binary file added diagrams/000-multi-master.png