Design
Jan Friesse edited this page Jan 23, 2018
- Qdevice = daemon (not part of the corosync process) running on every node. It connects to the votequorum API and provides a heartbeat to votequorum. By itself it does nothing; it needs a plugin.
- Qnetd = QDevice Network Daemon. Runs on only one node (initially; to be made clusterable in the future). It can act as an arbiter for multiple clusters.
- Heuristic = "scripts" running on every node. Their result is used as a tie-breaker.
- Votequorum = the main corosync quorum provider and API. QDevice calls the Votequorum API.
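A minimal sketch of this division of labour, assuming a hypothetical `Plugin` interface and a stand-in `send_heartbeat` callback in place of the real votequorum API: the daemon itself only forwards heartbeats, and the single active plugin decides whether the vote is cast.

```python
from typing import Callable, Protocol


class Plugin(Protocol):
    """Hypothetical plugin interface: QDevice does nothing without one."""

    def should_cast_vote(self) -> bool:
        ...


class QDevice:
    """Toy model of the QDevice daemon (not the real corosync-qdevice code)."""

    def __init__(self, plugin: Plugin, send_heartbeat: Callable[[bool], None]) -> None:
        # Only one plugin is active at a time, so exactly one is stored.
        self.plugin = plugin
        # Hypothetical stand-in for the votequorum API call that receives the heartbeat.
        self.send_heartbeat = send_heartbeat

    def tick(self) -> bool:
        # Ask the plugin for its decision and forward it to votequorum.
        cast_vote = self.plugin.should_cast_vote()
        self.send_heartbeat(cast_vote)
        return cast_vote


class AlwaysVote:
    """Trivial example plugin that always casts its vote."""

    def should_cast_vote(self) -> bool:
        return True


if __name__ == "__main__":
    sent: list[bool] = []
    qdevice = QDevice(AlwaysVote(), sent.append)
    qdevice.tick()
    print(sent)
```

Swapping `AlwaysVote` for a network-backed plugin would leave `QDevice` unchanged, which is the point of the plugin-based model.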
- ONLY ONE PARTITION ACTIVE AT A TIME. The active partition doesn't need to be the optimal one as long as there is ONLY ONE active.
- Plugin-based model
  - Plugin for network-based quorum (qdevice-net)
  - Plugin for disk-based quorum (work not yet started)
- Plugins can exchange information between nodes
- Heuristics are stateless
- Bi-directional data flow (the plugin exchanges information with qdevice and vice versa).
- Qnetd is able to run off-site
- If a majority of nodes are online, they get quorum even if Qnetd is down
- Ability to arbitrate for multiple clusters
- Is clusterable (not yet started)
- Only one active plugin at a time
- Qdevice is not allowed to kill the local corosync
A plugin is just an API.
- Messages are in TLV format
- Certificates are used to authenticate Qdevice-net
- Heuristics must not have scores (scores are known not to work properly in qdisk).
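As an illustration of TLV (type-length-value) framing, here is a minimal sketch assuming a 2-byte type and a 2-byte length in network byte order. The field widths and the option numbers used below are assumptions for the example, not the actual qdevice-net wire format.

```python
import struct

# Assumed header layout: 2-byte type, 2-byte length, big-endian (network order).
_HEADER = struct.Struct("!HH")


def encode_tlv(items: list[tuple[int, bytes]]) -> bytes:
    """Serialize (type, value) pairs into one TLV byte string."""
    out = bytearray()
    for opt_type, value in items:
        if len(value) > 0xFFFF:
            raise ValueError("value too long for a 2-byte length field")
        out += _HEADER.pack(opt_type, len(value))
        out += value
    return bytes(out)


def decode_tlv(data: bytes) -> list[tuple[int, bytes]]:
    """Parse a TLV byte string back into (type, value) pairs."""
    items = []
    pos = 0
    while pos < len(data):
        if pos + _HEADER.size > len(data):
            raise ValueError("truncated TLV header")
        opt_type, length = _HEADER.unpack_from(data, pos)
        pos += _HEADER.size
        if pos + length > len(data):
            raise ValueError("truncated TLV value")
        items.append((opt_type, data[pos:pos + length]))
        pos += length
    return items
```

For example, with hypothetical option types 1 (cluster name) and 2 (node id), `decode_tlv(encode_tlv([(1, b"mycluster"), (2, struct.pack("!I", 3))]))` returns the same two pairs. Because every option carries its own length, a receiver can skip option types it does not understand, which makes adding new options less disruptive.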
- We will probably need message exchange between nodes (in qdevice). This needs to work during sync, which means implementing a simple form of message sending in votequorum. Such message passing doesn't provide some guarantees (especially total ordering).
- There must be well-defined behavior, use cases, and failure paths.
- Qnetd without heuristics - tiebreaker. Qnetd is used ONLY as a tiebreaker (and consulted ONLY in a tie situation). This mode doesn't allow downscaling to fewer than 1/2 of the nodes.
- Quorum fully driven by Qnetd - Qnetd does its best to choose the largest partition. In this mode it is not possible to operate WITHOUT a Qnetd connection, but the cluster can downscale to 1 node. Not all nodes need to be able to connect to Qnetd, as long as at least one node in the membership can talk to Qnetd.
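The two modes can be contrasted with a small sketch. It models a partition only by its node count and whether any of its members can reach Qnetd; the function names are hypothetical, and the "largest partition" choice is simplified so that an exact tie is left to a separate tie-breaker.

```python
from typing import Optional


def tiebreaker_mode(partition_nodes: int, cluster_nodes: int, qnetd_vote: bool) -> bool:
    """Qnetd only as tiebreaker: consulted solely on an exact 50:50 split."""
    if 2 * partition_nodes > cluster_nodes:
        return True  # majority: quorate even if Qnetd is down
    if 2 * partition_nodes == cluster_nodes:
        return qnetd_vote  # tie: Qnetd decides
    return False  # minority: no downscaling below half of the nodes


def qnetd_driven_mode(partitions: dict[str, tuple[int, bool]]) -> Optional[str]:
    """Quorum fully driven by Qnetd.

    partitions maps a partition name to (node_count, can_reach_qnetd).
    Only partitions with at least one member talking to Qnetd are candidates,
    and the largest one wins. None means no unique winner (no candidate at all,
    or an exact tie left to a tie-breaker).
    """
    candidates = {name: size for name, (size, reachable) in partitions.items() if reachable}
    if not candidates:
        return None  # nobody can reach Qnetd, so nobody is quorate
    largest = max(candidates.values())
    winners = [name for name, size in candidates.items() if size == largest]
    return winners[0] if len(winners) == 1 else None
```

Under these assumptions a single node that still reaches Qnetd wins against a larger partition that cannot (`qnetd_driven_mode({"a": (1, True), "b": (3, False)})` returns `"a"`), which is the "downscale to 1 node" property; in the tiebreaker mode that node would need a majority.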
- Changes in corosync/votequorum:
  - Votequorum can't tell whether a received heartbeat was generated in the "old" membership or the "new" one. Change the API to always pass ring_id to the API user (qdevice), and from the API user back to votequorum.
  - In the sync phase, block corosync until votequorum receives a ping from qdevice (the idea is to have a timeout, so it does not block forever).
  - To achieve the previous point, we need working IPC, but IPC is currently blocked during sync. We will allow IPC during sync for (and only for) votequorum.
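A minimal sketch of the first two changes, using a toy `Votequorum` class rather than the real corosync code: heartbeats carry the `ring_id` they were generated for, stale ones are ignored, and the sync phase waits for a fresh heartbeat only up to a timeout. The class, its method names, and the `(node id, sequence)` shape of `ring_id` are assumptions for illustration.

```python
import threading


class Votequorum:
    """Toy model, not corosync's votequorum implementation."""

    def __init__(self, ring_id: tuple[int, int]) -> None:
        # ring_id modelled as (node id, sequence number) of the current membership.
        self.ring_id = ring_id
        self.qdevice_vote = False
        self._fresh = False
        self._cond = threading.Condition()

    def new_membership(self, ring_id: tuple[int, int]) -> None:
        # A new membership invalidates any heartbeat received so far.
        with self._cond:
            self.ring_id = ring_id
            self._fresh = False

    def qdevice_heartbeat(self, cast_vote: bool, ring_id: tuple[int, int]) -> bool:
        # Qdevice echoes back the ring_id it was given; stale heartbeats are dropped.
        with self._cond:
            if ring_id != self.ring_id:
                return False
            self.qdevice_vote = cast_vote
            self._fresh = True
            self._cond.notify_all()
            return True

    def wait_in_sync(self, timeout: float) -> bool:
        # Block sync until a heartbeat for the current ring arrives, but never forever.
        with self._cond:
            return self._cond.wait_for(lambda: self._fresh, timeout=timeout)
```

After `new_membership((1, 12))`, a heartbeat still tagged with the old `(1, 8)` is rejected, and `wait_in_sync` returns `False` once the timeout expires unless a heartbeat for `(1, 12)` arrives first.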
- Qnetd daemon
  - Support for different algorithms
  - Complete protocol implemented
  - Daemonize + init scripts
  - Command to show the current status of qnetd (probably a unix socket + binary) - try to display ALL the information we have
- Qdevice
  - Implemented model "net" (qdevice-net):
    - Complete protocol implemented
    - Support for different algorithms
    - Voting
    - Reconnect when the connection to qnetd is lost
  - Handle SIGINT + daemonize + init script
  - Documentation - man pages (it would be nice to describe the algorithms at a very high level so users know what to expect)
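One way to approach the "reconnect when the connection to qnetd is lost" item is a retry loop with capped exponential backoff. This is a sketch only: `connect` is a hypothetical callable standing in for opening the (TLS) connection to Qnetd, and the delay values are assumptions, not qdevice defaults.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def reconnect(
    connect: Callable[[], T],
    max_attempts: int,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Retry connect() with capped exponential backoff; None if all attempts fail."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except OSError:
            # Connection refused or timed out: wait before the next attempt.
            if attempt == max_attempts:
                break
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

Injecting `sleep` keeps the loop testable without real waiting; a real daemon would also keep its votequorum heartbeat going (without casting the vote) while disconnected.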
- Implemented model "net" (qdevice-net):
  - Algorithms
    - LMS
    - FFSplit
- Qnetd daemon
  - Documentation (man pages + wiki design document and how to add a new message/option/algorithm)
  - Test the SSL certificate expiration/renewal workflow
- Qdevice-net
  - Documentation - how to add a new message/option/algorithm
  - "Unit tests" - fake client (simulating cmap/votequorum)?
- Number of Qdevice votes = number of nodes - 1. So 5 nodes + Qdevice = 5 + 4 = 9 votes.
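The vote arithmetic can be checked with a short sketch. It assumes one vote per node and votequorum's simple-majority rule (quorum = expected votes // 2 + 1); the function names are hypothetical.

```python
def qdevice_votes(node_count: int) -> int:
    # Rule from the design notes: Qdevice gets (number of nodes - 1) votes.
    return node_count - 1


def quorum_threshold(expected_votes: int) -> int:
    # Simple majority of all expected votes.
    return expected_votes // 2 + 1


def is_quorate(nodes_up: int, node_count: int, qdevice_vote: bool) -> bool:
    # Votes present: one per live node, plus Qdevice's votes if it casts them.
    votes = nodes_up + (qdevice_votes(node_count) if qdevice_vote else 0)
    expected = node_count + qdevice_votes(node_count)
    return votes >= quorum_threshold(expected)


if __name__ == "__main__":
    total = 5 + qdevice_votes(5)
    print(total, quorum_threshold(total))
```

With 5 nodes the total is 9 and the threshold is 5, so a single node plus the 4 Qdevice votes is quorate. Under the same assumptions, 4 nodes without the Qdevice vote have only 4 votes and are not, which matches the note that the Qnetd-driven mode cannot operate without a Qnetd connection.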
quorum {
    provider: corosync_votequorum
    device {
        model: net
        net {
            connect_timeout: 10000
            tls: on
            host: 127.0.0.1
            port: 4433
            algorithm: ffsplit
            tie_breaker: lowest
        }
    }
}
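The `tie_breaker` option decides which half wins when the cluster splits into two equal partitions. Below is a sketch of the selection, assuming the semantics described for corosync-qdevice: `lowest` and `highest` pick the partition containing the lowest or highest node id, and a node id picks the partition containing that node. The function is hypothetical, and returning no winner when the configured node is in neither partition is an assumption of this sketch.

```python
from typing import Optional, Union


def break_tie(
    partitions: list[frozenset[int]], tie_breaker: Union[str, int]
) -> Optional[frozenset[int]]:
    """Pick the winning partition of an even split by node id."""
    if tie_breaker == "lowest":
        # Partition whose smallest node id is the overall smallest.
        return min(partitions, key=min)
    if tie_breaker == "highest":
        # Partition whose largest node id is the overall largest.
        return max(partitions, key=max)
    if isinstance(tie_breaker, int):
        # Assumption: if the configured node is in no partition, nobody wins.
        for partition in partitions:
            if tie_breaker in partition:
                return partition
        return None
    raise ValueError(f"unknown tie_breaker: {tie_breaker!r}")
```

With partitions `{1, 3}` and `{2, 4}`, `lowest` selects `{1, 3}` and `highest` selects `{2, 4}`. Because the choice depends only on node ids, every node computes the same winner without further communication.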