Jan Friesse edited this page Jan 23, 2018 · 2 revisions

Vocabulary

  • Qdevice = Daemon (not part of the corosync process) running on every node. It connects to the votequorum API and provides a heartbeat to votequorum. It does nothing by itself; it needs a plugin.
  • Qnetd = QDevice Network Daemon. Runs on only one node (initially; the plan is to make it clusterable in the future). It can act as an arbiter for multiple clusters.
  • Heuristic = "Scripts" running on every node. Their result is used as a tie-breaker.
  • Votequorum = Main corosync quorum provider and API. QDevice calls the Votequorum API.

Objectives (General)

  • ONLY ONE PARTITION ACTIVE AT A TIME. The partition doesn't need to be optimal as long as there is ONLY ONE active.
  • Plugin based model
    • Plugin for network based quorum (qdevice-net)
    • Plugin for disk based quorum (work not yet started)
  • Plugin can exchange info between nodes
  • Heuristic is stateless
  • Bi-directional data flow (the plugin exchanges information with qdevice and vice versa).

Objectives (Qnetd)

  • It's able to run off-site
  • If majority of nodes are online, they get quorum even if Qnetd is down
  • Ability to arbitrate for multiple clusters
  • Is clusterable (not yet started)

Design of Qdevice

  • Only one active plugin at a time
  • Qdevice is not allowed to kill local corosync

Design of plugins (general)

A plugin is just an API.

Design of Qnetd

  • Messages are in TLV format
  • Certificate is used for authentication of Qdevice-net
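
To illustrate what "TLV format" means for the messages, here is a minimal sketch of encoding and decoding type-length-value records. The header layout (2-byte type, 2-byte length, network byte order) and the function names `encode_tlv` and `decode_tlvs` are assumptions made for this example; this page does not specify the actual wire layout used by Qnetd.

```python
import struct

# ASSUMPTION: 2-byte type followed by 2-byte length, both big-endian.
# The real Qnetd header layout is not described on this page.
_HEADER = struct.Struct("!HH")


def encode_tlv(tlv_type, value):
    """Return one TLV record: header (type, length) followed by the value bytes."""
    return _HEADER.pack(tlv_type, len(value)) + value


def decode_tlvs(buffer):
    """Split a byte buffer into a list of (type, value) pairs."""
    items = []
    offset = 0
    while offset < len(buffer):
        if len(buffer) - offset < _HEADER.size:
            raise ValueError("truncated TLV header")
        tlv_type, length = _HEADER.unpack_from(buffer, offset)
        offset += _HEADER.size
        if len(buffer) - offset < length:
            raise ValueError("truncated TLV value")
        items.append((tlv_type, buffer[offset:offset + length]))
        offset += length
    return items
```

A receiver that meets an unknown type can skip it by its length, which is the usual reason to pick TLV for a protocol that is expected to grow new options.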

Design problems

  • Heuristics must not have scores (scores are known not to work properly in qdisk).
  • We will probably need message exchange between nodes (in qdevice). This needs to work during sync, which means implementing a simple form of message sending in votequorum. Such message passing doesn't fulfill some properties (especially total ordering).

Behavior

  • There must be well defined behavior/use-case/failure path
    1. Qnetd without heuristics - tiebreaker. Qnetd is used ONLY as a tiebreaker (and is asked ONLY in a tie situation). This mode doesn't allow downscaling to < 1/2 of the nodes.
    2. Quorum fully driven by Qnetd - Qnetd does its best to choose the largest partition. In this mode it is not possible to operate WITHOUT a qnetd connection, but downscaling to 1 node is allowed. Not all nodes must be able to connect to qnetd, as long as at least one node in the membership is able to talk to qnetd.
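
The "choose the largest partition" idea from the second mode can be sketched as follows. This is only an illustration, not the LMS or FFSplit algorithm: the function name `choose_partition`, the representation of partitions as sets of node IDs, and the `"lowest"`/`"highest"` tie-breaker semantics are assumptions made for this example.

```python
def choose_partition(partitions, tie_breaker="lowest"):
    """Pick the partition that should stay active.

    partitions: list of sets of node IDs (hypothetical representation).
    Returns the largest partition; on a tie, the partition containing the
    lowest (or highest) node ID wins. Returns None if nothing is connected.
    """
    candidates = [p for p in partitions if p]
    if not candidates:
        return None

    largest = max(len(p) for p in candidates)
    tied = [p for p in candidates if len(p) == largest]
    if len(tied) == 1:
        return tied[0]

    # ASSUMPTION: the tie-breaker compares node IDs across the tied partitions.
    if tie_breaker == "lowest":
        return min(tied, key=min)
    if tie_breaker == "highest":
        return max(tied, key=max)
    raise ValueError("unknown tie_breaker: %r" % (tie_breaker,))
```

Because exactly one partition is returned, the sketch respects the main objective: only one partition is active at a time, even if that partition is not the optimal one.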

Finished

  • Changes in corosync/votequorum:
    • Votequorum wasn't able to find out whether a received heartbeat was generated in the "old" membership or the "new" one. The API was changed to always pass ring_id to the API user (qdevice) and from the API user back to votequorum.
    • In the sync phase, block corosync until votequorum receives a ping from qdevice (in practice there is a timeout, so it will not block forever).
    • To achieve the previous point, we need working IPC, but IPC is normally blocked during sync. IPC during sync is allowed for (and only for) votequorum.
  • Qnetd daemon
    • Support for different algorithms
    • Complete protocol implemented
    • Daemonize + init scripts
    • Command to show current status of qnetd (probably unix socket + binary) - try to display ALL information we know
  • Qdevice
    • Implemented model "net" (qdevice-net):
      • Complete protocol implemented
      • Support for different algorithms
      • Voting
      • Reconnect when connection to qnetd is lost
      • Handle sigint + daemonize + init script
      • Documentation - man pages (it would be nice to have the algorithms described at a very high level so users know what to expect)
  • Algorithms
    • LMS
    • FFSplit

In progress

  • Qnetd daemon
    • Documentation (man pages + Wiki design document and how to add a new message/option/algorithm)
    • Test SSL certificate expiration/renew workflow
  • Qdevice-net
    • Documentation - how to add a new message/option/algorithm
    • "Unit-tests" - fake client (simulating cmap/votequorum)?

Calculations

  • Number of Qdevice votes = number of nodes - 1. So 5 nodes + Qdevice = 5 + 4 = 9 votes
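
The vote arithmetic can be checked with a small sketch. It assumes every node has exactly one vote, as in the example above, and it assumes quorum is a strict majority of the total votes; the function names are hypothetical.

```python
def qdevice_votes(node_count):
    """Votes given to Qdevice: number of nodes - 1 (one vote per node assumed)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    return node_count - 1


def total_votes(node_count):
    """Node votes plus Qdevice votes."""
    return node_count + qdevice_votes(node_count)


def quorum_threshold(node_count):
    # ASSUMPTION: quorum means a strict majority of all votes.
    return total_votes(node_count) // 2 + 1
```

Under these assumptions a 5-node cluster has 9 votes and needs 5 of them, so a single node plus the Qdevice (1 + 4 votes) is enough, which matches the "downscale to 1 node" behavior described for the mode fully driven by Qnetd.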

Example configuration

quorum {
  provider: corosync_votequorum
  device {
    model: net

    net {
      connect_timeout: 10000
      tls: on
      host: 127.0.0.1
      port: 4433
      algorithm: ffsplit
      tie_breaker: lowest
    }
  }
}

Protocol

Protocol used by QNetd and QDevice-net