Implement basic HAST integration and an automated HA failover scenario for it #104

yaroslav-gwit opened this issue Jan 5, 2024 · 0 comments
yaroslav-gwit commented Jan 5, 2024

HAST is FreeBSD's alternative to DRBD. This means we can use it to synchronise the storage state between two Hoster nodes in a primary/secondary fashion.

Unlike DRBD, HAST is an official part of the FreeBSD OS, so it should be more stable: the user-space and kernel-space utilities ship together and always speak the same language.

Here are some docs for more details:

https://docs.freebsd.org/en/books/handbook/disks/#disks-hast
https://cobug.org/slides/hast/
https://man.freebsd.org/cgi/man.cgi?query=hast.conf&sektion=5&format=html

The initial implementation will not try to wrap the HAST management itself; it's pretty easy to work with as is. Instead, HAST will be integrated into our HA offering in order to support synchronous replication.
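To give an idea of how little wrapping is needed, here is a rough manual setup based on the handbook chapter linked above. Every resource name, host name, address and disk path below is hypothetical:

    # /etc/hast.conf -- identical on both nodes
    resource hoster0 {
        on hoster-node1 {
            local /dev/ada1
            remote 10.0.0.2
        }
        on hoster-node2 {
            local /dev/ada1
            remote 10.0.0.1
        }
    }

    # On both nodes: initialise the resource metadata and start hastd
    hastctl create hoster0
    service hastd onestart

    # Assign the roles; /dev/hast/hoster0 only appears on the primary
    hastctl role primary hoster0     # run on hoster-node1
    hastctl role secondary hoster0   # run on hoster-node2

The /dev/hast/hoster0 device on the primary can then be used like any other GEOM provider, e.g. as a vdev for a ZFS pool.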

Our current model is based on the async nature of ZFS replication (sketched below the list), which has a lot of advantages:

  • Scalability - you can scale such setups nearly indefinitely
  • Low network overhead - WAN replication is supported and even encouraged (for better data redundancy)
  • Data locality - VM or Jail data is located on a local ZFS dataset, so you don't have to rely on the network shares being available
  • Easy and painless disaster recovery
  • Etc
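For context, the current async model boils down to periodic snapshot shipping, roughly like this (dataset, snapshot and host names are made up for illustration):

    # Take a new snapshot and send the delta to the standby node
    zfs snapshot tank/vms/test-vm@replication-101
    zfs send -i tank/vms/test-vm@replication-100 tank/vms/test-vm@replication-101 | \
        ssh standby-node zfs receive -F tank/vms/test-vm

Everything written between two such snapshots is invisible to the standby node, which is exactly the gap discussed below.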

But it brings its own set of challenges:

  • Because the ZFS replication is async, there are gaps in the data, so a failed-over VM/Jail may not be in a clean state 100% of the time
  • Financial and healthcare orgs cannot tolerate gaps in their data caused by a failover
  • Switching the replication direction automatically is not always safe, because you might destroy some data on the receiving side

This is where HAST comes in: we can very easily "cluster" together some storage in synchronous replication mode, and fail over as needed without losing a single bit of data (apart from what was still in transit on the primary node at the moment of failure).
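Under that model, an automated failover on the surviving node could boil down to something like the sketch below, assuming a ZFS pool (hypothetically named hastpool) was created on top of the HAST device:

    # Promote the local HAST resource; /dev/hast/hoster0 becomes available
    hastctl role primary hoster0

    # Bring up the pool living on the HAST device and restart the workload
    zpool import -f hastpool
    # ...start the affected VMs/Jails from their datasets on hastpool...

Once the old primary comes back, it would be demoted with "hastctl role secondary hoster0" and resynchronised by hastd, unless a split-brain has occurred (see below).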

I'll also have to add some docs on how to set up HAST to work with Hoster HA, how to handle split-brain, etc.
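For reference, the split-brain recovery procedure from the handbook amounts to discarding the changes on one node and doing a full resync from the other (resource name hypothetical again):

    # On the node whose changes we decided to throw away:
    hastctl role init hoster0        # take the resource out of the cluster
    hastctl create hoster0           # re-initialise the local metadata
    hastctl role secondary hoster0   # full resync from the surviving primary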
