This repository is archived. Please go to https://github.com/Azure/hpcpack-high-availability.
Algorithms for leader election and name resolution that rely on an external high-availability (HA) system. They serve as an anti-corruption layer on top of that system.
- I: interval for heartbeat (e.g. 1 sec)
- T: heartbeat timeout (e.g. 5 secs), with T > 2 * I
- Heartbeat Table: a table in the external HA system that contains the heartbeat entry
- Heartbeat Entry: in the format {uuid, utype, timestamp}
- ha_time: current date time of the external HA system
- All times are in UTC.
- UpdateHeartBeat(uuid, utype): For each type, replace the entry {old_uuid, utype, old_timestamp} in the heartbeat table with {uuid, utype, ha_time}. If uuid is not equal to old_uuid, then (ha_time - old_timestamp > T) must be satisfied. The update uses optimistic concurrency control: if the heartbeat entry has already been updated by the time another heartbeat arrives, the later heartbeat is discarded.
- GetPrimary(utype): Return (uuid, utype) from the heartbeat entry for the queried utype if (ha_time - timestamp <= T). Otherwise return an empty value.
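The two operations above can be sketched against an in-memory table. This is a minimal illustration only: the class and method names, the boolean return value of `update_heartbeat`, and the injectable `clock` are assumptions made for the example, and a lock-guarded compare-and-swap stands in for whatever conditional update the external HA system provides.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class HeartbeatEntry:
    uuid: str
    utype: str
    timestamp: datetime


class InMemoryHeartbeatTable:
    """Hypothetical in-memory stand-in for the external HA system's table."""

    def __init__(
        self,
        timeout: timedelta = timedelta(seconds=5),       # T (example value)
        clock: Optional[Callable[[], datetime]] = None,  # source of ha_time
    ) -> None:
        self._timeout = timeout
        self._clock = clock or (lambda: datetime.now(timezone.utc))
        self._entries: Dict[str, HeartbeatEntry] = {}
        self._lock = threading.Lock()

    def _read(self, utype: str) -> Optional[HeartbeatEntry]:
        with self._lock:
            return self._entries.get(utype)

    def _compare_and_swap(
        self, expected: Optional[HeartbeatEntry], new: HeartbeatEntry
    ) -> bool:
        """Write `new` only if the stored entry is still `expected`."""
        with self._lock:
            if self._entries.get(new.utype) != expected:
                return False  # updated by someone else first: discard
            self._entries[new.utype] = new
            return True

    def update_heartbeat(self, uuid: str, utype: str) -> bool:
        """Return True if the heartbeat was written (assumed return value)."""
        old = self._read(utype)
        ha_time = self._clock()
        # A different uuid may take over only once the old entry has expired.
        if (
            old is not None
            and old.uuid != uuid
            and ha_time - old.timestamp <= self._timeout
        ):
            return False
        return self._compare_and_swap(old, HeartbeatEntry(uuid, utype, ha_time))

    def get_primary(self, utype: str) -> Optional[Tuple[str, str]]:
        entry = self._read(utype)
        if entry is not None and self._clock() - entry.timestamp <= self._timeout:
            return (entry.uuid, entry.utype)
        return None  # the "empty value"
```

Passing a fake `clock` makes the expiry rule easy to exercise: a second uuid is rejected while the current entry is fresh and accepted once more than T has elapsed.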
1. After a client S starts, it generates a unique instance ID uuid to identify itself and marks itself with the exact utype that it will work as in the future.
2. S calls GetPrimary(utype) every I secs.
3. If GetPrimary(utype) returns an empty value, S calls UpdateHeartBeat(uuid, utype).
4. S continues to call GetPrimary(utype) every I secs.
   a. If a subsequent call to GetPrimary(utype) returns the (uuid, utype) generated in 1, S then works as primary.
   b. If a subsequent call to GetPrimary(utype) returns a unique ID that differs from the uuid generated in 1 but has the same utype, go back to 2.
   c. If a subsequent call to GetPrimary(utype) returns an empty value or a corrupted message, an error occurred in 3. Retry 3.
5. While working as primary, S calls UpdateHeartBeat(uuid, utype) and GetPrimary(utype) every I secs.
   a. If GetPrimary(utype) returns anything except (uuid, utype), or does not return within (T - I) secs, S exits and restarts.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.