Persist RSS completed marker on all sleds #7186
Labels: bootstrap services (For those occasions where you want the rack to turn on), Sled Agent (Related to the Per-Sled Configuration and Management)
Right now, when RSS completes, we persist a marker file (ledger) indicating that RSS has already completed and should not be re-run. This prevents data loss. However, this marker file is only present on the M.2 devices attached to the scrimlet where RSS was run. If RSS runs on the other scrimlet, or one of the sleds is swapped into the scrimlet position, the marker will not be present.
We want this marker to be present on all sleds. We could require it to be persisted to all sleds before completing RSS by direct copying. However, this means that if a new sled is added to the rack and put in the scrimlet position, it will not have the files. We already have a mechanism to gossip around configuration required for early boot: namely the bootstore. We can put this marker key in the bootstore as well, and it will propagate asynchronously over the bootstrap network to any sled in the rack. Max delay for an online sled is ~1 second. This will require some changes to the bootstore to support multiple keys, but this is not unreasonable, and something we have considered in the past.
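To make the "multiple keys" idea concrete, here is a rough sketch of what a keyed store with per-key generation numbers might look like. This is not the existing bootstore API: `KeyedStore`, `Versioned`, `RSS_COMPLETE_KEY`, and the rule that a higher generation wins during a merge are all hypothetical names and assumptions chosen for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical key under which the RSS-completed marker would be stored.
pub const RSS_COMPLETE_KEY: &str = "rss-complete";

/// Hypothetical versioned value. The generation number is an assumption
/// used here to decide which copy wins when two sleds exchange state.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Versioned {
    pub generation: u64,
    pub blob: Vec<u8>,
}

/// Illustrative multi-key store; not the real bootstore type.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct KeyedStore {
    entries: BTreeMap<String, Versioned>,
}

impl KeyedStore {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store `value` under `key` only if it is strictly newer than what we
    /// already hold. Returns true if the store changed.
    pub fn put(&mut self, key: &str, value: Versioned) -> bool {
        let is_newer = self
            .entries
            .get(key)
            .map_or(true, |existing| value.generation > existing.generation);
        if is_newer {
            self.entries.insert(key.to_string(), value);
        }
        is_newer
    }

    pub fn get(&self, key: &str) -> Option<&Versioned> {
        self.entries.get(key)
    }

    /// Merge a snapshot received from a peer sled, key by key.
    /// Returns the number of keys that were updated locally.
    pub fn merge(&mut self, peer: &KeyedStore) -> usize {
        let mut updated = 0;
        for (key, value) in &peer.entries {
            if self.put(key, value.clone()) {
                updated += 1;
            }
        }
        updated
    }

    /// True once the RSS-completed marker has reached this sled.
    pub fn rss_completed(&self) -> bool {
        self.entries.contains_key(RSS_COMPLETE_KEY)
    }
}
```

In this sketch, the scrimlet that ran RSS would `put` the marker once, and every other sled would pick it up through `merge` as snapshots are gossiped. A sled added later would receive it the same way after joining the bootstrap network, which is the property that direct copying lacks.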