Update cluster-spec with Valkey 8.0 cluster improvements #167

Merged
merged 10 commits into from
Sep 12, 2024
20 changes: 20 additions & 0 deletions topics/cluster-spec.md
Original file line number Diff line number Diff line change
@@ -479,6 +479,16 @@ is that:
* All queries about existing keys are processed by "A".
* All queries about non-existing keys in A are processed by "B", because "A" will redirect clients to "B".

Starting from Valkey 8.0, the `CLUSTER SETSLOT` command is synchronously replicated to all healthy replicas
running Valkey version 8.0+. By default, this synchronous replication must complete within 2 seconds.
If the replication fails, the primary does not execute the command, and the client receives a
`NOREPLICAS Not enough good replicas to write` error. Operators can retry the command or customize the
timeout using the `TIMEOUT` parameter to further increase the reliability of live reconfiguration:

    CLUSTER SETSLOT slot [MIGRATING|IMPORTING|NODE] node-id [TIMEOUT timeout]

Here, `timeout` is measured in seconds, with 0 meaning to wait indefinitely.
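
The retry-or-extend-the-timeout advice above can be sketched as a small client-side helper. This is a minimal illustration, not part of Valkey or of any client library: `build_setslot`, `setslot_with_retry`, and the injected `execute` callable are hypothetical names, and the sketch assumes the client surfaces the server error as an exception whose message starts with `NOREPLICAS`.

```python
import time
from typing import Callable, List, Optional


def build_setslot(slot: int, action: str, node_id: str,
                  timeout: Optional[int] = None) -> List[str]:
    """Build the argument list for CLUSTER SETSLOT, with an optional TIMEOUT."""
    if action not in ("MIGRATING", "IMPORTING", "NODE"):
        raise ValueError(f"unsupported action: {action}")
    args = ["CLUSTER", "SETSLOT", str(slot), action, node_id]
    if timeout is not None:
        if timeout < 0:
            raise ValueError("timeout must be >= 0")
        # Timeout is in seconds; 0 means wait indefinitely.
        args += ["TIMEOUT", str(timeout)]
    return args


def setslot_with_retry(execute: Callable[[List[str]], str], slot: int,
                       action: str, node_id: str,
                       timeout: Optional[int] = None,
                       attempts: int = 3, backoff: float = 0.5) -> str:
    """Send CLUSTER SETSLOT, retrying only when the reply is a NOREPLICAS error.

    `execute` is a hypothetical callable that sends one command and raises
    an exception carrying the server's error message on failure.
    """
    command = build_setslot(slot, action, node_id, timeout)
    for attempt in range(1, attempts + 1):
        try:
            return execute(command)
        except Exception as err:
            retryable = str(err).startswith("NOREPLICAS")
            if not retryable or attempt == attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(backoff * attempt)
    raise RuntimeError("unreachable: attempts must be >= 1")
```

Injecting `execute` keeps the sketch independent of any particular client library. In practice it would wrap whatever raw-command call the client in use provides.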

This way we no longer create new keys in "A".
In the meantime, `valkey-cli` used during reshardings
and Valkey Cluster configuration will migrate existing keys in
@@ -513,6 +523,11 @@ set the slots to their normal state again. The same command is usually
sent to all other nodes to avoid waiting for the natural
propagation of the new configuration across the cluster.

Starting from Valkey 8.0, a primary can be elected in an empty shard, that is, a shard that does not yet own any slots.
This ensures that a primary can still be elected while the shard is in the process of receiving its first slot.
Without it, the empty shard could be left with no primary available to handle requests redirected
from the official slot owner, which would reduce availability during live reconfiguration.

### ASK redirection

In the previous section, we briefly talked about ASK redirection. Why can't
@@ -550,6 +565,11 @@ Slots migration is explained in similar terms but with different wording
(for the sake of redundancy in the documentation) in the `CLUSTER SETSLOT`
command documentation.

Starting from Valkey 8.0, when the primary in either the source or target shard fails during live reconfiguration,
the primary in the other shard will automatically attempt to update its migrating/importing state to correctly pair
with the newly elected primary. If this update is successful, the ASK redirection will continue functioning without
requiring operator intervention.
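
The re-pairing described above can be pictured with a toy model. This is only an illustrative sketch and not Valkey's implementation, which runs inside the server: the `Primary` class and the `repair_after_failover` function are hypothetical, and the model assumes each primary tracks, per slot, the ID of the peer primary it is migrating to or importing from.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Primary:
    """Toy view of one primary's per-slot migration state."""
    node_id: str
    # slot -> ID of the target primary the slot is migrating to
    migrating: Dict[int, str] = field(default_factory=dict)
    # slot -> ID of the source primary the slot is being imported from
    importing: Dict[int, str] = field(default_factory=dict)


def repair_after_failover(primary: Primary, failed_id: str,
                          new_primary_id: str) -> List[int]:
    """Re-point migrating/importing entries from a failed peer to its successor.

    Returns the sorted list of slots whose state was updated.
    """
    updated = []
    for table in (primary.migrating, primary.importing):
        for slot, peer in table.items():
            if peer == failed_id:
                # Only values change, so mutating during iteration is safe.
                table[slot] = new_primary_id
                updated.append(slot)
    return sorted(updated)
```

For example, if source primary "A" is migrating slot 8 to "B" and "B" fails over to "B2", calling `repair_after_failover(a, "B", "B2")` leaves slot 8 paired with "B2", so that ASK redirections for slot 8 point at the new primary.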

### Client connections and redirection handling

To be efficient, Valkey Cluster clients maintain a map of the current slot