Update cluster-spec with Valkey 8.0 cluster improvements #167

PingXie · 2024-08-26T06:11:52Z

Update cluster specification document to clarify shard migration process changes in Valkey 8.0 and include detailed descriptions of shard replication and failover mechanism improvements.

Fix #76

Signed-off-by: Ping Xie <pingxie@google.com>

zuiderkwast

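Awesome. But isn't it more? What I remember, there were a lot of changes and long code comments with scenario descriptions which are useful material for docs, like these:

https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR3304

https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR6330

Wouldn't it be good to copy these example scenarios to the relevant sections in the cluster spec?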

topics/cluster-spec.md

PingXie · 2024-08-28T01:02:15Z

> Awesome. But isn't it more? What I remember, there were a lot of changes and long code comments with scenario descriptions which are useful material for docs, like these:
>
> https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR3304
>
> https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR6330
>
> Wouldn't it be good to copy these example scenarios to the relevant sections in the cluster spec?

I wasn't sure how much detail we should add to the spec, so I went with just the "facts", but yeah, good point on providing rationales as well. Will bring over some content.

btw, https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR3304 has been addressed by #754 and the comments have been removed (in #754 too).

madolson

To Viktor's comment, I would prefer less detail in the cluster-spec when not explicitly needed by the end user. A lot of the "fixes" are already basically hidden when things go correctly. I think the current level of detail seems right to me.

madolson · 2024-09-01T02:18:46Z

commands/cluster-setslot.md

@@ -79,3 +79,14 @@ Notes:
  If the source node is informed before the destination node and the destination node crashes before it is set as new slot owner, the slot is left with no owner, even after a successful failover.
 * Step 6, sending `SETSLOT` to the nodes not involved in the resharding, is not technically necessary since the configuration will eventually propagate itself.
  However, it is a good idea to do so in order to stop nodes from pointing to the wrong node for the hash slot moved as soon as possible, resulting in less redirections to find the right node.
+* Starting from Valkey 8.0, `CLUSTER SETSLOT` is synchronously replicated to all healthy replicas


I don't want to put synchronous replication anywhere in the docs. It's not technically synchronous since we don't revert it on failures, so it's only one copy of the data. I would prefer saying "is replicated to all healthy replicas before being executed on the primaries".

> since we don't revert it on failures, so it's only one copy of the data.

I don't think this is the common definition of "synchronous" replication. I modeled our "synchronous" replication after PostgreSQL's. The "revert" part falls in the realm of "distributed transactions", and it is a step up from "synchronous replication".

> I would prefer saying "is replicated to all healthy replicas before being executed on the primaries".

This is neither accurate nor complete, because we do wait for a certain number of replicas to ack the replication, and this is the core part of this feature offering.

> I don't think this is the common definition of "synchronous" replication. I modeled our "synchronous" replication after PostgreSQL's. The "revert" part falls in the realm of "distributed transactions", and it is a step up from "synchronous replication".

In PostgreSQL, everything is a transaction though. There is no "step up" so to speak; they are the same. I was also using my history with PostgreSQL to object to the wording, since I think the database world provides a much stronger guarantee than we are providing.

> This is neither accurate nor complete, because we do wait for a certain number of replicas to ack the replication, and this is the core part of this feature offering.

"Is replicated and acknowledged on all healthy replicas before being executed on the primary"?

> In PostgreSQL, everything is a transaction though.

Based on my research, I am not aware that PostgreSQL's replication is done transactionally. My understanding is that there is no rollback on other replicas when one replica fails to ack the replication either. It is just that the primary will wait for replicas to ack the replication. Note that in order to achieve transactional replication, one would need to implement 2PC, which trades (more) performance/availability for consistency. Do you have a PostgreSQL code or doc pointer that indicates their replication is done transactionally? I can dig some more too.

> "Is replicated and acknowledged on all healthy replicas before being executed on the primary"?

Yes.

The "synchronous" qualifier is critical here, and it is used to contrast with the current "asynchronous" replication, which doesn't provide the required semantics.

BTW, not related to PostgreSQL, but "synchronous replication" is also used with the exact same semantics as I used here in another database I worked on before.

Can we avoid the word "synchronous" to make everyone happy? The concern is that someone sees "synchronous replication" out of context and posts a blog post "Valkey switches to synchronous replication" to create confusion.

> "Is replicated and acknowledged on all healthy replicas before being executed on the primary"?

This sounds about like synchronization to me, without using the word. Isn't it clear enough? Then maybe even more clearly indicate that the primary waits for ack before continuing, like...

"Is replicated to all healthy replicas and acknowledged back to the primary before being executed on the primary"?

"Best effort" == "Opportunistic" :) I think we will need a name at some point IF we expand this pattern beyond this command but I am on board with not adding a new concept for this one. I will drop "synchronous" but keep the high level explanation in this doc.

Best effort is kind of the key concept since the very beginning. Memcached doesn't have persistence and replication. Redis/Valkey has it, best effort, async, without compromising latency.

The cluster bus is supposed to be truly consistent though. We'll fix that some day.

Btw, the MIGRATE command doesn't use replication or the cluster bus. It just sets up a new connection. Is it synchronous? It waits for OK before deleting the transferred key, I think, or does it? Is this how we should have done SETSLOT?

> "Best effort" == "Opportunistic"

Yes. I would be OK with either of those two FWIW, it's not my top preference.

I dropped "synchronous". PTAL
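For readers following the wording debate, here is a toy Python model of the semantics the thread converged on: the change is sent to every healthy replica, the primary waits for acknowledgements, and only then executes it locally. It is a sketch, not Valkey's implementation. `Replica`, `Primary`, `will_ack`, and `NoReplicasError` are hypothetical names, and requiring an ack from every healthy replica is an assumption taken from the proposed doc sentence. It also shows the point raised above that nothing is reverted on failure.

```python
from dataclasses import dataclass, field


class NoReplicasError(Exception):
    """Stands in for the NOREPLICAS error reply (hypothetical class)."""


@dataclass
class Replica:
    healthy: bool = True
    will_ack: bool = True  # toy switch: does this replica ack in time?
    slots: dict = field(default_factory=dict)

    def apply(self, slot, owner):
        # The replica applies the change first; it is never rolled back.
        self.slots[slot] = owner
        return self.will_ack


@dataclass
class Primary:
    replicas: list
    slots: dict = field(default_factory=dict)

    def setslot(self, slot, owner):
        # Only healthy replicas take part (assumption from the doc wording).
        healthy = [r for r in self.replicas if r.healthy]
        acks = [r.apply(slot, owner) for r in healthy]
        if not all(acks):
            # The primary skips execution; replicas that applied keep the change.
            raise NoReplicasError("NOREPLICAS Not enough good replicas to write")
        # Every healthy replica acknowledged, so the primary executes last.
        self.slots[slot] = owner
```

In this model a failed attempt leaves the primary unchanged while some replicas may already hold the new owner, which is why the guarantee is weaker than a distributed transaction.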

madolson · 2024-09-01T02:20:19Z

commands/cluster-setslot.md

+  running Valkey version 8.0+. By default, this synchronous replication must complete within 2 seconds.
+  If the replication fails, the primary does not execute the command, and the client receives a
+  `NOREPLICAS Not enough good replicas to write` error. Operators can retry the command or customize the
+  timeout using the `TIMEOUT` parameter to further increase the reliability of live reconfiguration:
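
Separately from the hunk's own example, the retry advice above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions: `execute` is a hypothetical caller-supplied function that sends one command and raises an exception whose message starts with the error reply, and the position and unit of the `TIMEOUT` argument are assumptions to verify against the command reference.

```python
import time


def setslot_with_retry(execute, slot, node_id, timeout=None, attempts=3, backoff_s=0.0):
    """Send CLUSTER SETSLOT <slot> NODE <node_id>, retrying on NOREPLICAS.

    `execute` is a placeholder for whatever sends a command (given as a list
    of arguments) to the primary; it is not a real client API.
    """
    args = ["CLUSTER", "SETSLOT", str(slot), "NODE", node_id]
    if timeout is not None:
        # Assumption: TIMEOUT is appended after the subcommand arguments.
        args += ["TIMEOUT", str(timeout)]
    last_error = None
    for attempt in range(attempts):
        try:
            return execute(args)
        except Exception as exc:
            # Only the replication failure is retried; other errors propagate.
            if not str(exc).startswith("NOREPLICAS"):
                raise
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(backoff_s)
    raise last_error
```

Passing the sender in as a function keeps the sketch independent of any particular client library.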


Btw, we should either decide if this style guide is wrong or not: https://github.com/valkey-io/valkey-doc?tab=readme-ov-file#styling-guidelines. "Start every sentence on a new line."

We didn't force it in the past to retain history, but I find it annoying we sometimes follow it.

I don't mind dropping it. I don't like to complain about it, so I haven't enforced it.

We can just skip reflowing text when it's edited.

yeah can we consider dropping this rule? I don't see this being enforced today on the existing doc. I feel that there are a few main situations wherein we need to modify a doc:

1. either we fix typos (or small changes)
2. or we rewrite some portion of the doc because we have changed the behavior
3. or we add new content like in this case

In both the second and third cases, I would think we will have to review the entire paragraph already. I don't think we should ever re-flow the first case, irrespective of how we wrote the doc in the first place.

thoughts?

PingXie · 2024-09-11T04:06:00Z

@madolson @zuiderkwast any more feedback? Let's merge this one in the next day or two?

zuiderkwast

This looks good now. Thanks! (Nothing controversial. 😆)

madolson

Ship it!

Update cluster-spec with Valkey 8.0 cluster improvements

f562ac6

Signed-off-by: Ping Xie <pingxie@google.com>

PingXie assigned zuiderkwast and madolson Aug 26, 2024

zuiderkwast self-requested a review August 27, 2024 06:02

zuiderkwast reviewed Aug 27, 2024

PingXie added 3 commits August 27, 2024 19:03

Address review feedback

e3745bd

Signed-off-by: Ping Xie <pingxie@google.com>

Minor fix

888fd2f

Signed-off-by: Ping Xie <pingxie@google.com>

fix typo

fb9215f

Signed-off-by: Ping Xie <pingxie@google.com>

zuiderkwast reviewed Aug 28, 2024

Update topics/cluster-spec.md

6019cf6

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Ping Xie <pingxie@outlook.com>

zuiderkwast reviewed Aug 29, 2024

Update commands/cluster-setslot.md

8a8bd5c

Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech> Signed-off-by: Ping Xie <pingxie@outlook.com>

madolson reviewed Sep 1, 2024

PingXie and others added 4 commits August 31, 2024 23:23

Update topics/cluster-spec.md

60408be

Co-authored-by: Madelyn Olson <madelyneolson@gmail.com> Signed-off-by: Ping Xie <pingxie@outlook.com>

address review feedback

9f5c298

Signed-off-by: Ping Xie <pingxie@google.com>

Remove "synchronous" qualifier for CLUSTER SLOTS replication

c4641c7

Signed-off-by: Ping Xie <pingxie@google.com>

Fix typos

18a3b0c

Signed-off-by: Ping Xie <pingxie@google.com>

zuiderkwast approved these changes Sep 11, 2024

madolson approved these changes Sep 11, 2024

PingXie merged commit 1c9edd0 into valkey-io:main Sep 12, 2024
2 checks passed
