Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update cluster-spec with Valkey 8.0 cluster improvements #167

Merged
merged 10 commits into from
Sep 12, 2024

Conversation

PingXie
Copy link
Member

@PingXie PingXie commented Aug 26, 2024

Update cluster specification document to clarify shard migration process changes in Valkey 8.0 and include detailed descriptions of shard replication and failover mechanism improvements.

Fix #76

Signed-off-by: Ping Xie <pingxie@google.com>
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome. But isn't it more? What I remember, there were a lot of changes and long code comments with scenario descriptions which are useful material for docs, like these:

Wouldn't it be good to copy these example scenarios to the relevant sections in the cluster spec?

topics/cluster-spec.md Outdated Show resolved Hide resolved
topics/cluster-spec.md Show resolved Hide resolved
topics/cluster-spec.md Outdated Show resolved Hide resolved
topics/cluster-spec.md Outdated Show resolved Hide resolved
@PingXie
Copy link
Member Author

PingXie commented Aug 28, 2024

Awesome. But isn't it more? What I remember, there were a lot of changes and long code comments with scenario descriptions which are useful material for docs, like these:

Wouldn't it be good to copy these example scenarios to the relevant sections in the cluster spec?

I wasn't sure how much detail we should add to the spec so went with just the "facts" but yeah good point on providing rationales as well. Will bring over some content.

btw, https://github.com/valkey-io/valkey/pull/245/files#diff-2515500619600c5a1e7a8d9aaa8761a6071314cd226c3b7ee4d0e054691f0a8eR3304 has been addressed by #754 and the comments have been removed (in #754 too).

Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
topics/cluster-spec.md Outdated Show resolved Hide resolved
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Co-authored-by: Viktor Söderqvist <viktor.soderqvist@est.tech>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To viktors comment, I would prefer less detail in the cluster-spec when not explicitly needed by the end user. A lot of the "fixes" are already basically hidden when things go correctly. I think the current level of detail seems right to me.

@@ -79,3 +79,14 @@ Notes:
If the source node is informed before the destination node and the destination node crashes before it is set as new slot owner, the slot is left with no owner, even after a successful failover.
* Step 6, sending `SETSLOT` to the nodes not involved in the resharding, is not technically necessary since the configuration will eventually propagate itself.
However, it is a good idea to do so in order to stop nodes from pointing to the wrong node for the hash slot moved as soon as possible, resulting in less redirections to find the right node.
* Starting from Valkey 8.0, `CLUSTER SETSLOT` is synchronously replicated to all healthy replicas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't want to put synchronous replication anywhere in the docs. It's not technically synchronous since we don't revert it on failures, so it's only one copy of the data. I would prefer saying "is replicated to all healthy replicas before being executed on the primaries.

Copy link
Member Author

@PingXie PingXie Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since we don't revert it on failures, so it's only one copy of the data.

I don't think this is the common definition of "synchronous" replication. I modeled our "synchronous" replication after postgress'. The "revert" part falls in the realm of "distributed transactions" and it is a step up from "synchronous replication".

I would prefer saying "is replicated to all healthy replicas before being executed on the primaries.

This is also not accurate nor complete because we do wait for a certain number of replicas to ack the replication and this is the core part of this feature offering.

Copy link
Member

@madolson madolson Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the common definition of "synchronous" replication. I modeled our "synchronous" replication after postgress'. The "revert" part falls in the realm of "distributed transactions" and it is a step up from "synchronous replication".

In postgreSQL, everything is a transaction though. There is no "step up" so to speak, they are the same. I also was using my history with PostgreSQL to object to the wording, since I think the database world provides a much stronger guarantee than we are providing.

This is also not accurate nor complete because we do wait for a certain number of replicas to ack the replication and this is the core part of this feature offering.

"Is replicated and acknlowedged on all healthy replicas before being executed on the primary"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In postgreSQL, everything is a transaction though.

I am not aware that Postgress' replication is done transactionally based on my research. My understanding is that there is no rollback on other replicas when one replica fails to ack the replication either. It is just that the primary will wait for replicas to ack the replication. Note that in order to achieve transactional replication, one would need to implement 2PC, which trades (more) performance/availability for consistency. Do you have a Postgress code or doc pointer that indicates their repliction is done transactionally? I can dig some more too.

"Is replicated and acknlowedged on all healthy replicas before being executed on the primary"?

Yes.

The "synchronous" qualifier is critical here and it is used to contrast the current "asynchronous" replication, which doesn't provide the required semantics.

BTW, not related to Postgress, but "synchronous replication" is also used in the exact same semantics as I used here for another database I worked on before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the word "synchronous" to make everyone happy? The concern is that someone sees "synchronous replication" out of context and posts a blog post "Valkey switches to synchronous replication" to create confusion.

"Is replicated and acknlowedged on all healthy replicas before being executed on the primary"?

This sounds about like synchronization to me, without using the word. Isn't it clear enough? Then maybe even more clearly indicate that the primary waits for ack before continuing, like...

"Is replicated to all healthy replicas and acknowledged back to the primary before being executed on the primary"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Best effort" == "Opportunistic" :) I think we will need a name at some point IF we expand this pattern beyond this command but I am on board with not adding a new concept for this one. I will drop "synchronous" but keep the high level explanation in this doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best effort is kind of the key concept since the very beginning. Memcached doesn't have persistence and replication. Redis/Valkey has it, best effort, async, without compromising latency.

The cluster bus is supposed to be truly consistent though. We'll fix that some day.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, the MIGRATE command doesn't use the replication nor cluster bus. It just sets up a new connection. Is it synchronous? It waits for OK before deleting the transferred key, I think, or does it? Is this how we should have done SETSLOT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Best effort" == "Opportunistic"

Yes. I would be OK with either of those two FWIW, it's not my top preference.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped "synchronous". PTAL

Comment on lines +83 to +86
running Valkey version 8.0+. By default, this synchronous replication must complete within 2 seconds.
If the replication fails, the primary does not execute the command, and the client receives a
`NOREPLICAS Not enough good replicas to write` error. Operators can retry the command or customize the
timeout using the `TIMEOUT` parameter to further increase the reliability of live reconfiguration:
Copy link
Member

@madolson madolson Sep 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, we should either decide if this style guide is wrong or not: https://github.com/valkey-io/valkey-doc?tab=readme-ov-file#styling-guidelines. "Start every sentence on a new line."

We didn't force it in the past to retain history, but I find it annoying we sometimes follow it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind dropping it. I don't like to complain about it, so I haven't enforced it.

We can just skip reflowing text when it's edited.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah can we consider dropping this rule? I don't see this being enforced today on the existing doc. I feel that there are a few main situations wherein we need to modify a doc

  1. either we fix typos (or small changes)
  2. or we rewrite some portion of the doc because we have changed the behavior
  3. or we add new content like in this case

In both the second and third cases, I would think we will have to review the entire paragraph already. I don't think we should ever re-flow the first case, irrespective of how we wrote the doc in the first place.

thoughts?

topics/cluster-spec.md Outdated Show resolved Hide resolved
topics/cluster-spec.md Outdated Show resolved Hide resolved
PingXie and others added 4 commits August 31, 2024 23:23
Co-authored-by: Madelyn Olson <madelyneolson@gmail.com>
Signed-off-by: Ping Xie <pingxie@outlook.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
Signed-off-by: Ping Xie <pingxie@google.com>
@PingXie
Copy link
Member Author

PingXie commented Sep 11, 2024

@madolson @zuiderkwast any more feedback? Let's merge this one in the next day or two?

Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good now. Thanks! (Nothing controversial. 😆)

Copy link
Member

@madolson madolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ship it!

@PingXie PingXie merged commit 1c9edd0 into valkey-io:main Sep 12, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[NEW] Document slot migration improvements
3 participants