Stale `PONG` message causes incorrect `replicaof` updates leading to `replicaof` loops #1015

PingXie · 2024-09-11T04:03:22Z

I have a theory about how this could happen.

We had a stale `PONG` message issue, which was fixed in commit 28976a9. The stale message is detected here:

`valkey/src/cluster_legacy.c` line 3271 (at `2b76c8f`): `if (sender->configEpoch > sender_claimed_config_epoch) {`

However, we don't bail out after detecting this stale message. Instead, we proceed to:

`valkey/src/cluster_legacy.c` line 3311 (at `2b76c8f`): `if (sender_claimed_primary && sender->replicaof != sender_claimed_primary) {`

and then update the sender's `replicaof` based on the stale message at:

`valkey/src/cluster_legacy.c` line 3317 (at `2b76c8f`): `sender->replicaof = sender_claimed_primary;`
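To make the control flow concrete, here is a minimal C sketch of the pattern described above. The `node` struct, its fields, and `handle_replica_claim_buggy()` are hypothetical simplifications, not Valkey's `clusterNode` or its packet-processing code; the only point is that the stale check logs and then falls through to the `replicaof` update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified record of a node as seen by observer N.
 * This is NOT Valkey's clusterNode; field names are illustrative. */
typedef struct node {
    const char *name;
    uint64_t config_epoch;  /* last config epoch N recorded for this node */
    struct node *replicaof; /* NULL while N believes the node is a primary */
} node;

/* N handles a message in which `sender` claims `claimed_primary` as its
 * primary, carrying `claimed_epoch`. Returns true if it was detected as
 * stale. Mirrors the shape described above: the stale check only logs. */
static bool handle_replica_claim_buggy(node *sender, node *claimed_primary,
                                       uint64_t claimed_epoch) {
    assert(sender != NULL && claimed_primary != NULL);
    bool stale = sender->config_epoch > claimed_epoch;
    if (stale) {
        printf("Ignore stale message from %s\n", sender->name);
        /* BUG: no early return, so execution falls through. */
    }
    /* The replicaof update runs whether or not the message was stale. */
    if (sender->replicaof != claimed_primary) {
        sender->replicaof = claimed_primary;
    }
    return stale;
}
```

For example, with B recorded at epoch 2 and a message from B claiming A as its primary at epoch 1, the function reports the message as stale and still sets B's `replicaof` to A.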

Now, imagine the following scenario:

- [T0] Three nodes: primary A with replica B, and an observer node N.
- [T1] B's `PONG` message to N, claiming A as its primary, gets stuck somewhere on the way to N.
- [T2] B becomes primary after a manual failover and then notifies A (and N, but that message gets stuck behind the `PONG` message from [T1]).
- [T3] A becomes a replica of B.
- [T4] A, now a replica of B, sends a `PING` to N, which processes it through the following steps that end up "promoting" B to primary, indirectly:

- `valkey/src/cluster_legacy.c` line 3257 (at `2b76c8f`): `if (sender) {`
- `valkey/src/cluster_legacy.c` line 3267 (at `2b76c8f`): `if (sender_last_reported_as_primary) {`
- `valkey/src/cluster_legacy.c` line 3269 (at `2b76c8f`): `if (sender_claimed_primary && areInSameShard(sender_claimed_primary, sender)) {`
- `valkey/src/cluster_legacy.c` line 3281 (at `2b76c8f`): `clusterSetNodeAsPrimary(sender_claimed_primary);`

and then sets A's `replicaof` to B:

- `valkey/src/cluster_legacy.c` line 3311 (at `2b76c8f`): `if (sender_claimed_primary && sender->replicaof != sender_claimed_primary) {`
- `valkey/src/cluster_legacy.c` line 3317 (at `2b76c8f`): `sender->replicaof = sender_claimed_primary;`
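The [T4] step can be modeled with a similarly simplified sketch. Again, `node`, `shard_id`, `same_shard()` and `handle_fresh_replica_claim()` are hypothetical. In particular, copying the claimed epoch onto the promoted node is an assumption of this model, standing in for however N ends up holding B's newer epoch, which the stale check at [T5] relies on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of a node from observer N's perspective. */
typedef struct node {
    const char *name;
    const char *shard_id;
    uint64_t config_epoch;  /* last config epoch N recorded for this node */
    struct node *replicaof; /* NULL while N believes the node is a primary */
} node;

static bool same_shard(const node *x, const node *y) {
    return strcmp(x->shard_id, y->shard_id) == 0;
}

/* N handles a message in which `sender` claims `claimed_primary` as its
 * primary, carrying `claimed_epoch`. A stale claim simply skips the
 * promotion here; the stale path is examined separately. */
static void handle_fresh_replica_claim(node *sender, node *claimed_primary,
                                       uint64_t claimed_epoch) {
    assert(sender != NULL && claimed_primary != NULL);
    bool sender_last_reported_as_primary = (sender->replicaof == NULL);
    if (sender_last_reported_as_primary && same_shard(claimed_primary, sender) &&
        sender->config_epoch <= claimed_epoch) {
        /* Implicit promotion: N now treats the claimed primary as a primary.
         * Recording the claimed epoch on it is a simplification. */
        claimed_primary->replicaof = NULL;
        claimed_primary->config_epoch = claimed_epoch;
    }
    /* Then the sender's replicaof is pointed at the claimed primary. */
    if (sender->replicaof != claimed_primary) {
        sender->replicaof = claimed_primary;
    }
}
```

Starting from N's view at [T0] (A a primary at epoch 1, B a replica of A), a fresh claim from A that B is its primary at epoch 2 leaves B as a primary at epoch 2 and A as B's replica.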

- [T5] Finally, B's `PONG` message from [T1] arrives at N, which processes it through the following steps:

- `valkey/src/cluster_legacy.c` line 3257 (at `2b76c8f`): `if (sender) {`
- `valkey/src/cluster_legacy.c` line 3264 (at `2b76c8f`): `/* Node is a replica. */`

Because of [T4], N already treats B as a primary (B was promoted implicitly), so this branch is taken:

`valkey/src/cluster_legacy.c` line 3267 (at `2b76c8f`): `if (sender_last_reported_as_primary) {`

However, the message carries a stale config epoch, which is correctly detected:

- `valkey/src/cluster_legacy.c` line 3271 (at `2b76c8f`): `if (sender->configEpoch > sender_claimed_config_epoch) {`
- `valkey/src/cluster_legacy.c` line 3273 (at `2b76c8f`): `"Ignore stale message from %.40s (%s) in shard %.40s;"`

But we don't bail out; instead, we continue to:

`valkey/src/cluster_legacy.c` line 3311 (at `2b76c8f`): `if (sender_claimed_primary && sender->replicaof != sender_claimed_primary) {`

and finally update `B->replicaof` to A, completing the loop (N now records A as a replica of B and B as a replica of A):

`valkey/src/cluster_legacy.c` line 3317 (at `2b76c8f`): `sender->replicaof = sender_claimed_primary;`
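The [T5] step and the resulting loop can be reproduced in the same kind of toy model. The sketch below assumes the state after [T4] as input, feeds in the delayed `PONG`, and uses Floyd's cycle detection to check the `replicaof` chain. All names (`node`, `handle_replica_claim_buggy()`, `has_replicaof_loop()`) are hypothetical, and the non-stale promotion path is left out to keep it short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified node record as seen by observer N. */
typedef struct node {
    const char *name;
    uint64_t config_epoch;  /* last config epoch N recorded for this node */
    struct node *replicaof; /* NULL while N believes the node is a primary */
} node;

/* Buggy shape: when N believes `sender` is a primary and the message's epoch
 * is older than the one N recorded, the message is logged as stale, but the
 * replicaof update below still runs. The promotion path is omitted. */
static void handle_replica_claim_buggy(node *sender, node *claimed_primary,
                                       uint64_t claimed_epoch) {
    assert(sender != NULL && claimed_primary != NULL);
    bool sender_last_reported_as_primary = (sender->replicaof == NULL);
    if (sender_last_reported_as_primary && sender->config_epoch > claimed_epoch) {
        printf("Ignore stale message from %s\n", sender->name);
        /* no return: falls through */
    }
    if (sender->replicaof != claimed_primary) {
        sender->replicaof = claimed_primary;
    }
}

/* Floyd's cycle detection on the replicaof chain that starts at `n`.
 * Returns true if following replicaof pointers never reaches a primary. */
static bool has_replicaof_loop(const node *n) {
    const node *slow = n;
    const node *fast = n;
    while (fast != NULL && fast->replicaof != NULL) {
        slow = slow->replicaof;
        fast = fast->replicaof->replicaof;
        if (slow == fast) {
            return true;
        }
    }
    return false;
}
```

With A recorded as a replica of B (B at epoch 2), processing the stale claim from B at epoch 1 sets B's `replicaof` to A, after which `has_replicaof_loop()` returns true for both A and B.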

I have seen stale messages in the past, and I also notice that the latest failure was in the codecov run, which can alter the timing quite a bit, so I think this theory is very plausible.

The fix would be to bail out immediately after detecting the stale message, i.e., right after:

`valkey/src/cluster_legacy.c` line 3273 (at `2b76c8f`): `"Ignore stale message from %.40s (%s) in shard %.40s;"`
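A minimal sketch of the proposed fix in the same toy model: return as soon as the stale message is detected, so the `replicaof` update is never reached. What "bailing" should return in the real `cluster_legacy.c` code is not shown here; the boolean return value and all names below are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified node record as seen by observer N. */
typedef struct node {
    const char *name;
    uint64_t config_epoch;  /* last config epoch N recorded for this node */
    struct node *replicaof; /* NULL while N believes the node is a primary */
} node;

/* Fixed shape: bail out as soon as the message is found to be stale, so
 * nothing derived from it (including replicaof) is applied. Returns false
 * if the message was dropped, true if it was processed. */
static bool handle_replica_claim_fixed(node *sender, node *claimed_primary,
                                       uint64_t claimed_epoch) {
    assert(sender != NULL && claimed_primary != NULL);
    bool sender_last_reported_as_primary = (sender->replicaof == NULL);
    if (sender_last_reported_as_primary && sender->config_epoch > claimed_epoch) {
        printf("Ignore stale message from %s\n", sender->name);
        return false; /* bail: skip the replicaof update below */
    }
    if (sender->replicaof != claimed_primary) {
        sender->replicaof = claimed_primary;
    }
    return true;
}
```

Replaying the delayed `PONG` from [T1] against the post-[T4] state now leaves B as a primary and A as its replica, so no loop forms, while non-stale claims are still applied as before.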

BTW, we have another undetected stale message issue (#798).

Originally posted by @PingXie in #573 (comment)


PingXie mentioned this issue on Sep 11, 2024 in #573: Avoid shard id update of replica if not matching with primary shard id
