master_id (runid) should protect from accidental master restarts #2636
Comments
* feat(cluster): Add `--cluster_id` flag

  This flag sets the unique ID of a node in a cluster. It is UB (and bad) to set the same IDs to multiple nodes in the same cluster. If unset (default), the `master_replid` (previously known as `master_id`) is used.

  Fixes #2643
  Related to #2636
* gh comments
* oops - revert line removal
* fix
* replica
* disallow cluster_node_id in emulated mode
* fix replica test
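The fallback rule described in the commit message above can be sketched roughly as follows. This is only an illustration in Python, not the actual implementation: the function names `resolve_cluster_node_id` and `find_duplicate_node_ids` are hypothetical, and treating an empty string as "unset" is an assumption.

```python
from typing import Iterable, Optional, Set


def resolve_cluster_node_id(cluster_id_flag: Optional[str], master_replid: str) -> str:
    """Pick the node ID: the explicit flag value if set, otherwise master_replid.

    Hypothetical helper; an empty or missing flag is treated as "unset".
    """
    if cluster_id_flag:
        return cluster_id_flag
    return master_replid


def find_duplicate_node_ids(node_ids: Iterable[str]) -> Set[str]:
    """Return the IDs that appear more than once in a cluster config.

    Duplicates are what the commit message calls UB, so a tool that
    generates configs may want to reject them up front.
    """
    seen: Set[str] = set()
    duplicates: Set[str] = set()
    for node_id in node_ids:
        if node_id in seen:
            duplicates.add(node_id)
        seen.add(node_id)
    return duplicates
```

A config generator could call `find_duplicate_node_ids` on all resolved IDs before pushing a cluster config and refuse to proceed if the result is non-empty.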
Also consider the case where the master is restarted during takeover.
Until now, replicas would reconnect to and re-replicate a master after it restarted. This is problematic if the master loses its data, because the replica then flushes everything and loses its data as well. With this change, a replica no longer automatically replicates a different (restarted) master. This is a breaking change, though: whoever controls the replica now has to explicitly issue `REPLICAOF X Y` to re-establish a connection to a new master. This is true even if the master loaded an up-to-date RDB file. It is not necessary if the replica merely lost its connection to a master that stayed alive the whole time and the connection is then re-established. Fixes #2636
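For whoever controls the replica, re-issuing the command could look roughly like the sketch below. It is a minimal illustration using only the Python standard library and the standard RESP array encoding; the helper names `encode_resp_command` and `reissue_replicaof` are hypothetical, and the hosts and ports are placeholders.

```python
import socket


def encode_resp_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = arg.encode()
        parts.append(b"$" + str(len(data)).encode() + b"\r\n" + data + b"\r\n")
    return b"".join(parts)


def reissue_replicaof(replica_host: str, replica_port: int,
                      master_host: str, master_port: int,
                      timeout: float = 5.0) -> bytes:
    """Send REPLICAOF <master_host> <master_port> to the replica.

    Returns the raw reply bytes; interpreting them is left to the caller.
    """
    payload = encode_resp_command("REPLICAOF", master_host, str(master_port))
    with socket.create_connection((replica_host, replica_port), timeout=timeout) as conn:
        conn.sendall(payload)
        return conn.recv(1024)
```

An orchestrator would call `reissue_replicaof` only after confirming that the restarted master holds the data it is expected to hold, since the replica discards its own data when it syncs from the new master.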
I think that Roman's proposed solution should play nicely with a master that restarts after takeover: once the master is restarted, it gets a new repl-id, so even in the edge case where the replica still tries to connect to that master (it shouldn't happen, but still), it will not flush its data. Other replicas of that master will likewise not flush their data; instead, they will need to be explicitly sent commands to replicate the new master.
* feat(replication): Do not auto replicate different master

  Fixes #2636
* fix test
* fixes
* proxy proxy java java
* better comment
* fix comments
* replica_reconnect_on_master_restart
* proxy.close()
@ashotland it's done
Currently we use master_id for two purposes: generating the cluster node id, and as the "master id" during replication.
Not directly related to this issue, but important as additional context: using master_id as the node id for cluster management is cumbersome and confusing.
I suggest we implement master_id protection, so that a replica that has already synced and reached SSR with master id A won't automatically reconnect to a master at the same address that has master id B. Specifically, one would need to issue the "replicaof .." command again to bootstrap replication.
This behaviour change on the replica side should be behind a flag whose default preserves the current behaviour.
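The proposed check could be sketched as below. This is only an illustration of the decision rule, not the actual replica code: the function name `should_auto_resync` is hypothetical, and the parameter `reconnect_on_master_restart` is an assumed stand-in for the suggested flag, defaulting to the current (always reconnect) behaviour.

```python
from typing import Optional


def should_auto_resync(synced_master_id: Optional[str],
                       observed_master_id: str,
                       reconnect_on_master_restart: bool = True) -> bool:
    """Decide whether a replica may automatically (re)sync with a master.

    synced_master_id: master id the replica last fully synced with, or None
        if it has never synced (hypothetical representation).
    observed_master_id: master id reported by the master on (re)connect.
    reconnect_on_master_restart: assumed flag; True keeps the old behaviour.
    """
    if synced_master_id is None:
        # Never synced before: this is the initial sync after "replicaof".
        return True
    if synced_master_id == observed_master_id:
        # Same master, only the connection dropped: safe to resume.
        return True
    # Same address but a different master id: the master was restarted or
    # replaced, so only resync automatically if the flag allows it.
    return reconnect_on_master_restart
```

With the flag off, a mismatch leaves the replica holding its data until someone issues "replicaof .." again.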