
New config & failover behavior: PreventCrossDataCenterMasterFailover #766

Merged
merged 7 commits into master from disable-cross-dc-master-failover
Jan 14, 2019

Conversation

shlomi-noach
Collaborator

Introducing PreventCrossDataCenterMasterFailover (boolean), defaults to false.

Setting it to true forces orchestrator to only fail over masters within the same DC as the failed master.
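
For context, a minimal sketch of how the flag might appear in orchestrator's Go configuration struct; only the field name and its default come from this PR, and the struct shown is an illustrative stand-in:

```go
// Illustrative stand-in for orchestrator's configuration struct; only the
// field name and default below are taken from this PR.
type Configuration struct {
	// PreventCrossDataCenterMasterFailover, when true, restricts master
	// failover to servers in the same data center as the failed master.
	PreventCrossDataCenterMasterFailover bool // defaults to false
}
```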

Some notes

regardless of this new config:

  • orchestrator will try its best to pick a replica from the same DC as the failed master
  • If unsuccessful, it may pick a server in a different DC
  • orchestrator then proceeds to check whether it should perform a 2-step promotion, i.e. whether it can promote yet another server on top of the one already chosen.

Now, when PreventCrossDataCenterMasterFailover: true:

  • It will completely disregard any server not in the failed master's DC
  • It will do whatever it can to replace the chosen server with one that is in the failed master's DC

Finally:

  • whatever happens, if PreventCrossDataCenterMasterFailover: true and the final suggested server is not in the same DC as the failed master, the failover is aborted with an error (see the sketch after these notes).
    • RESET SLAVE ALL and SET @@global.read_only will not be executed
    • PostMasterFailoverProcesses will not be executed
    • PostUnsuccessfulFailoverProcesses will be executed

Also:

  • This config does not affect intermediate master or master-master failovers.
  • When PreventCrossDataCenterMasterFailover: true, the raft DC distribution becomes mostly irrelevant. To elaborate:
    • It doesn't matter where the orchestrator/raft nodes are running; there will never be a cross-DC master failover.
    • Say all masters are in dc1 and dc1 gets network isolated:
      • If orchestrator/raft has a quorum in dc1 then it is happy because it can see the masters and there is no failover.
      • If the orchestrator/raft quorum is outside dc1, then the leader (running from some dc2) will attempt a failover. It will run pre-failover hooks. But it will very quickly realize it cannot find a server to promote, because all of the servers in dc1 are inaccessible to it, and all other servers are disqualified.
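
To tie these notes together, here is a minimal Go sketch of the decision flow when the flag is enabled. All names here (Instance, pickMostAdvancedReplica, takeOverMaster) are illustrative stand-ins, not orchestrator's actual API:

```go
package main

import "fmt"

// Instance is an illustrative stand-in for an orchestrator-managed server.
type Instance struct {
	Name       string
	DataCenter string
}

// chooseNewMaster sketches the flow described above: pick the most advanced
// replica regardless of DC, then, if PreventCrossDataCenterMasterFailover is
// set and that replica is in the wrong DC, attempt a 2-step promotion of a
// same-DC replica; if none can be promoted, abort the failover.
func chooseNewMaster(failedMasterDC string, replicas []Instance, preventCrossDC bool) (Instance, error) {
	promoted := pickMostAdvancedReplica(replicas) // may be in any DC
	if !preventCrossDC || promoted.DataCenter == failedMasterDC {
		return promoted, nil
	}
	// 2-step promotion: try to put a same-DC replica on top of the one chosen.
	for _, r := range replicas {
		if r.DataCenter == failedMasterDC && takeOverMaster(r, promoted) {
			return r, nil
		}
	}
	// Abort path: no RESET SLAVE ALL, no SET @@global.read_only, no
	// PostMasterFailoverProcesses; PostUnsuccessfulFailoverProcesses run instead.
	return Instance{}, fmt.Errorf("no promotable server in %s; failover aborted", failedMasterDC)
}

// Placeholder helpers so the sketch compiles.
func pickMostAdvancedReplica(replicas []Instance) Instance { return replicas[0] }
func takeOverMaster(newMaster, interim Instance) bool      { return true }

func main() {
	replicas := []Instance{{"srvX", "dc2"}, {"srvB", "dc1"}, {"srvC", "dc1"}}
	fmt.Println(chooseNewMaster("dc1", replicas, true))
}
```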

cc @github/database-infrastructure @matt-ullmer @jeremycole, @sroysen

@shlomi-noach
Collaborator Author

shlomi-noach commented Dec 27, 2018

TODO:

  • documentation

@shlomi-noach
Collaborator Author

As an example of a complex scenario:

srvA.dc1
+ srvB.dc1
+ srvC.dc1
+ srvX.dc2
  + srvY.dc2
  + srvZ.dc2

Assume master srvA.dc1 fails, and "PreventCrossDataCenterMasterFailover": true.

If most up-to-date replica is srvB.dc1 or srvC.dc1 then everything is simple and there's no problem picking the replacement master.

If most up-to-date replica is srvX.dc2, then orchestrator will:

  • step 1:
srvX.dc2
+ srvB.dc1
+ srvC.dc1
+ srvY.dc2
+ srvZ.dc2
  • step 2:
    Realize the promoted server is invalid. Try a 2-step promotion of, say, srvB.dc1 (both steps are sketched after this example)

    • if successful:
srvB.dc1
+ srvX.dc2
  + srvC.dc1
  + srvY.dc2
  + srvZ.dc2
  • if unsuccessful, fail the operation.
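
A toy Go sketch of this walkthrough, modeling the topology as a map of "replicates from" pointers; the hostnames follow the example, everything else is illustrative:

```go
package main

import "fmt"

func main() {
	// "replicates from" pointers for the example topology, before the failure.
	parents := map[string]string{
		"srvB.dc1": "srvA.dc1", "srvC.dc1": "srvA.dc1", "srvX.dc2": "srvA.dc1",
		"srvY.dc2": "srvX.dc2", "srvZ.dc2": "srvX.dc2",
	}
	// Step 1: srvA.dc1 fails and srvX.dc2 is the most up-to-date replica,
	// so it is promoted as the interim master and the dc1 replicas are
	// repointed under it.
	for server, master := range parents {
		if master == "srvA.dc1" && server != "srvX.dc2" {
			parents[server] = "srvX.dc2"
		}
	}
	delete(parents, "srvX.dc2") // interim master replicates from no one
	// Step 2: srvX.dc2 is in the wrong DC, so srvB.dc1 takes over on top of
	// it: srvB.dc1 becomes the master and srvX.dc2 (with everything under it)
	// replicates from srvB.dc1, matching the final diagram above.
	delete(parents, "srvB.dc1")
	parents["srvX.dc2"] = "srvB.dc1"
	fmt.Println(parents)
}
```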

@sroysen

sroysen commented Jan 2, 2019

ping @jordanwheeler, @akshaysuryawanshi

@akshaysuryawanshi

  • if unsuccessful, fail the operation.

@shlomi-noach so does this fail only step 2, or does it undo step 1 as well? Because now we effectively have a master in another DC, although it is not accepting writes, since it fails to find the right master and setting read_only off also fails?

@shlomi-noach
Collaborator Author

@akshaysuryawanshi

or does it undo step 1 as well

It does not undo step 1. Can you please explain your scenario again? I'm not sure whether what you're describing is a failover experiment with this branch, or whether your scenario is unrelated.

@akshaysuryawanshi

@shlomi-noach I was trying to understand your example scenario. If I understand correctly, the config option will make Orchestrator choose a better replica (one in the same DC) in the second step of the failover process. So if it isn't able to find a valid replica, supposedly due to replication lag on them, Orchestrator will leave the topology with a remote master host, but not make it writable. Is that a correct understanding of this PR?

Also, does the 2-step promotion retry based on some timeout or number of retries, or does it check only once after step 1 is completed?

@shlomi-noach
Collaborator Author

If I understand correctly, the config option will make Orchestrator choose a better replica (one in the same DC) in the second step of the failover process. So if it isn't able to find a valid replica, supposedly due to replication lag on them, Orchestrator will leave the topology with a remote master host, but not make it writable. Is that a correct understanding of this PR?

correct

Also, does the 2-step promotion retry based on some timeout or number of retries, or does it check only once after step 1 is completed?

Only once.

I should clarify the example I presented is the most complex case. "If most up-to-date replica is srvB.dc1 or srvC.dc1 then everything is simple and there's no problem picking the replacement master." should be the more common case.

@akshaysuryawanshi

I should clarify the example I presented is the most complex case.

Makes sense, we are testing a similar kind of flag, which is checked in IsBannedFromBeingCandidateReplica. If the candidateReplica's DC is not the same as its master's, then we return false for that replica and abort the failover. It is then up to the user to take the correct action based on the state of the cluster, one of which is what this PR does in step 1 (a sketch of such a check follows below).

The scenario is exactly as you mentioned, a pretty complex one.
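
A minimal Go sketch of the kind of check described above; the method name comes from the comment, but the signature and fields here are illustrative assumptions, not orchestrator's actual code:

```go
// Illustrative stand-in type; only the idea of the DC check comes from the
// comment above.
type Instance struct {
	DataCenter string
}

// A candidate whose DC differs from its master's DC is treated as banned
// (disqualified), so the failover is aborted rather than promoting a server
// in another data center.
func isBannedFromBeingCandidateReplica(candidate, master Instance) bool {
	return candidate.DataCenter != master.DataCenter
}
```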

@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 6, 2019 06:38 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 8, 2019 06:14 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 8, 2019 13:56 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 9, 2019 07:07 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 14, 2019 06:36 Inactive
@shlomi-noach shlomi-noach temporarily deployed to production/mysql_cluster=conductor January 14, 2019 06:55 Inactive
@shlomi-noach
Collaborator Author

woot! Tests well in production

@shlomi-noach shlomi-noach merged commit 7d98b24 into master Jan 14, 2019
@shlomi-noach shlomi-noach deleted the disable-cross-dc-master-failover branch January 14, 2019 07:04