hook for graceful master switch #428

igroene · 2018-03-05T15:54:12Z

I have been running some graceful master takeover testing using ProxySQL and Orchestrator together, and I believe it would be a good idea to have a hook that is triggered even earlier than PreFailoverProcesses.
The issue with PreFailoverProcesses is that it is triggered after the demoted master has already been placed by Orchestrator in read_only mode, as shown by this extract from the log:

Mar 03 14:25:10 mysql3 orchestrator[25032]: [martini] Started GET /api/graceful-master-takeover/mysql1/3306 for 192.168.56.1
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Will demote mysql1:3306 and promote mysql2:3306 instead
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Stopped slave on mysql2:3306, Self:mysql-bin.000009:3034573, Exec:mysql-bin.000010:18546301
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Will set mysql1:3306 as read_only
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO instance mysql1:3306 read_only: true
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO auditType:read-only instance:mysql1:3306 cluster:mysql1:3306 message:set as true
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Will advance mysql2:3306 to master coordinates mysql-bin.000010:18546301
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Will start slave on mysql2:3306 until coordinates: mysql-bin.000010:18546301
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Stopped slave on mysql2:3306, Self:mysql-bin.000009:3034573, Exec:mysql-bin.000010:18546301
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster detection on mysql1:3306; isActionable?: true; skipProcesses: false
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: detected DeadMaster failure on mysql1:3306
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: Running 1 OnFailureDetectionProcesses hooks
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2055: write-recovery-step
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: Running OnFailureDetectionProcesses hook 1 of 1: echo 'Detected DeadMaster on mysql1:3306. Affected replicas: 1' >> /tmp/recovery.log
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2056: write-recovery-step
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO CommandRun(echo 'Detected DeadMaster on mysql1:3306. Affected replicas: 1' >> /tmp/recovery.log,[])
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO CommandRun/running: bash /tmp/orchestrator-process-cmd-358000144
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO CommandRun successful. exit status 0
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: Completed OnFailureDetectionProcesses hook 1 of 1 in 4.556463ms
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2057: write-recovery-step
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO Completed OnFailureDetectionProcesses hook 1 of 1 in 4.556463ms
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: done running OnFailureDetectionProcesses hooks
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2058: write-recovery-step
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2059: register-failure-detection
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO executeCheckAndRecoverFunction: proceeding with DeadMaster recovery on mysql1:3306; isRecoverable?: true; skipProcesses: false
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2060: write-recovery
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: will handle DeadMaster event on mysql1:3306
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 DEBUG orchestrator/raft: applying command 2061: write-recovery-step
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO auditType:recover-dead-master instance:mysql1:3306 cluster:mysql1:3306 message:problem found; will recover
Mar 03 14:25:10 mysql3 orchestrator[25032]: 2018-03-03 14:25:10 INFO topology_recovery: Running 1 PreFailoverProcesses hooks

For the ProxySQL use case, this returns errors to the application as soon as the host is set in read_only mode.
I would like to use the proposed hook to have ProxySQL set the old master to offline_soft and give active connections a chance to finish their work, minimizing the errors returned to the application.

shlomi-noach · 2018-03-06T06:19:02Z

This makes sense. Assigning to myself.

igroene · 2018-03-16T17:47:23Z

Hi @shlomi-noach just wondering if you have any plans to implement this in the near future? I understand if there are other priorities :)
Thank you!

shlomi-noach · 2018-03-17T15:52:53Z

@igroene I haven't prioritized this yet. Let me look into it.

shlomi-noach · 2018-03-17T16:06:53Z

@igroene this came up, for which a PR is ready and will ~~shortly be merged~~. EDIT: merged.

It takes a different approach, but I think solves your case, too. You'll get a command value injected in your hooks. The value would be graceful-master-takeover upon a graceful takeover action, or what have you if otherwise.

This way we can avoid specialized hooks. You will read the value of the command variable and make your own choices.

What do you think?
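
As a rough illustration of the idea, a hook could branch on that value. This is only a sketch: it assumes the value reaches the hook as the ORC_COMMAND environment variable (please verify against your orchestrator version), and the function name handle_command is hypothetical.

```shell
#!/bin/bash
# Sketch of a hook that branches on the command that triggered the recovery.
# Assumption: orchestrator exports the command as ORC_COMMAND.
# handle_command is a hypothetical helper name, not part of orchestrator.

handle_command() {
  local command="$1"
  case "$command" in
    graceful-master-takeover)
      # Planned switch: a good place to drain traffic first.
      echo "planned takeover: drain connections before promotion"
      ;;
    "")
      # No command given: treat as an unplanned failover.
      echo "unplanned failover: no command was given"
      ;;
    *)
      echo "other command: $command"
      ;;
  esac
}

handle_command "${ORC_COMMAND:-}"
```

The echo lines are placeholders; a real hook would call whatever tooling manages the proxy layer.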

igroene · 2018-03-19T11:20:17Z

Hi @shlomi-noach unfortunately I think this does not solve the case I presented above.
The issue I encountered is not the lack of info about the master change operation, but the order in which the operations are performed:

issue graceful master switch
orchestrator sets old master as read only
PreFailoverProcesses hook is triggered
...

I would suggest either changing the order as follows:

issue graceful master switch
PreFailoverProcesses hook is triggered
orchestrator sets old master as read only
...

or (probably better):

issue graceful master switch
New hook PreGracefulSwitchProcesses is triggered
orchestrator sets old master as read only
PreFailoverProcesses hook is triggered
...

Thank you

shlomi-noach · 2018-03-19T11:33:39Z

Ah, I see your point. Let me look into both options.

Slach · 2018-03-26T16:04:48Z

@shlomi-noach yes, PreGracefulSwitchProcesses would be a useful feature
for switching ProxySQL to the correct active writer

shlomi-noach · 2018-03-27T12:44:49Z

I hope to propose a PR this week.

shlomi-noach · 2018-03-28T12:54:24Z

@igroene would you like to experiment with #443 ?
It is not final, and has multiple improvements for graceful-takeover, including removing the "single replica" constraints.

You will find PreGracefulTakeoverProcesses in config.

igroene · 2018-03-28T14:06:37Z

@shlomi-noach the hook worked perfectly! thank you for that. I am able to run sysbench and do a graceful switch without any errors :)

[ 2s ] thds: 4 tps: 57.86 qps: 1153.21 (r/w/o: 809.04/222.46/121.71) lat (ms,95%): 102.97 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 69.17 qps: 1389.32 (r/w/o: 973.33/272.65/143.34) lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 46.97 qps: 957.41 (r/w/o: 670.59/191.88/94.94) lat (ms,95%): 282.25 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 30.94 qps: 613.71 (r/w/o: 428.10/122.74/62.87) lat (ms,95%): 223.34 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 51.12 qps: 995.30 (r/w/o: 693.60/199.46/102.24) lat (ms,95%): 125.52 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 45.01 qps: 916.18 (r/w/o: 643.13/182.04/91.02) lat (ms,95%): 132.49 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 47.94 qps: 948.79 (r/w/o: 664.15/187.76/96.88) lat (ms,95%): 193.38 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 56.08 qps: 1127.53 (r/w/o: 790.07/224.30/113.15) lat (ms,95%): 123.28 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 49.85 qps: 986.04 (r/w/o: 686.94/197.41/101.70) lat (ms,95%): 155.80 err/s: 0.00 reconn/s: 0.00
[ 11s ] thds: 4 tps: 54.05 qps: 1113.12 (r/w/o: 783.79/219.22/110.11) lat (ms,95%): 139.85 err/s: 0.00 reconn/s: 0.00
[ 12s ] thds: 4 tps: 56.14 qps: 1128.74 (r/w/o: 793.93/220.54/114.28) lat (ms,95%): 118.92 err/s: 0.00 reconn/s: 0.00
[ 13s ] thds: 4 tps: 50.88 qps: 996.59 (r/w/o: 691.33/201.51/103.75) lat (ms,95%): 155.80 err/s: 0.00 reconn/s: 0.00
[ 14s ] thds: 4 tps: 30.05 qps: 601.95 (r/w/o: 425.67/116.18/60.09) lat (ms,95%): 292.60 err/s: 0.00 reconn/s: 0.00
[ 15s ] thds: 4 tps: 61.83 qps: 1236.68 (r/w/o: 861.69/251.33/123.67) lat (ms,95%): 123.28 err/s: 0.00 reconn/s: 0.00
[ 16s ] thds: 4 tps: 46.10 qps: 930.02 (r/w/o: 653.42/183.40/93.20) lat (ms,95%): 134.90 err/s: 0.00 reconn/s: 0.00
[ 17s ] thds: 4 tps: 42.01 qps: 824.21 (r/w/o: 576.15/162.04/86.02) lat (ms,95%): 189.93 err/s: 0.00 reconn/s: 0.00
[ 18s ] thds: 4 tps: 7.98 qps: 180.59 (r/w/o: 132.70/31.93/15.96) lat (ms,95%): 139.85 err/s: 0.00 reconn/s: 0.00
[ 19s ] thds: 4 tps: 62.20 qps: 1238.90 (r/w/o: 864.72/246.78/127.40) lat (ms,95%): 846.57 err/s: 0.00 reconn/s: 0.00
[ 20s ] thds: 4 tps: 97.01 qps: 1925.12 (r/w/o: 1341.09/387.02/197.01) lat (ms,95%): 53.85 err/s: 0.00 reconn/s: 0.00
[ 21s ] thds: 4 tps: 104.90 qps: 2110.96 (r/w/o: 1479.57/414.60/216.79) lat (ms,95%): 55.82 err/s: 0.00 reconn/s: 0.00

I also took the liberty of testing the other part of the PR:

graceful-master-takeover takes a more permissive approach and now allows failing over when the master has multiple replicas, given that:
The user specifies the particular designated replica they want to fail over to
orchestrator is able to relocate all other replicas below the designated replica.

That is not working for me via GUI by drag-dropping one of the 2 existing slaves to the left of the master in a simple 3 node topology (1 master, 2 direct slaves). I get this message:

GracefulMasterTakeover: when no target instance indicated, master mysql1:3306 should only have one replica (making the takeover safe and simple), but has 2. Aborting

Slach · 2018-03-28T14:08:25Z

@igroene could you share your orchestrator.conf.json with the graceful ProxySQL update commands?

igroene · 2018-03-28T14:20:12Z

Here is orchestrator.conf.json I am using for testing:

{
  "Debug": true,
  "EnableSyslog": false,
  "ListenAddress": ":3000",
  "MySQLTopologyUser": "orchestrator",
  "MySQLTopologyPassword": "****",
  "MySQLTopologyCredentialsConfigFile": "",
  "MySQLTopologySSLPrivateKeyFile": "",
  "MySQLTopologySSLCertFile": "",
  "MySQLTopologySSLCAFile": "",
  "MySQLTopologySSLSkipVerify": true,
  "MySQLTopologyUseMutualTLS": false,
  "MySQLOrchestratorHost": "127.0.0.1",
  "MySQLOrchestratorPort": 3306,
  "MySQLOrchestratorDatabase": "orchestrator",
  "MySQLOrchestratorUser": "orc_server_user",
  "MySQLOrchestratorPassword": "****",
  "MySQLOrchestratorCredentialsConfigFile": "",
  "MySQLOrchestratorSSLPrivateKeyFile": "",
  "MySQLOrchestratorSSLCertFile": "",
  "MySQLOrchestratorSSLCAFile": "",
  "MySQLOrchestratorSSLSkipVerify": true,
  "MySQLOrchestratorUseMutualTLS": false,
  "MySQLConnectTimeoutSeconds": 1,
  "DefaultInstancePort": 3306,
  "DiscoverByShowSlaveHosts": true,
  "InstancePollSeconds": 5,
  "UnseenInstanceForgetHours": 240,
  "SnapshotTopologiesIntervalHours": 0,
  "InstanceBulkOperationsWaitTimeoutSeconds": 10,
  "HostnameResolveMethod": "default",
  "MySQLHostnameResolveMethod": "@@hostname",
  "SkipBinlogServerUnresolveCheck": true,
  "ExpiryHostnameResolvesMinutes": 60,
  "RejectHostnameResolvePattern": "",
  "ReasonableReplicationLagSeconds": 10,
  "ProblemIgnoreHostnameFilters": [],
  "VerifyReplicationFilters": false,
  "ReasonableMaintenanceReplicationLagSeconds": 20,
  "CandidateInstanceExpireMinutes": 60,
  "AuditLogFile": "",
  "AuditToSyslog": false,
  "RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
  "ReadOnly": false,
  "AuthenticationMethod": "",
  "HTTPAuthUser": "",
  "HTTPAuthPassword": "",
  "AuthUserHeader": "",
  "PowerAuthUsers": [
    "*"
  ],
  "ClusterNameToAlias": {
    "127.0.0.1": "test suite"
  },
  "SlaveLagQuery": "",
  "DetectClusterAliasQuery": "SELECT ifnull(max(cluster_name), '') as cluster_alias from meta.cluster where anchor=1;",
  "DetectClusterDomainQuery": "",
  "DetectInstanceAliasQuery": "",
  "DetectPromotionRuleQuery": "",
  "DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
  "PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
  "PromotionIgnoreHostnameFilters": [],
  "DetectSemiSyncEnforcedQuery": "",
  "ServeAgentsHttp": false,
  "AgentsServerPort": ":3001",
  "AgentsUseSSL": false,
  "AgentsUseMutualTLS": false,
  "AgentSSLSkipVerify": false,
  "AgentSSLPrivateKeyFile": "",
  "AgentSSLCertFile": "",
  "AgentSSLCAFile": "",
  "AgentSSLValidOUs": [],
  "UseSSL": false,
  "UseMutualTLS": false,
  "SSLSkipVerify": false,
  "SSLPrivateKeyFile": "",
  "SSLCertFile": "",
  "SSLCAFile": "",
  "SSLValidOUs": [],
  "URLPrefix": "",
  "StatusEndpoint": "/api/status",
  "StatusSimpleHealth": true,
  "StatusOUVerify": false,
  "AgentPollMinutes": 60,
  "UnseenAgentForgetHours": 6,
  "StaleSeedFailMinutes": 60,
  "SeedAcceptableBytesDiff": 8192,
  "PseudoGTIDPattern": "",
  "PseudoGTIDPatternIsFixedSubstring": false,
  "PseudoGTIDMonotonicHint": "asc:",
  "DetectPseudoGTIDQuery": "",
  "BinlogEventsChunkSize": 10000,
  "SkipBinlogEventsContaining": [],
  "ReduceReplicationAnalysisCount": true,
  "FailureDetectionPeriodBlockMinutes": 60,
  "RecoveryPollSeconds": 10,
  "RecoveryPeriodBlockSeconds": 3600,
  "RecoveryIgnoreHostnameFilters": [],
  "RecoverMasterClusterFilters": [
    ".*"
  ],
  "RecoverIntermediateMasterClusterFilters": [
    "_intermediate_master_pattern_"
  ],
  "OnFailureDetectionProcesses": [
    "echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
  ],
  "PreGracefulTakeoverProcesses": [
    "/root/prefailover.sh"
  ],
  "PreFailoverProcesses": [
    ""
  ],
  "PostFailoverProcesses": [
    "echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostUnsuccessfulFailoverProcesses": [],
  "PostMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "PostIntermediateMasterFailoverProcesses": [
    "echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
  ],
  "CoMasterRecoveryMustPromoteOtherCoMaster": true,
  "DetachLostSlavesAfterMasterFailover": true,
  "ApplyMySQLPromotionAfterMasterFailover": true,
  "MasterFailoverDetachSlaveMasterHost": false,
  "MasterFailoverLostInstancesDowntimeMinutes": 0,
  "PostponeSlaveRecoveryOnLagMinutes": 0,
  "OSCIgnoreHostnameFilters": [],
  "GraphiteAddr": "",
  "GraphitePath": "",
  "GraphiteConvertHostnameDotsToUnderscores": true,
  "BackendDB": "sqlite",
  "SQLite3DataFile": "/var/lib/orchestrator/orchestrator.db",
  "RaftEnabled": false,
  "RaftDatadir": "/var/lib/orchestrator",
  "RaftBind": "192.168.56.100",
  "DefaultRaftPort": 10008,
  "RaftNodes": [
    "192.168.56.100",
    "192.168.56.101",
    "192.168.56.102"
  ]
}

and here is the graceful switch hook:

#!/bin/bash
# PreGracefulTakeoverProcesses hook: drain the old master in ProxySQL
# before orchestrator sets it read_only.

OldMaster=$ORC_FAILED_HOST

# Run the mysql client against the ProxySQL admin interface (test credentials).
proxysql_admin() {
  mysql -uivan -pivan -h mysql3 -P6032 "$@"
}

# Stop routing new connections to the old master; existing ones may finish.
(
echo 'UPDATE mysql_servers SET STATUS="OFFLINE_SOFT" WHERE hostname="'"$OldMaster"'";'
echo "LOAD MYSQL SERVERS TO RUNTIME;"
) | proxysql_admin -vvv

# Print the number of connections still in use on the old master.
conn_used() {
  proxysql_admin -B -N -e 'SELECT IFNULL(SUM(ConnUsed),0) FROM stats_mysql_connection_pool WHERE status="OFFLINE_SOFT" AND srv_host="'"$OldMaster"'"' 2> /dev/null
}

# Poll up to 20 times, 50 ms apart, until no connection is in use.
# An empty result (failed query) is treated as 0 so the test never errors.
TRIES=0
CONNUSED=$(conn_used)
while [ "${CONNUSED:-0}" -ne 0 ] && [ "$TRIES" -lt 20 ]
do
  sleep 0.05
  CONNUSED=$(conn_used)
  TRIES=$((TRIES+1))
done

shlomi-noach · 2018-03-28T14:20:32Z

@igroene

> the hook worked perfectly!

Great!

> I also took the liberty of testing the other part of the PR:

Thank you for testing!

I think you might need a refresh to reload your JavaScript which is likely cached. But I'll double check.

igroene · 2018-03-28T15:27:28Z

I tried the refresh but still getting the same error

Slach · 2018-03-28T15:46:47Z

@igroene how do you switch to the new master?
Why does your config have an empty PostFailoverProcesses parameter?

igroene · 2018-03-28T15:57:46Z

I am switching by drag-dropping via GUI. PostFailoverProcesses is empty because I don't need a hook for that for this test. ProxySQL will detect the change in read-only that Orchestrator does and move hosts around hostgroups as needed.

Slach · 2018-03-28T16:03:13Z

@igroene

> ProxySQL will detect the change in read-only that Orchestrator does and move hosts around hostgroups as needed.

Is that done via the ProxySQL scheduler or via some other functionality?

shlomi-noach · 2018-03-28T16:27:27Z

Very good. The rest of the features are still work in progress. It will take a while to merge the branch.

igroene · 2018-03-28T16:50:35Z

Thanks @shlomi-noach !
@Slach this is leveraging the mysql_replication_hostgroups table with writer/reader hostgroups. ProxySQL switches hosts around based on the read_only value.

shlomi-noach · 2018-03-28T17:58:48Z

> ProxySQL switches hosts around based on the read_only value.

@igroene I'd like to suggest this isn't good practice. See my comment on https://mydbops.wordpress.com/2018/03/15/proxysql-series-seamless-replication-switchover-using-mha/, but I will write a more elaborate blog post.

igroene · 2018-03-28T18:57:18Z

Thanks for the warning, I agree with your comment. Just to clarify this is just a testing playground, I wouldn't use this for a prod deployment.

shlomi-noach · 2018-03-29T08:22:47Z

@igroene I still think this is a JavaScript issue. Can you please check the following? View the source of cluster.js in your browser, and look for /api/graceful-master-takeover/.

Does the entire line read:

        apiCommand("/api/graceful-master-takeover/" + existingMasterNode.Key.Hostname + "/" + existingMasterNode.Key.Port + "/" + newMasterNode.Key.Hostname + "/" + newMasterNode.Key.Port);

or

        apiCommand("/api/graceful-master-takeover/" + existingMasterNode.Key.Hostname + "/" + existingMasterNode.Key.Port);

?

igroene · 2018-03-29T12:23:24Z

You are right, the version being displayed by the browser reads as the first example, although I can't find the reason for that. I tried deleting the browser cache and 2 other browsers and still get the same. I double-checked that the build I compiled indeed has the correct version, so I am clueless at this point. Will keep investigating.

shlomi-noach · 2018-03-29T12:30:54Z

The first example is the desired one, actually 😛

igroene · 2018-03-29T12:37:42Z

hahaha you are right. I figured out my issue... I had compiled the branch with the changes but only replaced the binary and not the resources dir on /usr/local. Sorry about that.
Getting a different error now:

Any ideas?
EDIT: here is the full error msg

GracefulMasterTakeover: desginated instance mysql2:3306 cannot take over all of its siblings. Error: 2018-03-29 12:36:16 ERROR Relocating 1 replicas of mysql3:3306 below mysql2:3306 turns to be too complex; please do it manually

shlomi-noach · 2018-03-29T13:39:36Z

No ideas yet. Seems like you're running GTID or pseudo GTID and that it should work.

shlomi-noach · 2018-03-29T13:41:08Z

What happens if you relocate "mysql1" under "mysql2"?

igroene · 2018-04-04T11:37:24Z

Sorry about the delay, I was out for a couple of days. I am indeed using GTID, and if I move mysql1 under mysql2, then try to promote mysql2 it works as expected.
Nothing useful in the logs unfortunately for the case that fails:

Apr 04 11:30:35 mysql1 orchestrator[4453]: [martini] Started GET /api/graceful-master-takeover/mysql1/3306/mysql2/3306 for 192.168.56.1:50935
Apr 04 11:30:35 mysql1 orchestrator[4453]: 2018-04-04 11:30:35 INFO GracefulMasterTakeover: Will let mysql2:3306 take over its siblings
Apr 04 11:30:35 mysql1 orchestrator[4453]: 2018-04-04 11:30:35 INFO Will move 1 replicas below mysql2:3306 via GTID
Apr 04 11:30:35 mysql1 orchestrator[4453]: 2018-04-04 11:30:35 ERROR Relocating 1 replicas of mysql1:3306 below mysql2:3306 turns to be too complex; please do it manually
Apr 04 11:30:35 mysql1 orchestrator[4453]: [martini] Completed 500 Internal Server Error in 22.606937ms

I did a few more tests and I seem to "randomly" get the "too complex" message with one of the slaves. If I then try to promote the other slave it works.

shlomi-noach · 2018-04-08T11:08:29Z

Can you please verify that Auto_position is 1 in show slave status at all times? I suspect perhaps it may be 0, in which case orchestrator cannot actually utilize GTID for failover.
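
A quick way to check this from the shell is sketched below. It assumes the vertical (`\G`) output format of the mysql client; the helper name auto_position is hypothetical, and host and credentials are whatever your setup uses.

```shell
#!/bin/bash
# Sketch: read `SHOW SLAVE STATUS\G` output on stdin and print the
# Auto_Position value. Exits non-zero if the field is not present
# (for example, when the server is not a replica).
# auto_position is a hypothetical helper name.
auto_position() {
  awk -F': *' '
    $1 ~ /^[[:space:]]*Auto_Position$/ { print $2; found = 1 }
    END { if (!found) exit 1 }
  '
}
```

For example, `mysql -h mysql2 -e 'SHOW SLAVE STATUS\G' | auto_position` should print 1 on a replica that uses GTID auto-positioning. Running it a few times around a takeover attempt would show whether the value ever drops to 0.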

shlomi-noach self-assigned this Mar 6, 2018

shlomi-noach mentioned this issue Apr 15, 2018

Graceful takeover pre hooks #469

Merged

1 task

shlomi-noach closed this as completed in #469 Apr 15, 2018
