This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Orchestrator promotes a replica with lag on Mariadb #1363

Closed
mohankara opened this issue Jun 2, 2021 · 46 comments · Fixed by #1374

Comments

@mohankara

mohankara commented Jun 2, 2021

We have a 3-node MariaDB 10.5.10 setup on CentOS: 1 primary and 2 replicas with semi-sync enabled.
Our current orchestrator version is 3.2.4.

We had a scenario where the replicas were lagging by a few hours and the master was not reachable, so one of the replicas was promoted to primary in spite of the huge lag. This resulted in data loss. Ideally, orchestrator should wait for the replica's relay logs to be applied on the replica and only then promote it to master. Based on my testing this is the behavior on MySQL, but not on MariaDB.

Test case:
Tests against MySQL and MariaDB were run with these orchestrator parameters in /etc/orchestrator.conf.json:

"DelayMasterPromotionIfSQLThreadNotUpToDate": true,
"debug": true

Restart orchestrator on all 3 nodes.

I) Test on MariaDB:
Start a 3-node MariaDB cluster (semi-sync enabled).

1. Create and add data to a test table:
create table test (colA int, colB int, colC datetime, colD int);

insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
insert into test values (rand()*100,rand()*1000,now(),rand()*10000);

2. Stop slave SQL_THREAD on the replicas (Node 2, 3).

3. Wait a few seconds and add some more data on Node 1 (master):

    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);
    insert into test values (rand()*100,rand()*1000,now(),rand()*10000);

4. Stop mysqld on the master (Node 1).

5. You will see orchestrator promote a replica without the data added in step 3.

II) Test on MySQL 5.7.32:
Repeat the same test on a 3-node MySQL cluster.
You will notice orchestrator promotes one of the replicas without any data loss, i.e. all 14 rows are present.

Thank You
Mohan

@shlomi-noach
Collaborator

This PR goes at least halfway there: #1366

Question: in your actual production incident, was the SQL thread running at the time of failover, to the best of your knowledge?

@mohankara
Author

In one case it was running but lagging behind; in another it was down due to an error. In both cases the replica was wrongly promoted.

@mohankara
Author

Any ETA on when the fix will be available? We have a few sites that experience this issue.

@shlomi-noach
Collaborator

I hope to reproduce this week.

@mohankara
Author

> I hope to reproduce this week.

Thanks Shlomi. I am looking forward to it.

@shlomi-noach
Collaborator

I followed the reproduction steps, but this does not reproduce for me: both replicas apply the relay log and then get promoted. I'm using current master branch.
This was very manual. I'll need to set up something more automated so I can test again and again and again. I need to think about how to do that without exhausting my resources.

@mohankara
Author

> I followed the reproduction steps, but this does not reproduce for me: both replicas apply the relay log and then get promoted. I'm using current master branch.

That's what we want, correct? Is this with your new fixes in place, or with 3.2.4?

> This was very manual. I'll need to set up something more automated so I can test again and again and again. I need to think about how to do that without exhausting my resources.

This issue happens only with MariaDB; MySQL works fine. I have some cycles for testing if I can be of any assistance to you.

@shlomi-noach
Collaborator

> This issue happens only with MariaDB; MySQL works fine. I have some cycles for testing if I can be of any assistance to you.

Yes, of course. I used MariaDB 10.5.10, same as reported.

@shlomi-noach
Collaborator

I'll continue early next week. This week is increasingly busy.

@shiweng

shiweng commented Jun 21, 2021

> This issue happens only with MariaDB; MySQL works fine. I have some cycles for testing if I can be of any assistance to you.

> Yes, of course. I used MariaDB 10.5.10, same as reported.

Hi, I may have a similar issue.
Orchestrator version: 3.0.14
MySQL version: mysql-community-8.0.19
A replication cluster with 1 master and 2 slaves, using pt-heartbeat as a lag monitor, and with the following parameters set:
DelayMasterPromotionIfSQLThreadNotUpToDate: true
FailMasterPromotionIfSQLThreadNotUpToDate: false
FailMasterPromotionOnLagMinutes: 0

When both slaves were lagging behind the master by more than 60 seconds, I issued "shutdown;" on the master, hoping orchestrator would wait for one of the slaves' SQL threads to catch up with the master and only then begin a promotion. However, the promotion began immediately when the master went away, so the promoted slave lost some transactions. This is not what we expected.

For now, I have had to set these parameters in my production environment:
DelayMasterPromotionIfSQLThreadNotUpToDate: false
FailMasterPromotionIfSQLThreadNotUpToDate: true
FailMasterPromotionOnLagMinutes: 0

With these settings, the promotion fails when all slaves are lagging behind the master.
Could you look into this? Thanks.
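
To make the intended semantics concrete, here is a minimal sketch of how these three settings could gate a promotion. This is not orchestrator's actual code; the types, function names, polling interval, and timeout handling are hypothetical:

package failover

import (
	"errors"
	"time"
)

// ReplicaStatus is a hypothetical snapshot of the promotion candidate.
type ReplicaStatus struct {
	SQLThreadUpToDate bool          // Exec_Master_Log_Pos has caught up with Read_Master_Log_Pos
	Lag               time.Duration // e.g. as measured via pt-heartbeat
}

// Config mirrors the three settings discussed above.
type Config struct {
	DelayMasterPromotionIfSQLThreadNotUpToDate bool
	FailMasterPromotionIfSQLThreadNotUpToDate  bool
	FailMasterPromotionOnLagMinutes            int
}

// gatePromotion decides whether the candidate may be promoted right now,
// should be waited on, or should fail the promotion outright.
func gatePromotion(cfg Config, poll func() ReplicaStatus, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	for {
		s := poll()
		if cfg.FailMasterPromotionOnLagMinutes > 0 &&
			s.Lag >= time.Duration(cfg.FailMasterPromotionOnLagMinutes)*time.Minute {
			return errors.New("candidate lag exceeds FailMasterPromotionOnLagMinutes; aborting promotion")
		}
		if s.SQLThreadUpToDate {
			return nil // safe to promote
		}
		if cfg.FailMasterPromotionIfSQLThreadNotUpToDate {
			return errors.New("SQL thread not up to date; aborting promotion")
		}
		if !cfg.DelayMasterPromotionIfSQLThreadNotUpToDate {
			return nil // no gating requested; promote as-is, at the risk of losing data
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for the SQL thread to catch up")
		}
		time.Sleep(time.Second)
	}
}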

By the way, I use sysbench to generate load on the master, and I run "set global binlog_group_commit_sync_delay=20000;" on the slaves to simulate lag.

@mohankara
Author

I couldn't simulate this issue (replica getting promoted as master while there is lag) on MySQL 5.7. But I was using orchestrator 3.2.4. I know there are fixes related to this issue in 3.2.4. Try that version or 3.2.5 and see. Hope that works for you.

@shlomi-noach
Collaborator

Gonna clear some time tomorrow for testing.

@shlomi-noach
Collaborator

Can you please share your MariaDB config? Also, a SHOW SLAVE STATUS on any of the replicas?

@shlomi-noach
Collaborator

I'm able to reproduce.

@shlomi-noach
Collaborator

shlomi-noach commented Jun 23, 2021

OK, this does after all circle back to the surprising behavior on MariaDB, explained in #1260 (comment).

As it turns out, when a replica cannot connect to its master/primary, and tries to reconnect, it resets the value of Read_Master_Log_Pos. To illustrate and to explain why it matters, consider this scenario:

  1. MariaDB primary + 2 replicas, with GTID=current_pos and with semi-sync enabled with an "infinite" timeout
  2. We create a table on the primary and insert one row
  3. All 3 nodes show that one row
  4. We stop slave sql_thread on both replicas
  5. We insert a 2nd row on the primary
  6. Since the io_thread on the replicas is still running, they both pull the binlog event, acking it to the primary
  7. The primary now has 2 rows, but both replicas have 1 row.

At this time the coordinates on my test env show:

mysql -e "show slave status\G" | grep Master_Log

output on both replicas:

           Read_Master_Log_Pos: 3985
         Relay_Master_Log_File: mysql_sandbox22511-bin.000001
           Exec_Master_Log_Pos: 3784
---
               Master_Log_File: mysql_sandbox22511-bin.000001
           Read_Master_Log_Pos: 3985
         Relay_Master_Log_File: mysql_sandbox22511-bin.000001
           Exec_Master_Log_Pos: 3784

which makes perfect sense. The primary has advanced as far as pos=3985 but both replicas are only caught up as far as 3784.
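
For reference, a minimal sketch (not orchestrator's actual code) of the kind of "SQL thread up to date" check implied here: the executed coordinates from SHOW SLAVE STATUS are compared against the read coordinates. If Read_Master_Log_Pos is silently rewound, as shown right below, such a check wrongly reports the replica as caught up even though fetched-but-unapplied events have been lost track of. Type and function names are hypothetical:

package replcheck

// SlaveStatus holds the four coordinate fields from SHOW SLAVE STATUS that matter here.
type SlaveStatus struct {
	MasterLogFile      string // binlog file the IO thread is reading from the master
	ReadMasterLogPos   int64  // how far the IO thread has fetched
	RelayMasterLogFile string // master binlog file the SQL thread is currently executing
	ExecMasterLogPos   int64  // how far the SQL thread has applied
}

// sqlThreadUpToDate reports whether everything the IO thread fetched
// has also been applied by the SQL thread.
func sqlThreadUpToDate(s SlaveStatus) bool {
	return s.MasterLogFile == s.RelayMasterLogFile &&
		s.ReadMasterLogPos == s.ExecMasterLogPos
}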

Now, we stop the primary. Whatever it is that orchestrator does during promotion, we then see (right before actual promotion):

               Master_Log_File: mysql_sandbox22513-bin.000001
           Read_Master_Log_Pos: 3768
         Relay_Master_Log_File: mysql_sandbox22513-bin.000001
           Exec_Master_Log_Pos: 3768
---
               Master_Log_File: mysql_sandbox22511-bin.000001
           Read_Master_Log_Pos: 3784
         Relay_Master_Log_File: mysql_sandbox22511-bin.000001
           Exec_Master_Log_Pos: 3784

Notice how on both replicas the value of Read_Master_Log_Pos has decreased from 3985... 💥
So the replicas forget that they were ahead in replication. They forget that they were supposed to track a certain position.
And that's why the failover in MariaDB behaves differently than in MySQL.

Anyway, we're here to solve this. I thought #1264 should have solved it, but apparently there's someplace else where orchestrator restarts the IO_THREAD on the replicas. So I'm going to try to find where that happens and take it from there.

@shlomi-noach
Collaborator

MariaDB's behavior baffles me. Irrespective of orchestrator, it's keen on losing data. Here's the story:

I've fixed orchestrator behavior. We begin with this issue's use case:

127.0.0.1:22511   [0s,ok,10.5.10-MariaDB-log,rw,MIXED,>>,semi:master]
- 127.0.0.1:22512 [null,nonreplicating,10.5.10-MariaDB-log,rw,MIXED,>>,GTID,semi:replica]
- 127.0.0.1:22513 [null,nonreplicating,10.5.10-MariaDB-log,rw,MIXED,>>,GTID,semi:replica]

22511 is the primary; both replicas have the IO_THREAD running and the SQL_THREAD stopped. There's an INSERT in both replicas' relay logs that isn't applied because of the stopped SQL_THREAD.

I kill the primary, orchestrator attempts a failover, but because it's configured with DelayMasterPromotionIfSQLThreadNotUpToDate: true it waits and waits and eventually times out without promoting the new primary. This is the desired behavior. We end up with:

127.0.0.1:22511     [unknown,invalid,10.5.10-MariaDB-log,rw,MIXED,>>,semi:master,downtimed]
- 127.0.0.1:22512   [null,nonreplicating,10.5.10-MariaDB-log,rw,MIXED,>>,GTID]
  + 127.0.0.1:22513 [0s,ok,10.5.10-MariaDB-log,rw,MIXED,>>,GTID,semi:replica]

At this time, replica 22512 still has that INSERT in its relay log, and still points to the original coordinates:

Jun 23 11:21:17                Master_Log_File: mysql_sandbox22511-bin.000001
Jun 23 11:21:17            Read_Master_Log_Pos: 3985
Jun 23 11:21:17          Relay_Master_Log_File: mysql_sandbox22511-bin.000001
Jun 23 11:21:17            Exec_Master_Log_Pos: 3784
Jun 23 11:21:17 ---
Jun 23 11:21:17                Master_Log_File: mysql_sandbox22512-bin.000001
Jun 23 11:21:17            Read_Master_Log_Pos: 3768
Jun 23 11:21:17          Relay_Master_Log_File: mysql_sandbox22512-bin.000001
Jun 23 11:21:17            Exec_Master_Log_Pos: 3768

The 22513 replica is now a child of 22512, and it makes sense that its coordinates look completely different.

And now the magic trick:

  1. I stop orchestrator, taking it out of the picture.
  2. Coordinates remain unchanged on all servers
  3. On replica 22512 I run start slave sql_thread
  4. Boom:
Jun 23 11:22:21                Master_Log_File: mysql_sandbox22511-bin.000001
Jun 23 11:22:21            Read_Master_Log_Pos: 3784
Jun 23 11:22:21          Relay_Master_Log_File: mysql_sandbox22511-bin.000001
Jun 23 11:22:21            Exec_Master_Log_Pos: 3784
Jun 23 11:22:21 ---
Jun 23 11:22:21                Master_Log_File: mysql_sandbox22512-bin.000001
Jun 23 11:22:21            Read_Master_Log_Pos: 3768
Jun 23 11:22:21          Relay_Master_Log_File: mysql_sandbox22512-bin.000001
Jun 23 11:22:21            Exec_Master_Log_Pos: 3768

The value of Read_Master_Log_Pos dropped from 3985 down to 3784.

Why?

No idea. I consider this a bug in MariaDB, and a loss of data. In fact, I'm not sure how it is at all possible to recover relay log entries on 22512... 😮

So what's next?

I'll present an orchestrator PR that behaves correctly by not promoting the replica. But I have no idea how you, the user/DBA, will take it from there. How would you salvage the data? Please let me know if you find a solution.

@shlomi-noach
Collaborator

Solution in #1374. Please try it out.

@shlomi-noach
Collaborator

This is the reproduction script I used (requires MariaDB, dbdeployer):

# tear down any previous sandbox
cd ~/sandboxes/mariadb10
./stop_all
cd ~/sandboxes
dbdeployer delete rsandbox_10_5_10
rm -rf ./rsandbox_10_5_10 ./mariadb10

# deploy a fresh 1-primary / 2-replica MariaDB 10.5.10 sandbox and create a test user
dbdeployer deploy replication 10.5.10 --skip-report-host --my-cnf-options log_bin --my-cnf-options log_slave_updates --my-cnf-options "report_host=127.0.0.1"
ln -s rsandbox_10_5_10 mariadb10
cd mariadb10
./m -uroot -e "grant all on *.* to 'ci'@'%' identified by 'ci'"

# enable semi-sync with a very long timeout, and switch the replicas to GTID current_pos
./m test -e "set global rpl_semi_sync_master_enabled=1; set global rpl_semi_sync_master_timeout=10000000"
./s1 test -e "set global rpl_semi_sync_slave_enabled=1; stop slave; start slave"
./s2 test -e "set global rpl_semi_sync_slave_enabled=1; stop slave; start slave"
./s1 test -e "stop slave; change master to MASTER_USE_GTID=current_pos; start slave;"
./s2 test -e "stop slave; change master to MASTER_USE_GTID=current_pos; start slave;"

# seed data, stop the SQL threads, then write one more row on the primary only
./m test -e "drop table if exists t; create table t(id bigint auto_increment primary key, val int not null)"
./m test -e "insert into t values(null,7)"
sleep 1
./use_all "select * from test.t"
./s1 test -e "stop slave sql_thread;"
./s2 test -e "stop slave sql_thread;"
./m test -e "insert into t values(null,11)"
sleep 1
./use_all "select * from test.t"
./s1 -e "show slave status\G"

# kill the primary to trigger failure detection
./master/stop

@mohankara
Author

Thank you so much @shlomi-noach. A lot to consume, and great work. I will try out the solution in #1374. The way I understand it, with this fix orchestrator will stop promoting a replica with lag. Is that right?

@shlomi-noach
Collaborator

> orchestrator will stop promoting a replica with lag. Is that right?

Correct, as per #1363 (comment), leading to:

127.0.0.1:22511     [unknown,invalid,10.5.10-MariaDB-log,rw,MIXED,>>,semi:master,downtimed]
- 127.0.0.1:22512   [null,nonreplicating,10.5.10-MariaDB-log,rw,MIXED,>>,GTID]
  + 127.0.0.1:22513 [0s,ok,10.5.10-MariaDB-log,rw,MIXED,>>,GTID,semi:replica]

@mohankara
Author

Thanks. Will this fix be available in 3.2.5 or the next release?

@shlomi-noach
Collaborator

> Will this fix be available in 3.2.5 or the next release?

Let's first test it?

@mohankara
Author

@shlomi-noach - I opened a case with MariaDB support on why the data gets lost (if we take orchestrator out of the equation). Here's their reply. Is this possible?

> Mohan,
> One thing I may have to understand is how your orchestrator promotes a slave to master even when there is still slave lag. In a perfect world, if orchestrator detects the master's failure, it should run CHANGE MASTER TO RELAY_LOG_FILE='slave-relay-bin.xxx', RELAY_LOG_POS=xxx; until the relay log on the slave is fully processed, and then it should promote that slave to master. This is somewhat explained here - https://mariadb.com/kb/en/change-master-to/#relay-log-options.
> I hope it would be helpful to you.

@mohankara
Author

@shlomi-noach - I tried to clone the branch, compile, and start orchestrator, but it wouldn't come up.
Would it be possible for you to provide a binary that I can use?

15:07 $ ./script/build
go version go1.15.6 darwin/amd64 found in : Go Binary: /usr/local/bin/go
++ rm -rf .gopath
++ mkdir -p .gopath/src/github.com/openark
++ ln -s /Users/mohan.karapundi/repo/orchestrator .gopath/src/github.com/openark/orchestrator
++ export GOPATH=/Users/mohan.karapundi/repo/orchestrator/.gopath:/Users/mohan.karapundi/repo/orchestrator/.vendor
++ GOPATH=/Users/mohan.karapundi/repo/orchestrator/.gopath:/Users/mohan.karapundi/repo/orchestrator/.vendor

  • mkdir -p bin
  • bindir=/Users/mohan.karapundi/repo/orchestrator/bin
  • scriptdir=/Users/mohan.karapundi/repo/orchestrator/script
    ++ git rev-parse HEAD
  • version=b381c69896604a719ee72563d4d2f81015778ecb
    ++ git describe --tags --always --dirty
  • describe=v3.2.5-7-gb381c698
  • export GOPATH=/Users/mohan.karapundi/repo/orchestrator/.gopath
  • GOPATH=/Users/mohan.karapundi/repo/orchestrator/.gopath
  • cd .gopath/src/github.com/openark/orchestrator
  • go build -i -o /Users/mohan.karapundi/repo/orchestrator/bin/orchestrator -ldflags '-X main.AppVersion=b381c69896604a719ee72563d4d2f81015778ecb -X main.BuildDescribe=v3.2.5-7-gb381c698' ./go/cmd/orchestrator/main.go

github.com/openark/orchestrator/go/logic

go/logic/topology_recovery.go:883:4: caughtUpReplica declared but not used

To fix the error, I had to replace this line in go/logic/topology_recovery.go.
From:
caughtUpReplica, err := inst.WaitForSQLThreadUpToDate(&promotedReplica.Key, 0, 0)
To:
_, err := inst.WaitForSQLThreadUpToDate(&promotedReplica.Key, 0, 0)
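
For anyone wondering why that one-character change fixes the build: Go treats an unused local variable as a compile-time error, and the blank identifier discards a value you don't need. A tiny standalone illustration (not taken from the orchestrator codebase):

package main

import "fmt"

// lookup stands in for any call that returns a value plus an error.
func lookup() (int, error) { return 42, nil }

func main() {
	// value, err := lookup() // would fail to compile if `value` is never used:
	//                        // "value declared but not used"
	_, err := lookup() // the blank identifier discards the unused value
	if err != nil {
		fmt.Println(err)
	}
}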

It then generated the orchestrator binary. I copied this to my server and restarted orchestrator,
but orchestrator wouldn't come up. Here's the error in the orchestrator log file:

/bin/bash: /usr/local/orchestrator/orchestrator: cannot execute binary file

@shlomi-noach
Collaborator

Apologies, the build was broken and I didn't even notice. It's now fixed, and here's the binary: https://github.com/openark/orchestrator/suites/3073155843/artifacts/70038639

@shlomi-noach
Collaborator

> In a perfect world, if orchestrator detects the master's failure, it should run CHANGE MASTER TO RELAY_LOG_FILE='slave-relay-bin.xxx', RELAY_LOG_POS=xxx; until the relay log on the slave is fully processed, and then it should promote that slave to master. This is somewhat explained here - https://mariadb.com/kb/en/change-master-to/#relay-log-options.

I confess this is confusing to me. The replica is already at those relay log coordinates. I'm not sure if the intention is that we re-apply those same coordinates via change master to? The documentation is a bit fuzzy, too.

FWIW, in #1374 and as illustrated, orchestrator does not promote the replica to primary, and that is the intended behavior. Therefore the comment might be irrelevant? In the scenario I've presented the failover is prevented (this is expected behavior), and then the replica is still unable to process its relay logs.
I can look into this again next week.

@mohankara
Author

> Apologies, the build was broken and I didn't even notice. It's now fixed, and here's the binary: https://github.com/openark/orchestrator/suites/3073155843/artifacts/70038639

Thank you Sir. I appreciate it. I will give it a shot and let you know.

@mohankara
Author

> In a perfect world, if orchestrator detects the master's failure, it should run CHANGE MASTER TO RELAY_LOG_FILE='slave-relay-bin.xxx', RELAY_LOG_POS=xxx; until the relay log on the slave is fully processed, and then it should promote that slave to master. This is somewhat explained here - https://mariadb.com/kb/en/change-master-to/#relay-log-options.

> I confess this is confusing to me. The replica is already at those relay log coordinates. I'm not sure if the intention is that we re-apply those same coordinates via change master to? The documentation is a bit fuzzy, too.

> FWIW, in #1374 and as illustrated, orchestrator does not promote the replica to primary, and that is the intended behavior. Therefore the comment might be irrelevant? In the scenario I've presented the failover is prevented (this is expected behavior), and then the replica is still unable to process its relay logs.
> I can look into this again next week.

Didn't make sense to me either when I read it. What I find interesting is that even if you take orchestrator out of the equation, the behavior of MySQL and MariaDB differs a lot when the slave thread is restarted. I will do some testing and let MariaDB support know. What would be ideal is: the primary goes down, orchestrator waits for one of the replicas to be caught up, then promotes.

@shlomi-noach
Collaborator

So, I tried to reproduce without any orchestrator involvement whatsoever:

  • one primary, two semi sync replicas
  • insert on primary
  • stop sql thread on replicas
  • insert on primary
  • stop primary

From this point, I ran start slave sql_thread on the replicas -- and they caught up just fine. So it does seem like orchestrator does something here, though I'm not sure what it is: it does not actually issue a change master on the designated replica, and it does not stop the sql thread on that replica. But again, something happens that causes that replica to then misbehave upon start slave sql_thread.

@mohankara
Author

Looks like a newer GLIBC is required by orchestrator. I will try to download it and see if I can get past this error.
/usr/local/orchestrator/orchestrator: /lib64/libc.so.6: version `GLIBC_2.28' not found (required by /usr/local/orchestrator/orchestrator)

@shlomi-noach
Collaborator

shlomi-noach commented Jun 24, 2021

What OS are you running on?

Try ./script/build to build locally.

@mohankara
Author

I am on CentOS

@mohankara
Author

I was able to build successfully by downloading from the branch recovery-mariadb-delay-replication.
I added DelayMasterPromotionIfSQLThreadNotUpToDate=true to the conf and restarted orchestrator on all 3 nodes.
I tested a few times with the slave thread being down on both replicas.
Unfortunately, I still see one of the replicas being promoted to master. No change in behavior.
I am not sure if I am missing something here.

@shlomi-noach
Collaborator

Logs with --debug please?

@shiweng

shiweng commented Jun 25, 2021

> I couldn't simulate this issue (replica getting promoted as master while there is lag) on MySQL 5.7. But I was using orchestrator 3.2.4. I know there are fixes related to this issue in 3.2.4. Try that version or 3.2.5 and see. Hope that works for you.

Thanks, mohankara! I tested successfully with orchestrator 3.2.4.

@mohankara
Author

mohankara commented Jun 25, 2021

Attaching orchestrator log file from the raft leader node.
orchestrator.log.zip
From other 2 nodes as well
orchestrator_node1.log
orchestrator_node2.log

@shlomi-noach
Collaborator

I just added a few more audit messages. Can you please run again? On my end the behavior is correct.
Looking at your logs, it appears like orchestrator thinks the SQL thread is up-to-date. But we'll know better with the extra logging.

@mohankara
Author

Attached are the zip files from orchestrator logs.
I hope I am picking up the right code here:
[root@node1 orchestrator]# ./orchestrator --version
5347e58

orch_logs_0627.zip

@shlomi-noach
Collaborator

shlomi-noach commented Jun 28, 2021

Your binary is not the right version. It should be b381c69896604a719ee72563d4d2f81015778ecb (latest commit).

I assume your CentOS is CentOS 6.X? (which has an older glibc). How were you able to use the orchestrator binaries in the first place? I've long since stopped shipping orchestrator binaries for CentOS 6.

To build, please follow these steps:

git clone git@github.com:openark/orchestrator.git
cd orchestrator
git fetch origin recovery-mariadb-delay-replication
git checkout recovery-mariadb-delay-replication
./script/build
ls -l bin/orchestrator # this is the binary

@mohankara
Author

I am running CentOS 7.9. I will rebuild using the steps you have provided me and let you know. Hope this is it ;)

@mohankara
Author

OK, I tried rebuilding again. The version is 0308d..., which I think is the latest with the audit changes you made. Let me test and see.
[root@node1 orchestrator]# ./orchestrator --version
0308d61

@mohankara
Author

I tested a few times today. I can see that orchestrator is NOT promoting replicas with lag (or when the SQL_THREAD is stopped). That's a good sign.
However, when Node 1 is the master and goes down, I noticed Node 2 still points at Node 1 as its master (correct), but Node 3 now points at Node 2 as its master. I think you also highlighted/noticed the same in your test.
Is that normal, or can something be done to change it in this situation? Neither Node 2 nor Node 3 becomes master, which is right.
As soon as I promoted Node 2 as master, Node 3 became a replica of Node 2.

To make Node 2 the master, I did these steps. There was no data loss of any kind.

  1. On Node 2:
     STOP SLAVE SQL_THREAD;
     CHANGE MASTER TO
       RELAY_LOG_FILE='<relay-bin.000002>',
       RELAY_LOG_POS=;
     START SLAVE SQL_THREAD;
  2. Once the Node 2 slave is caught up, stop slave; promote Node 2 as master.
  3. Then make Node 3 a replica of Node 2.

If the above logic could somehow be incorporated into orchestrator, that would be superb (see the sketch below)!
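
Purely as an illustration of those manual steps, here is a rough sketch of what they could look like if driven programmatically. This is not orchestrator's implementation; the function, its parameters, and the surrounding flow are hypothetical, and a MySQL/MariaDB driver such as go-sql-driver/mysql is assumed:

package recovery

import (
	"database/sql"
	"fmt"

	_ "github.com/go-sql-driver/mysql" // assumed driver; any MySQL/MariaDB driver works
)

// reapplyRelayLogs mirrors the manual steps above: re-point the replica at its
// own current relay log coordinates (so MariaDB does not discard them) and
// restart the SQL thread so the remaining events get applied.
func reapplyRelayLogs(db *sql.DB, relayLogFile string, relayLogPos int64) error {
	if _, err := db.Exec("STOP SLAVE SQL_THREAD"); err != nil {
		return err
	}
	changeMaster := fmt.Sprintf(
		"CHANGE MASTER TO RELAY_LOG_FILE='%s', RELAY_LOG_POS=%d",
		relayLogFile, relayLogPos)
	if _, err := db.Exec(changeMaster); err != nil {
		return err
	}
	if _, err := db.Exec("START SLAVE SQL_THREAD"); err != nil {
		return err
	}
	// The caller would then poll SHOW SLAVE STATUS until the SQL thread has
	// applied everything, run STOP SLAVE, promote this node, and finally
	// repoint the remaining replica at it.
	return nil
}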
    I will run some more tests and let you know.
    THANK YOU SO MUCH FOR YOUR HELP !!!

@shlomi-noach
Collaborator

> Is that normal, or can something be done to change it in this situation?

That's the normal and expected behavior, and it is unlikely to change in the near future. orchestrator prepares the topology for promotion, picks the best candidate, and just before promotion validates that the candidate's state still complies with the constraints. It may be possible in the future to push some conditions down the topology.

I'm still surprised about the need to repoint the replica onto the same relay log position, just to prevent it from purging relay logs. I'll await your tests.

@mohankara
Author

mohankara commented Jun 29, 2021

> I'm still surprised about the need to repoint the replica onto the same relay log position, just to prevent it from purging relay logs. I'll await your tests.

Yeah, me too. In a normal scenario, when I stop the slave SQL_THREAD and start it again, I don't lose any data; it picks up from where it left off. This happens only when a CHANGE MASTER command is executed, and CHANGE MASTER, per the docs, purges the relay logs as mentioned here: https://mariadb.com/kb/en/change-master-to/#relay-log-options.
This behavior is different in MySQL.

I will run tests and let you know.

@mohankara
Author

Ran a few more tests; they seem fine to me. I didn't see orchestrator promoting a replica with lag. I will continue to run more tests and let you know if I see anything different.
Will this be rolled into 3.2.6? When is that planned? Please let me know.

@shlomi-noach
Collaborator

> Will this be rolled into 3.2.6? When is that planned? Please let me know.

It will be rolled into the next release; I don't work to a strict schedule, to be honest. I just release when it makes sense or when users remind me I'm late (very shameful).

I'm a bit dissatisfied that I don't have CI coverage for MariaDB; I'd like to have this fix tested & validated. But I don't have the time right now to invest in that path.
