
ProxySQL 2.0.2-1-g533442f4 'Incorrect number of fields, please report a bug' in 'monitor_galera_thread' #1978

Closed
Lt-Flash opened this issue Mar 25, 2019 · 23 comments

@Lt-Flash

Lt-Flash commented Mar 25, 2019

Hi,
We're using the latest ProxySQL in front of our production database, monitoring three Galera servers, with one of them as a writer and all three as readers (no read_only flag is set on any of the servers). From time to time our application reports 'MySQL server has gone away'. At the same time the logs show that, for some reason, ProxySQL closes all connections to the active server and chooses another one as the master, then returns the usual master after the next monitor cycle. I'm not sure how to reproduce this issue, but here are the logs.

Here's the definition of the two groups of servers we're using. The primary cluster is hostgroups 10 and 20; that's the one we're having problems with. The MySQL servers are physical machines running MariaDB with Galera Cluster on Ubuntu Server 18.10.

MySQL [(none)]> select * from mysql_servers;
+--------------+--------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname     | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+--------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 1            | 10.22.20.200 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | 10.22.20.203 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 1            | 10.22.20.211 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.22.20.141 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 10           | 10.22.20.142 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.22.20.143 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+--------------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
6 rows in set (0.00 sec)

Here's a normal working state:

MySQL [(none)]> select hostgroup, srv_host, status, ConnUsed, ConnOK, ConnFree, MaxConnUsed, Queries from stats.stats_mysql_connection_pool order by hostgroup,srv_host;
+-----------+--------------+--------------+----------+--------+----------+-------------+----------+
| hostgroup | srv_host     | status       | ConnUsed | ConnOK | ConnFree | MaxConnUsed | Queries  |
+-----------+--------------+--------------+----------+--------+----------+-------------+----------+
| 0         | 10.22.20.211 | ONLINE       | 0        | 17     | 17       | 10          | 59084    |
| 1         | 10.22.20.200 | ONLINE       | 0        | 5      | 5        | 4           | 14001    |
| 1         | 10.22.20.203 | ONLINE       | 0        | 4      | 4        | 3           | 13879    |
| 1         | 10.22.20.211 | ONLINE       | 0        | 4      | 4        | 3           | 13898    |
| 3         | 10.22.20.200 | ONLINE       | 0        | 0      | 0        | 0           | 0        |
| 3         | 10.22.20.203 | ONLINE       | 0        | 0      | 0        | 0           | 0        |
| 10        | 10.22.20.141 | ONLINE       | 0        | 28     | 28       | 17          | 1005250  |
| 10        | 10.22.20.142 | ONLINE       | 0        | 72     | 61       | 44          | 10912974 |
| 10        | 10.22.20.143 | ONLINE       | 0        | 18     | 18       | 11          | 196272   |
| 20        | 10.22.20.143 | ONLINE       | 9        | 31     | 22       | 19          | 374549   |
| 21        | 10.22.20.141 | ONLINE       | 0        | 0      | 0        | 0           | 0        |
| 21        | 10.22.20.142 | ONLINE       | 0        | 0      | 0        | 0           | 0        |
| 40        | 10.22.20.141 | OFFLINE_HARD | 0        | 0      | 0        | 0           | 0        |
| 40        | 10.22.20.143 | OFFLINE_HARD | 0        | 0      | 0        | 0           | 0        |
+-----------+--------------+--------------+----------+--------+----------+-------------+----------+
14 rows in set (0.01 sec)

Here's the definition of the Galera hostgroups:

MySQL [(none)]> select * from mysql_galera_hostgroups;
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+-------------+
| writer_hostgroup | backup_writer_hostgroup | reader_hostgroup | offline_hostgroup | active | max_writers | writer_is_also_reader | max_transactions_behind | comment     |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+-------------+
| 0                | 3                       | 1                | 4                 | 1      | 1           | 1                     | 100                     | Development |
| 20               | 21                      | 10               | 40                | 1      | 1           | 1                     | 100                     | Production  |
+------------------+-------------------------+------------------+-------------------+--------+-------------+-----------------------+-------------------------+-------------+
2 rows in set (0.00 sec)
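For reference, the Production row above is defined through the admin interface roughly like this (a sketch using the columns shown in mysql_galera_hostgroups; the values are simply the ones from our setup):

-- Sketch: how the 'Production' row above would be created on the admin interface.
INSERT INTO mysql_galera_hostgroups
    (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
     active, max_writers, writer_is_also_reader, max_transactions_behind, comment)
VALUES (20, 21, 10, 40, 1, 1, 1, 100, 'Production');
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;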

errors.log


@Lt-Flash
Author

Lt-Flash commented Mar 25, 2019

One more thing: the logs actually contain this at the very beginning of the issue:

2019-03-25 12:20:07 MySQL_Session.cpp:3278:handler(): [WARNING] Error during query on (20,10.22.20.143,3306): 1213, Deadlock found when trying to get lock; try restarting transaction                                                        
2019-03-25 12:23:16 [INFO] Galera: max_writers=1 , moving 1 nodes from backup HG 21 to writer HG 20                                                                                                                                           
2019-03-25 12:23:16 MySQL_HostGroups_Manager.cpp:3892:update_galera_set_offline(): [WARNING] Galera: setting host 10.22.20.143:3306 offline because: primary_partition=NO

And after this, ProxySQL starts to move 143 and 142 back and forth into the writer role, dropping all connections!

2019-03-25 12:23:16 MySQL_HostGroups_Manager.cpp:1220:commit(): [WARNING] Removed server at address 140374151775392, hostgroup 21, address 10.22.20.142 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. 
2019-03-25 12:23:16 MySQL_HostGroups_Manager.cpp:1220:commit(): [WARNING] Removed server at address 140374175000544, hostgroup 20, address 10.22.20.143 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. 
2019-03-25 12:23:16 MySQL_HostGroups_Manager.cpp:1220:commit(): [WARNING] Removed server at address 140374151775072, hostgroup 10, address 10.22.20.143 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections.
2019-03-25 12:23:16 MySQL_HostGroups_Manager.cpp:4166:update_galera_set_writer(): [WARNING] Galera: setting host 10.22.20.142:3306 as writer                                                                                                  
2019-03-25 12:23:16 [INFO] Dumping current MySQL Servers structures for hostgroup ALL

Why is it doing this? If a deadlock is found during a Galera commit, the application should just retry the failed transaction, but it looks like this deadlock triggers ProxySQL to move a backup writer into the primary writer position and then demote it again, which shouldn't be happening.

@mrfaiz

mrfaiz commented Mar 25, 2019

I am also facing the same problem: 'MySQL server has gone away'.

@renecannao
Contributor

@Lt-Flash: can you please provide more of the error log? Several hours before the incident, if possible.

What I can tell you already is:

  • the problem is not related to the deadlock
  • for some reason, the connection used by ProxySQL's Monitor is in a weird state; this is what causes Incorrect number of fields, please report a bug
  • because the Monitor's connection is in a weird state, it cannot determine the real status of the backend, so it assumes it is unhealthy

This is clearly a bug in ProxySQL's Monitor module: it seems it doesn't perform correct error handling for failed requests. I am already working on implementing better error handling.
The reason I am asking for more of the error log is to try to determine what caused the Monitor's connection to end up in a bad state.
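For context, the per-node state the Galera monitor evaluates can be approximated manually on each backend with standard wsrep status and variables (a sketch only; the exact query Monitor issues internally may differ):

-- Run directly on each Galera node (not through ProxySQL).
SELECT @@global.read_only;
SHOW GLOBAL STATUS LIKE 'wsrep_local_state';        -- 4 = Synced
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';     -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_local_recv_queue';
SHOW GLOBAL VARIABLES LIKE 'wsrep_desync';
SHOW GLOBAL VARIABLES LIKE 'wsrep_reject_queries';
SHOW GLOBAL VARIABLES LIKE 'wsrep_sst_donor_rejects_queries';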

@Lt-Flash
Author

Hi @renecannao ,
Here's a large log; there were several ProxySQL restarts in it, but it contains all the information. Hope this helps.

I can also remove group 1 (our test MySQL Galera cluster) if that would help, as that would remove the checks against that cluster.

proxysql.zip

@Lt-Flash
Author

Hi,
Is there any update on this issue? We had to move all the services back to MaxScale because of this error, unfortunately. But I can run some tests if required; just let me know what needs to be done. Thanks.

@kubicgruenfeld

We are facing the same issue:

2019-04-01 14:59:19 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 14:59:24 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 14:59:34 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 14:59:39 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 14:59:49 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 14:59:59 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 15:00:04 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
2019-04-01 15:00:14 MySQL_Monitor.cpp:1243:monitor_galera_thread(): [ERROR] Incorrect number of fields, please report a bug
...
2019-04-01 15:00:24 MySQL_HostGroups_Manager.cpp:1220:commit(): [WARNING] Removed server at address 139759328520352, hostgroup 0, address 10.1.2.3 port 3306. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them

@mrszop

mrszop commented Apr 2, 2019

+1

@sherlock04

I also encountered the same problem. At first I thought it was network delay, but even after increasing the parameters it was still not solved. I hope someone can provide a solution.

@sherlock04


Hello, I ran a load test against ProxySQL and the scenario reproduced quickly:
2019-04-10 22:51:58 MySQL_HostGroups_Manager.cpp:3892:update_galera_set_offline(): [WARNING] Galera: setting host xxx offline because: primary_partition=NO

It is clear that ProxySQL misjudged the state of the Galera nodes.
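When this happens it is worth comparing ProxySQL's runtime view against the node's own wsrep status (a sketch; runtime_mysql_servers is the ProxySQL 2.0 admin table, and the SHOW commands run on the Galera node that was marked offline):

-- On the ProxySQL admin interface (port 6032): how ProxySQL classified the nodes.
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;

-- On the Galera node itself: what the node actually reports.
SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status';       -- expect 'Primary'
SHOW GLOBAL STATUS LIKE 'wsrep_local_state_comment';  -- expect 'Synced'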

@renecannao
Contributor

@sherlock04, please note my previous comment #1978 (comment):

because the Monitor's connection is in a weird state, it fails to know the right status of the backend, and it assumes it is unhealthy

I am working on understanding why this happens and how to prevent it, but at the same time I am trying to implement a workaround (tracked in #1994).

@sherlock04


Can I avoid this problem by upgrading to 2.0.4?

@renecannao
Contributor

Can I avoid this problem by upgrading to 2.0.4?

Unfortunately, not yet

@sherlock04


I also have a question: why do I keep getting the following errors when I run load tests using sysbench?
2019-04-11 11:13:34 MySQL_Session.cpp:2148:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection during SET NAMES on 100.119.150.69 , 3308 : 2019, Can't initialize character set (null) (path: compiled_in)
Can't initialize character set (null) (path: compiled_in)
Can't initialize character set (null) (path: compiled_in)
2019-04-11 11:13:58 MySQL_Session.cpp:2148:handler_again___status_CHANGING_CHARSET(): [ERROR] Detected a broken connection during SET NAMES on 100.119.150.69 , 3308 : 2019, Can't initialize character set (null) (path: compiled_in)

@sherlock04


My ProxySQL has changed server status more than 30,000 times in the past three days. How long will it take to fix this bug?

MySQL [(none)]> select * from runtime_checksums_values;
+-------------------+---------+------------+--------------------+
| name              | version | epoch      | checksum           |
+-------------------+---------+------------+--------------------+
| admin_variables   | 0       | 0          |                    |
| mysql_query_rules | 1       | 1556605873 | 0x0000000000000000 |
| mysql_servers     | 29067   | 1557285794 | 0x2130FCF6BA9F77F0 |
| mysql_users       | 2       | 1556607077 | 0xEF8800C8C26DD313 |
| mysql_variables   | 0       | 0          |                    |
| proxysql_servers  | 1       | 1556605873 | 0x0000000000000000 |
+-------------------+---------+------------+--------------------+
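For what it's worth, the flapping history behind those version bumps can also be pulled from the monitor schema (a sketch, assuming the mysql_server_galera_log table exposed on the 2.0.x admin interface):

-- Most recent Galera health checks recorded by Monitor, including any error text.
SELECT * FROM monitor.mysql_server_galera_log
ORDER BY time_start_us DESC
LIMIT 20;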

@Lt-Flash
Author

Lt-Flash commented May 8, 2019

We had to switch to MaxScale; there is no way we can use ProxySQL with this bug. MaxScale doesn't allow routing based on the query, but our product uses prepared statements anyway, which are not supported by ProxySQL, so MaxScale works just fine for us.

@MortezaBashsiz

Hi all, and especially dear @Lt-Flash.
First of all, sorry for my poor English.
I had the same 'MySQL server has gone away' problem with HAProxy in front of Galera servers.
I did a lot of research with no result, and none of the answers resolved my problem.
Using ProxySQL instead of HAProxy did not resolve it either.
But when I changed my DB backend from Galera to master-slave MySQL, the problem no longer happened.
HAProxy with MySQL master-slave was OK.
ProxySQL with MySQL master-slave was OK.
I even used DNS health checks and load balancing instead of HAProxy or ProxySQL with Galera, and the result was amazing: everything was OK.
Finally I found out that this is not entirely related to HAProxy, ProxySQL or Galera.
Maybe it was related to my client behavior or the ODBC driver.
The most important note is that the 'MySQL server has gone away' error did not happen on my new clients running CentOS 7 with mysql-connector-odbc-5.2.5-8 and unixODBC-2.3.1-11.el7.x86_64; everything works fine with Galera and both ProxySQL and HAProxy.
The error only happened on CentOS 6 with mysql-connector-odbc-5.1.5r1144-7.el6.x86_64 and unixODBC-2.2.14-14.el6.x86_64.
Please pay attention to the MySQL connection timeout too.
That was my experience; maybe it helps you make the best choice.

@sherlock04


I can't change my backend Galera cluster to master-slave. After all, Galera is already in use and performs very well.

Indeed, we use MaxScale, but because of routing problems we feel we need ProxySQL. @Lt-Flash

@Lt-Flash
Author

Lt-Flash commented May 8, 2019


Thanks for your reply, but for us it's critical to have Galera monitoring, because the master may fail at any time, and in a dedicated master-slave setup you would need to switch over to the new master manually unless you're doing Galera monitoring. I've increased all the timeout parameters, but that didn't help. The problem we're facing is that the Galera monitor suddenly reports 'Incorrect number of fields' and drops all connections; the 'MySQL server has gone away' seen in the application logs is a consequence of that, not the cause. Right now on MaxScale we don't have any issues with our DB/application.
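For the record, by 'increased all the timeout parameters' I mean changes along these lines on the ProxySQL admin interface (a sketch; variable names taken from the 2.0.x Galera monitor settings, values are just examples):

-- On the ProxySQL admin interface (port 6032).
UPDATE global_variables SET variable_value='2000'
  WHERE variable_name='mysql-monitor_galera_healthcheck_interval';
UPDATE global_variables SET variable_value='1500'
  WHERE variable_name='mysql-monitor_galera_healthcheck_timeout';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;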

@renecannao
Contributor

renecannao commented May 8, 2019

For your information, ProxySQL 2.0.4 is now released, and this bug is solved.
You are kindly recommended to try it out. Thanks
Closing.


@sherlock04


Thank you very much. I have upgraded all instances to ProxySQL version 2.0.4-116-g7d371cf. The base environment is CentOS 7.2. If there are no problems going forward, it will give me great confidence in ProxySQL. I will also recommend ProxySQL to the surrounding teams and offer sincere suggestions.

@sherlock04


Hello, do you have the problem of nodes switching frequently when you use MaxScale?

2019-05-13 21:43:29   notice : Server changed state: dbserv2[10.231.4.180:3306]: new_master. [Running] -> [Master, Synced, Running]
2019-05-13 21:43:29   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]
2019-05-13 21:53:19   notice : Server changed state: dbserv1[10.231.3.244:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
2019-05-13 21:53:29   notice : Server changed state: dbserv1[10.231.3.244:3306]: new_slave. [Running] -> [Slave, Synced, Running]
2019-05-13 21:54:34   notice : Server changed state: dbserv1[10.231.3.244:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
2019-05-13 21:54:44   notice : Server changed state: dbserv1[10.231.3.244:3306]: new_slave. [Running] -> [Slave, Synced, Running]
2019-05-13 22:26:15   notice : Server changed state: dbserv1[10.231.3.244:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
2019-05-13 22:26:15   notice : Server changed state: dbserv2[10.231.4.180:3306]: master_down. [Master, Synced, Running] -> [Down]
2019-05-13 22:26:15   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]
2019-05-13 22:26:25   notice : Server changed state: dbserv1[10.231.3.244:3306]: new_slave. [Running] -> [Slave, Synced, Running]
2019-05-13 22:26:25   notice : Server changed state: dbserv2[10.231.4.180:3306]: master_up. [Down] -> [Master, Synced, Running]
2019-05-13 22:26:25   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]
2019-05-13 22:35:21   notice : Server changed state: dbserv2[10.231.4.180:3306]: lost_master. [Master, Synced, Running] -> [Running]
2019-05-13 22:35:21   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]
2019-05-13 22:35:31   notice : Server changed state: dbserv2[10.231.4.180:3306]: new_master. [Running] -> [Master, Synced, Running]
2019-05-13 22:35:31   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]
2019-05-13 22:50:11   error  : There are no cluster members
2019-05-13 22:50:11   notice : Server changed state: dbserv1[10.231.3.244:3306]: slave_down. [Slave, Synced, Running] -> [Down]
2019-05-13 22:50:11   notice : Server changed state: dbserv2[10.231.4.180:3306]: master_down. [Master, Synced, Running] -> [Down]
2019-05-13 22:50:11   notice : Server changed state: dbserv3[10.231.5.52:3306]: lost_slave. [Slave, Synced, Running] -> [Running]
2019-05-13 22:50:25   notice : Found cluster members
2019-05-13 22:50:25   notice : Server changed state: dbserv2[10.231.4.180:3306]: master_up. [Down] -> [Master, Synced, Running]
2019-05-13 22:50:25   notice : Server changed state: dbserv3[10.231.5.52:3306]: new_slave. [Running] -> [Slave, Synced, Running]

@Lt-Flash
Author


Hi,
No, we don't see any of these errors, but I'd recommend increasing the timeout values for your service. Here's my example:

[RW]
type=service
router=readwritesplit
servers=node1,node2,node3
user=
password=
max_slave_connections=100%
connection_timeout=3000

Also, set the Galera monitor interval to 1000:

[Galera Monitor]
type=monitor
module=galeramon
servers=node1,node2,node3
user=
password=
monitor_interval=1000
disable_master_failback=true
available_when_donor=true

Hope this helps!
