use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209

zixubingfeng · 2022-05-20T06:18:08Z

Describe the issue
use aws mysql jdbc driver HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5.

Driver Version?
1.0.0

Java Version?
java-1.8-openjdk

OS Version?
aws cloud 4.14.209-160.339.amzn2.x86_64

MySQL Version?
5.7.12-log

To Reproduce
use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5，insert data and then aurora execute failover ,check recover time
use aws mysql jdbc driver 1.0.0 HikariCP 2.7.5，insert data and then aurora execute failover ,check recover time
in the end compare two result

Expected behaviour
expected use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 when happen failover , recover time can be in 30s

Logs
if not recover , when insert data , it will happen an read only exception

karenc-bq · 2022-05-30T17:29:18Z

Hi @zixubingfeng, thank you for raising this issue. We're currently looking into it and will share more information once we know more about what is going on. Thank you for your patience.

abedali · 2022-06-20T20:52:35Z

Hi, Any update on this? with aws jdbc mySQL driver 1.0.0 and Hikari 4.03, it took 3 mins to recover the connection after RDS AZ fail-over.

sergiyvamz · 2022-06-21T22:16:29Z

Hi @abedali

We're still working on reproducing the issue and finding a solution for it. We'd appreciate if you can provide mode details. That would helps us to narrow down our investigation.

Exact description of RDS cluster that you use (cluster type, Single-AZ or Multi-AZ, etc.)
a detailed driver log
if possible, a code sample that reproduces the issue.

Thank you!

sergiyvamz · 2022-07-07T23:45:51Z

Hello @zixubingfeng

There's a fix to the issue is available and a snapshot build is ready. Are you able to confirm if the issue is fixed?
#242
https://github.com/awslabs/aws-mysql-jdbc#using-a-snapshot-of-the-driver

abedali · 2022-07-11T13:48:26Z

@sergiyvamz Thanks for the update. I'll let you know the feedback once I test using 1.1.1 driver.

zixubingfeng · 2022-07-12T10:09:17Z

@abedali I test using aws-mysql-jdbc-1.1.1-20220708.155324-4.jar driver and hikaricp 3.4.5 ，The insertion request still takes 30 minutes to recover。

test code like this:
```
String userId = "08cf149b-64ac-4101-88eb-4227d243564e";
// 多线程调用
Integer threadCount = 50;
ExecutorService executor = Executors.newFixedThreadPool(threadCount);
CountDownLatch countDownLatch = new CountDownLatch(threadCount);
for (int i = 0; i < threadCount; i++) {
executor.submit(() -> {
while (true) {
try {
String userIdNew = "test_" + UUID.randomUUID().toString();
DeviceBindingEntity deviceBindingEntity = new DeviceBindingEntity();
deviceBindingEntity.setUserId(userIdNew);
deviceBindingEntity.setAppName("com.xiaomi.hm.health");
deviceBindingEntity.setDeviceId("123");
deviceBindingEntity.setDeviceType(2);
this.deviceBindingRepository.create(deviceBindingEntity);
logger.info("insert success = " + deviceBindingEntity.getUserId());
} catch (Exception ex) {
logger.error(ex.getCause().getMessage());
}
}
});
}
countDownLatch.await();

abedali · 2022-07-13T13:38:43Z

@sergiyvamz Please refer to an update above.
Any idea when does 1.1.1 version of driver available in maven?

sergiyvamz · 2022-07-13T17:10:07Z

Hello @zixubingfeng

Can you provide driver logs and give more details about your application and libraries/framework that you use?

sergiyvamz · 2022-07-13T17:14:27Z

Hello @abedali

Unfortunately there's no particular release date available at the moment. We continue working on other issues and the driver improvements and they will be available as snapshot builds.

I'll notify you when a new release is available.

zixubingfeng · 2022-07-28T09:40:28Z

"2022-07-28 17:34:34,255 WARN pool-5-thread-45 o.h.e.j.s.SqlExceptionHelper:137 - SQL Error: 1290, SQLState: HY000
2022-07-28 17:34:34,255 ERROR pool-5-thread-45 o.h.e.j.s.SqlExceptionHelper:142 - The MySQL server is running with the --read-only option so it cannot execute this statement "

I use cn region, jdbc:mysql:aws://xxx.cluster-c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn:3306/xxx?autoReconnect=true&failOverReadOnly=false&useUnicode=true&characterEncoding=UTF-8&useSSL=false&clusterInstanceHostPattern=?.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn

I am not sure , whether “HY000” of the errorcode is the problem or not.

@sergiyvamz @abedali

sergiyvamz · 2022-07-28T17:43:00Z

Hello @zixubingfeng

Thank you for reaching out. It's hard to say whether HY000 is expected in this case or not. HY000 is quite generic error code and it could mean different causes. Would you mind to provide more details about the issue?

What is a scenario you refer? What does your application do?
Is the issue you're reporting connected to database cluster failover?
Could you provide driver logs to investigate the issue?
Do you use Hikari connection pool?
What DB cluster configuration do you use?
Is there any similarity in the issue mentioned above and the issue you're reporting?

Thank you!

zixubingfeng · 2022-07-29T10:03:10Z

@sergiyvamz

My application only insert test, , for example use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209 (comment)
yes，when database cluster failover, I guess this failover scenes can not trigger connection switch ,due to throw the exception of HY000 . but Do not throw exceptions starting with "08".
ok, I cannot see any problems , please see awsfailoverdebug.txt
yes, use Hikari 3.4.5
I use cn region, jdbc:mysql:aws://huami-test.cluster-c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn:3306/basic_services?autoReconnect=true&failOverReadOnly=false&useUnicode=true&characterEncoding=UTF-8&useSSL=false&clusterInstanceHostPattern=?.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn
yes, I am trying find problem , why failover time is so long.

sergiyvamz · 2022-07-29T19:25:36Z

Hello @zixubingfeng

Thank you for provided details. Based on driver log provided, the failover process works fine. Here are some extracts from the log file.

Line 32899: Fri Jul 29 11:48:18 CST 2022 - [156] - TRACE: [ClusterAwareConnectionProxy] Detected an exception while executing a command.
Line 33474: Fri Jul 29 11:48:24 CST 2022 - [156] - DEBUG: [ClusterAwareConnectionProxy] Starting writer failover procedure.
Line 33727: Fri Jul 29 11:48:25 CST 2022 - [156] - DEBUG: [ClusterAwareWriterFailoverHandler] Successfully connected to the new writer instance: 'huami-test.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn:3306'
Line 33728: Fri Jul 29 11:48:25 CST 2022 - [156] - DEBUG: [ClusterAwareConnectionProxy] Connected to: software.aws.rds.jdbc.mysql.shading.com.mysql.cj.conf.HostInfo@735b164a :: {host: "huami-test.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn", port: 3306, hostProperties: {failOverReadOnly=false, logger=software.aws.rds.jdbc.mysql.shading.com.mysql.cj.log.StandardLogger, autoReconnect=true, sslMode=DISABLED, useSSL=false, TOPOLOGY_SERVICE_SERVER_ID=huami-test, dbname=basic_services, TOPOLOGY_SERVICE_SESSION_ID=MASTER_SESSION_ID, clusterInstanceHostPattern=?.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn, connectTimeout=0, socketTimeout=0, TOPOLOGY_SERVICE_LAST_UPDATE_TIMESTAMP=2022-07-29 03:48:25.768902, characterEncoding=UTF-8, TOPOLOGY_SERVICE_REPLICA_LAG_IN_MILLISECONDS=1.8446744073709552E16}}
Line 33729: Fri Jul 29 11:48:25 CST 2022 - [156] - ERROR: Transaction resolution unknown. Please re-configure session state if required and try restarting transaction.

As you can see the driver successfully reconnected to a new writer node "huami-test" (line 33728) and raise an exception with sqlState 08007. This code means that the failover is successful but transaction state is unknown and a user application needs to handle it. Hikari connection pool catches this exception and mark the connection as broken. As result the connection is evicted and disposed.

2022-07-29 11:48:25,769 WARN pool-5-thread-1 c.z.h.p.ProxyConnection:182 - HikariPool-1 - Connection software.aws.rds.jdbc.mysql.shading.com.mysql.cj.jdbc.ConnectionImpl@665e54e7 marked as broken because of SQLSTATE(08007), ErrorCode(0) java.sql.SQLException: Transaction resolution unknown. Please re-configure session state if required and try restarting transaction.

When a new connection is needed, Hikari connection pool opens a new connection with a connection string provided by your application. I can assume that you use a cluster writer endpoint as a connection string. The tricky point is that a cluster writer endpoint isn't up-to-date with the current state of the cluster right after database cluster failover is completed. DNS resolution uses stale data and, most probably, points your application to a wrong node. As result your application connects to a "huami-test-reader" node that is a reader. Further attempts to execute any DML statements on this node causes an exception:

java.sql.SQLException
MESSAGE: The MySQL server is running with the --read-only option so it cannot execute this statement

This explanation is based on few assumptions that I would like to confirm.

Do you use a cluster writer endpoint to connect to your cluster?
Do you configure your Hikari connection pool as it's explained here? https://github.com/awslabs/aws-mysql-jdbc#connection-pooling

zixubingfeng · 2022-08-02T01:47:44Z

hello @sergiyvamz

-I connect cluster endpoint not writer endpoint, and I not configure override hikari sqlerrorcode, for example: jdbc:mysql:aws://huami-test.cluster-c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn:3306/basic_services?autoReconnect=true&failOverReadOnly=false&useUnicode=true&characterEncoding=UTF-8&useSSL=false&clusterInstanceHostPattern=?.c6w5wba58yqu.rds.cn-northwest-1.amazonaws.com.cn
-I configure override hikari sqlerrorcode, and try it ,but this result still cannot connect aurora and the result of SQL Error is null.

another question, I restart application , to try connect db, connection still cannot recover, that why
restartcannotconnect.txt
.

sergiyvamz · 2022-08-02T16:31:33Z

Hello @zixubingfeng

The cluster endpoint you provided is a cluster writer endpoint and it points out to a writer node. The issue with a cluster endpoint is that it isn't accurate after failover at DB cluster gets completed. It takes some time till a cluster endpoints is updated with a new elected writer node. The log you've provided shows precisely this issue. Your application uses a cluster endpoints and it actually connects to a reader node (that was a writer node before DB failover). Attempts to execute DML states on a reader nodes throws an exception:

java.sql.SQLException
MESSAGE: The MySQL server is running with the --read-only option so it cannot execute this statement

congoamz · 2022-09-22T00:35:02Z

The fix for this issue was merged in with this commit and will be included in the next release. I'll close this issue for now since there hasn't been any more discussion. Please reopen if the issue still persists. Thanks you!

zixubingfeng mentioned this issue May 26, 2022

After failover sometimes connections are readonly when should not. #214

Closed

hsuamz added the bug Something isn't working label May 30, 2022

sergiyvamz mentioned this issue Jul 6, 2022

fix issue with long failover time #242

Merged

github-actions bot added the Stale label Sep 5, 2022

github-actions bot removed the Stale label Sep 12, 2022

congoamz closed this as completed Sep 22, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209

use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209

zixubingfeng commented May 20, 2022

karenc-bq commented May 30, 2022

abedali commented Jun 20, 2022

sergiyvamz commented Jun 21, 2022

sergiyvamz commented Jul 7, 2022

abedali commented Jul 11, 2022

zixubingfeng commented Jul 12, 2022

abedali commented Jul 13, 2022

sergiyvamz commented Jul 13, 2022

sergiyvamz commented Jul 13, 2022

zixubingfeng commented Jul 28, 2022 •

edited

Loading

sergiyvamz commented Jul 28, 2022

zixubingfeng commented Jul 29, 2022

sergiyvamz commented Jul 29, 2022

zixubingfeng commented Aug 2, 2022

sergiyvamz commented Aug 2, 2022

congoamz commented Sep 22, 2022 •

edited

Loading

use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209

use aws mysql jdbc driver 1.0.0 HikariCP 3.4.5 only write failover time too long than HikariCP 2.7.5 #209

Comments

zixubingfeng commented May 20, 2022

karenc-bq commented May 30, 2022

abedali commented Jun 20, 2022

sergiyvamz commented Jun 21, 2022

sergiyvamz commented Jul 7, 2022

abedali commented Jul 11, 2022

zixubingfeng commented Jul 12, 2022

abedali commented Jul 13, 2022

sergiyvamz commented Jul 13, 2022

sergiyvamz commented Jul 13, 2022

zixubingfeng commented Jul 28, 2022 • edited Loading

sergiyvamz commented Jul 28, 2022

zixubingfeng commented Jul 29, 2022

sergiyvamz commented Jul 29, 2022

zixubingfeng commented Aug 2, 2022

sergiyvamz commented Aug 2, 2022

congoamz commented Sep 22, 2022 • edited Loading

zixubingfeng commented Jul 28, 2022 •

edited

Loading

congoamz commented Sep 22, 2022 •

edited

Loading