Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to define multiple retry strategies for JDBC connection #20860

Merged
merged 1 commit into from
Mar 1, 2024

Conversation

kokosing
Copy link
Member

Allow to define multiple retry strategies for JDBC connection

Thanks to that now Oracle connector will retry opening a JDBC connection on
any SQLRecoverableException or SQLTransientException. Previously it
retried only on the former.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense.

Have we considered making this apply to JdbcRecordCursor too sometime? Currently retries only save us when opening connections.

IIRC @wendigo ran into a case where reading results can fail because of connection being closed due to being idle for long time (e.g. table scan in join query waiting for other table scan).

LGTM. The questions above are to start a discussion only.

Copy link
Member

@dominikzalewski dominikzalewski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like an almost exact copy of the PR I left behind when we discussed it here:

https://github.com/trinodb/trino/pull/20132/files

I believe you are closer to the context, as I was writing the code 2 months ago. For me it looks good to go, but you can cross check yourself against that WiP PR whether you didn't miss anything.

@hashhar
Copy link
Member

hashhar commented Feb 29, 2024

what irony is this, test failed because of SQLRecoverableException.

Error:  io.trino.plugin.oracle.TestOracleCaseInsensitiveMapping.testSchemaNameClash -- Time elapsed: 6.572 s <<< ERROR!
java.lang.RuntimeException: java.sql.SQLException: java.sql.SQLRecoverableException: No more data to read from socket
	at io.trino.plugin.oracle.TestingOracleServer.execute(TestingOracleServer.java:125)
	at io.trino.plugin.oracle.TestingOracleServer.execute(TestingOracleServer.java:115)
	at io.trino.plugin.oracle.TestOracleCaseInsensitiveMapping.lambda$withSchema$0(TestOracleCaseInsensitiveMapping.java:75)
	at io.trino.plugin.jdbc.BaseCaseInsensitiveMappingTest.testSchemaNameClash(BaseCaseInsensitiveMappingTest.java:156)
	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
	at java.base/java.util.concurrent.RecursiveAction.exec(RecursiveAction.java:194)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:387)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1312)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1843)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1808)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:188)
Caused by: java.sql.SQLException: java.sql.SQLRecoverableException: No more data to read from socket
	at io.trino.plugin.jdbc.TracingDataSource.getConnection(TracingDataSource.java:52)
	at io.trino.plugin.jdbc.DriverConnectionFactory.openConnection(DriverConnectionFactory.java:78)
	at io.trino.plugin.jdbc.jmx.StatisticsAwareConnectionFactory.lambda$openConnection$0(StatisticsAwareConnectionFactory.java:45)
	at io.trino.plugin.jdbc.jmx.JdbcApiStats.wrap(JdbcApiStats.java:34)
	at io.trino.plugin.jdbc.jmx.StatisticsAwareConnectionFactory.openConnection(StatisticsAwareConnectionFactory.java:45)
	at io.trino.plugin.jdbc.RetryingConnectionFactory.lambda$openConnection$2(RetryingConnectionFactory.java:67)
	at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
	at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.trino.plugin.jdbc.RetryingConnectionFactory.openConnection(RetryingConnectionFactory.java:67)
	at io.trino.plugin.oracle.TestingOracleServer.execute(TestingOracleServer.java:120)
	... 10 more
Caused by: java.sql.SQLRecoverableException: No more data to read from socket
	at oracle.jdbc.driver.T4CMAREngineNIO.prepareForUnmarshall(T4CMAREngineNIO.java:811)
	at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:449)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:410)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:269)
	at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:450)
	at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:1433)
	at oracle.jdbc.driver.T4CTTIoauthenticate.doOAUTH(T4CTTIoauthenticate.java:1147)
	at oracle.jdbc.driver.T4CConnection.logon(T4CConnection.java:794)
	at oracle.jdbc.driver.PhysicalConnection.connect(PhysicalConnection.java:820)
	at oracle.jdbc.driver.T4CDriverExtension.getConnection(T4CDriverExtension.java:80)
	at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:820)
	at oracle.jdbc.driver.OracleDriver.connect(OracleDriver.java:624)
	at io.trino.plugin.jdbc.TracingDataSource$JdbcDataSource.getConnection(TracingDataSource.java:74)
	at io.opentelemetry.instrumentation.jdbc.datasource.OpenTelemetryDataSource.wrapCall(OpenTelemetryDataSource.java:156)
	at io.opentelemetry.instrumentation.jdbc.datasource.OpenTelemetryDataSource.getConnection(OpenTelemetryDataSource.java:94)
	at io.trino.plugin.jdbc.TracingDataSource.getConnection(TracingDataSource.java:49)
	... 23 more

@kokosing
Copy link
Member Author

what irony is this, test failed because of SQLRecoverableException.

Does it mean that retry didn't work?

@kokosing
Copy link
Member Author

Have we considered making this apply to JdbcRecordCursor too sometime? Currently retries only save us when opening connections.

We did and the conclusion was that retrying during the read/write is complex. So it is out of the scope now.

@hashhar
Copy link
Member

hashhar commented Feb 29, 2024

indeed looks like retry didn't work for some reason.

	at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
	at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
	at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
	at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
	at io.trino.plugin.jdbc.RetryingConnectionFactory.openConnection(RetryingConnectionFactory.java:67)

You might want to log the Throwables.getCausalChain in the AdditionalRetryStrategy to inspect.

@kokosing kokosing force-pushed the origin/master/004_retries branch from 4ce9956 to 477b3bd Compare February 29, 2024 15:59
@kokosing
Copy link
Member Author

2024-02-29T16:43:06.9094122Z 2024-02-29T10:43:06.724-0600	DEBUG	SplitRunner-20240229_164306_00019_zgqqr.1.0.0-6-4597	io.trino.plugin.jdbc.RetryingConnectionFactory	Failed to open the connection. Retrying.
2024-02-29T16:43:06.9098649Z java.sql.SQLException: java.sql.SQLRecoverableException: IO Error: Socket read interrupted, Authentication lapse 0 ms.

Retrying works. I had to be very unlucky last time.

@kokosing
Copy link
Member Author

Please take a look at the new second commit.

Comment on lines 77 to 78
Logging logging = Logging.initialize();
logging.setLevel(RetryingConnectionFactory.class.getName(), DEBUG);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this useful? I don't think anyone would see the logs unless they explicitly go looking for it.

Since you've already proved retries work (since it was logged in current CI run) maybe we can remove this.

Let's keep the log.debug in the connection factory though - useful to have.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks to that now Oracle connector will retry opening a JDBC connection on
any SQLRecoverableException or SQLTransientException. Previously it
retried only on the former.
@kokosing kokosing force-pushed the origin/master/004_retries branch from 477b3bd to 6e9635e Compare March 1, 2024 10:47
@kokosing kokosing merged commit 6e9635e into trinodb:master Mar 1, 2024
1 check passed
@kokosing kokosing deleted the origin/master/004_retries branch March 1, 2024 10:47
@kokosing
Copy link
Member Author

kokosing commented Mar 1, 2024

I dropped the second commit.

@github-actions github-actions bot added this to the 440 milestone Mar 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants