
HikariCP not adhering connectionTimeout #867

Closed
MikeN123 opened this issue Apr 12, 2017 · 9 comments

@MikeN123 commented Apr 12, 2017

Environment

HikariCP version: 2.5.1
JDK version     : 1.8.0_121
Database        : MSSQL / JTDS
Driver version  : 1.3.1

I just rebooted our database server, and it seems HikariCP had some issues reconnecting. Looking at the logs, I see a request that timed out after 600 seconds, even though the HikariCP connectionTimeout is set to 10 seconds, the jTDS socketTimeout to 300 seconds, and the jTDS loginTimeout to 5 seconds.

Shouldn't the connection acquisition time out after 10 seconds, instead of hanging for 10 minutes?
Only 3 customer sites have experienced this issue, and what they have in common is that they run HikariCP 2.5.1. Other sites, which do not show the issue, run HikariCP 2.4.1 or 2.4.7.

I haven't been able to reproduce this problem locally, making it difficult to pinpoint the exact problem.

Detailed HikariCP settings:

hikaricp.connectionTestQuery=SELECT 1
hikaricp.connectionTimeout=10000

hikaricp.minimumIdle=2
hikaricp.maximumPoolSize=50

Stacktrace:

Caused by: java.sql.SQLTransientConnectionException: userprefs-pool - Connection is not available, request timed out after 600194ms.
	at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:548)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:145)
	at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:83)
	at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.getTargetConnection(LazyConnectionDataSourceProxy.java:403)
	at org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy$LazyConnectionInvocationHandler.invoke(LazyConnectionDataSourceProxy.java:376)
	at com.sun.proxy.$Proxy89.prepareStatement(Unknown Source)
	at org.springframework.jdbc.core.JdbcTemplate$SimplePreparedStatementCreator.createPreparedStatement(JdbcTemplate.java:1521)
	at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:627)
	... 80 common frames omitted
Caused by: java.sql.SQLTimeoutException: The query has timed out.
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4166)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1096)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQL(JtdsStatement.java:597)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeImpl(JtdsStatement.java:842)
	at net.sourceforge.jtds.jdbc.JtdsStatement.execute(JtdsStatement.java:1315)
	at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:157)
	at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:169)
	... 87 common frames omitted
	Suppressed: java.sql.SQLException: I/O Error: Read timed out
		at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2491)
		at net.sourceforge.jtds.jdbc.TdsCore.clearResponseQueue(TdsCore.java:769)
		at net.sourceforge.jtds.jdbc.JtdsStatement.reset(JtdsStatement.java:755)
		at net.sourceforge.jtds.jdbc.JtdsStatement.close(JtdsStatement.java:999)
		at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:158)
		... 88 common frames omitted
	Caused by: java.net.SocketTimeoutException: Read timed out
		at java.net.SocketInputStream.socketRead0(Native Method)
		at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
		at java.net.SocketInputStream.read(SocketInputStream.java:171)
		at java.net.SocketInputStream.read(SocketInputStream.java:141)
		at net.sourceforge.jtds.ssl.TdsTlsInputStream.read(TdsTlsInputStream.java:72)
		at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
		at sun.security.ssl.InputRecord.read(InputRecord.java:503)
		at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
		at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
		at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
		at java.io.DataInputStream.readFully(DataInputStream.java:195)
		at java.io.DataInputStream.readFully(DataInputStream.java:169)
		at net.sourceforge.jtds.jdbc.SharedSocket.readPacket(SharedSocket.java:871)
		at net.sourceforge.jtds.jdbc.SharedSocket.getNetPacket(SharedSocket.java:730)
		at net.sourceforge.jtds.jdbc.ResponseStream.getPacket(ResponseStream.java:477)
		at net.sourceforge.jtds.jdbc.ResponseStream.read(ResponseStream.java:114)
		at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2378)
		... 92 common frames omitted
@brettwooldridge (Owner)

@MikeN123 Thanks for the report. I'll probably have to dig into the jTDS code to figure out what is going on. From the stacktrace I can tell you that the connectionTimeout is not coming into play. There was an existing connection that HikariCP was trying to validate (isConnectionAlive()), and it was executing the test query via a Statement. Clearly that connection was broken, but for whatever reason that call (TCP read) took 2x the socketTimeout. The difference between 2.4.x and 2.5.x is interesting but not necessarily determinative; it could be due to timing differences rather than algorithmic differences.

@MikeN123 (Author)

Yeah, the difference between 2.4 and 2.5 is a bit weird.

So, your conclusion is: the connection was broken -> HikariCP tried to validate it -> the validation hung at the network level, so it had to wait for the socket timeout (instead of the connectionTimeout)?

Is there any way to interrupt this while still allowing long-running queries (which was the reason we increased the socket timeout)? It would be fine if HikariCP declared the connection dead after 10 seconds, but not if that means we cannot run a query that takes more than 10 seconds.

Not sure why it is 2x the socketTimeout, but I think I saw that before. Maybe jTDS is doing something weird, or maybe HikariCP is checking two connections, instead of giving up after one?

@brettwooldridge (Owner)

@MikeN123 The isConnectionAlive() method, where the stack indicates HikariCP was waiting when it received the socket timeout, is identical between v2.5.1 and v2.4.7, so I tend to think a luck-of-the-draw timing difference led to one version hanging and the other not.

I actually found the cause of the socketTimeout x2 event, though it took staring at the code for a minute to see it. In isConnectionAlive() we have this code:

152 try (Statement statement = connection.createStatement()) {
153    if (isNetworkTimeoutSupported != TRUE) {
154       setQueryTimeout(statement, (int) MILLISECONDS.toSeconds(Math.max(1000L, validationTimeout)));
155    }
156 
157    statement.execute(config.getConnectionTestQuery());
158 }

The log you provided above contains an exception and a nested exception. The first exception:

Caused by: java.sql.SQLTimeoutException: The query has timed out.
	at net.sourceforge.jtds.jdbc.TdsCore.wait(TdsCore.java:4166)
	at net.sourceforge.jtds.jdbc.TdsCore.executeSQL(TdsCore.java:1096)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQL(JtdsStatement.java:597)
	at net.sourceforge.jtds.jdbc.JtdsStatement.executeImpl(JtdsStatement.java:842)
	at net.sourceforge.jtds.jdbc.JtdsStatement.execute(JtdsStatement.java:1315)
	at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:157)

points to PoolBase.java:157, which is the statement.execute() above. This query (the validation query) ran until the socketTimeout kicked out the read -- 300 seconds.

The second exception:

Suppressed: java.sql.SQLException: I/O Error: Read timed out
		at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2491)
		at net.sourceforge.jtds.jdbc.TdsCore.clearResponseQueue(TdsCore.java:769)
		at net.sourceforge.jtds.jdbc.JtdsStatement.reset(JtdsStatement.java:755)
		at net.sourceforge.jtds.jdbc.JtdsStatement.close(JtdsStatement.java:999)
		at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:158)
		... 88 common frames omitted

points to PoolBase.java:158, which looks at first like just a closing brace, but is actually the implicit close performed by the try-with-resources block started on PoolBase.java:152. This is the close() of the Statement. Unfortunately, this call also ran until the socketTimeout kicked out the read -- another 300 seconds.

Combined, the two resulted in the 600194ms timeout.

This is rather disappointing behavior by jTDS. I would have hoped that the first socket read timeout exception would have resulted in the Connection being marked as closed, and therefore the call to Statement.close() could immediately return as a no-op.
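The mechanics can be reproduced in miniature with plain Java (an illustrative sketch, not HikariCP code; SlowResource is a hypothetical stand-in for a jTDS Statement on a dead connection): try-with-resources always invokes close() on exit, even when the body throws, so a close() that also blocks on a dead socket pays a second full timeout before the caller ever sees the first exception.

```java
// Demonstrates why a blocking close() doubles the observed delay:
// try-with-resources calls close() even after the body throws.
public class DoubleTimeoutDemo {
    static final StringBuilder log = new StringBuilder();

    // Hypothetical stand-in for a jTDS Statement on a severed connection.
    static class SlowResource implements AutoCloseable {
        void execute() throws Exception {
            log.append("execute timed out;");   // first socketTimeout elapses here
            throw new Exception("query timed out");
        }
        @Override
        public void close() {
            log.append("close timed out;");     // second socketTimeout elapses here
        }
    }

    public static void main(String[] args) {
        try (SlowResource r = new SlowResource()) {
            r.execute();
        } catch (Exception e) {
            // Only now does control return: both the body's timeout AND the
            // implicit close()'s timeout have already been paid.
        }
        System.out.println(log);
    }
}
```

Note that close() runs before the catch block is entered, which is exactly why the caller observed roughly 2x socketTimeout rather than 1x.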

HikariCP does its best, in the absence of support for Connection.setNetworkTimeout(), to "time box" the alive check -- as seen in the setQueryTimeout() call preceding the validation query itself. Unfortunately, Statement.setQueryTimeout() does not provide robust guarantees in the case of a network partition or dangling TCP connection, which is the reason setNetworkTimeout() was introduced (in JDBC 4.1).

Unfortunately, support for setNetworkTimeout() at the driver-level is the only real remedy. Unsatisfying answer, but there it is.

I went ahead and implemented support for get/setNetworkTimeout() in the Microsoft JDBC 6.0 "preview" driver, and submitted a pull request. We'll see if they merge it.

In the meantime, one option would be to scrutinize your code to determine what the true maximum query time is, and decrease the jTDS socketTimeout to something above that but less than the current 300 seconds. For example, if your longest query is ~40 seconds, you might consider doubling that for your socket timeout.
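As an illustration -- the host, database, and values here are placeholders, and this assumes jTDS's URL-parameter syntax, where these timeouts are given in seconds -- a tightened configuration might look like:

```
jdbc:jtds:sqlserver://dbhost:1433/mydb;loginTimeout=5;socketTimeout=90
```

A socketTimeout of 90 would still accommodate a ~40-second worst-case query while cutting the worst-case stall per blocked read from 300 seconds to 90.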

Keep in mind that if jTDS does not flag the connection as closed when it encounters a socket timeout, you could easily see very long recovery times, with or without HikariCP in the picture. For example, if a thread is in user-land code (not in the pool) executing something like:

01 try (Connection conn = ds.getConnection();
02      Statement stmt = conn.createStatement()) {
03    try (ResultSet rs = stmt.executeQuery("SELECT something FROM sometable")) {
04       rs.getString("somefield");
05       ...
06    }
07 }

and the connection is severed just after the thread passes line 3, jTDS is going to encounter a socketTimeout-length delay at line 4, another socketTimeout-length delay at line 6 (ResultSet close), and two more socketTimeout-length delays at line 7 (Statement close and Connection close).
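The cumulative effect can be sketched as follows (illustrative only: CumulativeStallDemo and its timings are hypothetical, with a 50ms constant playing the role of the 300-second socketTimeout):

```java
// Simulates the cumulative stalls described above: on a severed connection,
// every blocked read -- including the reads triggered by the implicit
// close() calls -- pays a full socketTimeout before failing.
public class CumulativeStallDemo {
    static final long SOCKET_TIMEOUT_MS = 50;  // stands in for 300 seconds
    static long totalStallMs = 0;

    // Models one read against a dead socket: it always stalls for the
    // full timeout before the driver gives up.
    static void blockedRead(String where) {
        totalStallMs += SOCKET_TIMEOUT_MS;
        System.out.println(where + " stalled " + SOCKET_TIMEOUT_MS + "ms");
    }

    public static void main(String[] args) {
        blockedRead("rs.getString()");   // line 4: fetch from dead socket
        blockedRead("rs.close()");       // line 6: ResultSet cleanup read
        blockedRead("stmt.close()");     // line 7: Statement cleanup read
        blockedRead("conn.close()");     // line 7: Connection cleanup read
        System.out.println("total: " + totalStallMs + "ms (4x socketTimeout)");
    }
}
```

With the real 300-second socketTimeout, that single code path would take 20 minutes to unwind.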

@MikeN123 (Author)

Many thanks for your lengthy comment and explanation. And for the PR to the Microsoft JDBC driver - the lack of timeout support has always been an important reason for us to stick with jTDS.

Unfortunately, we have some long-running migrations which require a larger timeout. I will check whether we can adjust the timeout temporarily during the migrations, or maybe use a separate connection for them.

@brettwooldridge (Owner) commented Apr 14, 2017

@MikeN123 Just so you know, the Microsoft JDBC 6.0 preview driver already has support for a global socket timeout (like jTDS).

My pull request is to add support for per-Connection socket timeouts, via Connection.get/setNetworkTimeout(). This allows HikariCP to bracket internal pool operations with socket timeouts.
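The bracketing idea can be sketched in plain Java (an illustrative sketch, not HikariCP's actual code; the validate() helper and the recording mock connection are hypothetical): set a short per-connection network timeout around the pool-internal operation, then restore the original value so user queries are unaffected.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executor;

// Sketch of "bracketing" a pool-internal operation with
// Connection.setNetworkTimeout() (JDBC 4.1). A dynamic-proxy mock stands
// in for a real driver connection and records the calls made to it.
public class NetworkTimeoutBracket {
    static final List<String> calls = new ArrayList<>();

    static void validate(Connection conn, Executor executor, int validationTimeoutMs) throws Exception {
        int original = conn.getNetworkTimeout();
        conn.setNetworkTimeout(executor, validationTimeoutMs);  // time-box the alive check
        try {
            // A real pool would run conn.isValid() or a test query here.
            calls.add("validate");
        } finally {
            conn.setNetworkTimeout(executor, original);          // restore for user queries
        }
    }

    public static void main(String[] args) throws Exception {
        Connection mock = (Connection) Proxy.newProxyInstance(
            Connection.class.getClassLoader(), new Class<?>[]{Connection.class},
            (proxy, method, a) -> {
                if (method.getName().equals("getNetworkTimeout")) { calls.add("get"); return 0; }
                if (method.getName().equals("setNetworkTimeout")) { calls.add("set=" + a[1]); return null; }
                return null;
            });
        validate(mock, Runnable::run, 5000);
        System.out.println(calls);
    }
}
```

The key property is the finally block: even if validation itself times out, the connection is handed back to the application with its original (longer) network timeout intact.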

It also looks like they are about to merge my change, as it passed review with only minor cosmetic changes.

Also, if you do decide to switch to the new MS driver, you can remove your validation query, as their driver supports the Connection.isValid() method, which is preferred by HikariCP.

@fwi commented Apr 17, 2017

You may still prefer to use a configured validation query with the new MS driver. The new MS driver currently implements Connection.isValid(5) as a query with a timeout, and that timeout is implemented via a TimeoutTimer thread, not Connection.get/setNetworkTimeout() or something similar. The latter would be a better (and more efficient) solution, but until that is the case, I think isValid() might hang (in the case of this issue) just as long as the global/connection socket timeout. See also microsoft/mssql-jdbc#150

Since PoolBase.isConnectionAlive sets a hard socket read-timeout when isUseJdbc4Validation is false, I think I would prefer that (with new MS driver version >= 6.1.7 containing microsoft/mssql-jdbc#253).

@brettwooldridge (Owner)

microsoft/mssql-jdbc#253 has been merged.

@fwi commented Apr 18, 2017

@brettwooldridge Thanks for the improvement, much appreciated. I did not expect a solution for this odd issue, but you fixed it.

@girimungi commented Jul 30, 2018

@brettwooldridge I want to implement a query timeout for my application: when a query exceeds 5 minutes, it should time out. Is there a property available on the Hikari DataSource for this? Please suggest.

kollstrom pushed a commit to kollstrom/HikariCP that referenced this issue Feb 4, 2021