Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Throw exception when Statement#setQueryTimeout is reached when no network is available. #525

Closed
dave-r12 opened this issue Oct 11, 2017 · 20 comments

Comments

@dave-r12
Copy link

Driver version or jar name

6.2.1.jre8

SQL Server version

SQL Server 2012

Client operating system

CentOS 7

Java/JVM version

java version "1.8.0_91"

Table schema

N/A

Problem description

In our setup, SQL Server will issue TCP keepalives on idle connections to eagerly detect potentially bad connections. Unfortunately this behavior sometimes results in a good connection be flagged as bad if the keepalives do not reach the other host. We've seen instances where SQL Server will terminate the connection, but our application hosts think the connections are still good. (Presumably the keepalives are getting dropped.) When we then attempt to use the connection it hangs. SQL Server will silently drop the packets without responding with a TCP reset. Eventually, we reach the TCP packet retransmitted limit (aka tcp_retries2).

Attempting to set the Statement#setQueryTimeout only works if the server responds with data. If the server is not responding, then the timeout is essentially ignored.

Expected behavior and actual behavior

I believe the timeout should trigger the Statement to terminate immediately, preferably with an exception of some kind.

Currently the code will hang until the socket times out via the retransmitted limit being reached.

Repro code

Here is a simple Java class to execute:

  public static void main(String[] args) throws Exception {
    DriverManager.class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver");

    Connection connection = DriverManager.getConnection("jdbc:sqlserver://localhost;databaseName=mydatabase", "sa", "Password");
    System.out.println("Cause the TCP connection to be terminated from SQL Server side then hit enter.");
    System.in.read();
    Statement statement = connection.createStatement();
    statement.setQueryTimeout(1);

    try {
      statement.execute("WAITFOR DELAY '00:00:3'; SELECT 1");
    } catch (SQLException e) {
      e.printStackTrace();
    }
    System.out.println("Complete");
  }

Here's how I terminate the TCP connection from the app host. Basically don't respond to the TCP keepalives:

# Get the port.
ss -tn | grep 1433

# Drop the packets.
sudo iptables -I INPUT 1 -p tcp --dport <paste-port-number-here> -j DROP

# Wait for a bit...
# Remove the iptables rule once the connection is terminated by SQL Server.
sudo iptables -D INPUT 1

To determine when SQL Server terminates the connection, I use this query:

select * from sys.dm_exec_sessions
where host_name = 'localhost'
@AfsanehR-zz
Copy link
Contributor

Hi @dave-r12 . I tried running the commands you mentioned on the same client machine, and my application does not hang and I am still getting the query timeout. If I understand correctly, you are getting the socket timeout exception rather than the query timeout?
Are you using the default setting for Keep Alive and Keep Alive Interval values in your SQL Server configuration?

@dave-r12
Copy link
Author

If I understand correctly, you are getting the socket timeout exception rather than the query timeout?

Yes that's right. Doesn't the query timeout rely on receiving a packet back from the server? In this case, the server is unreachable so it never sends anything back.

Are you using the default setting for Keep Alive and Keep Alive Interval values in your SQL Server configuration?

Umm, not sure if they are default. Looks like it's set to 30 seconds. I don't see any option for interval.

@dave-r12
Copy link
Author

dave-r12 commented Nov 6, 2017

@v-afrafi any luck?

@AfsanehR-zz
Copy link
Contributor

Hi @dave-r12 . Apologies for responding late. I also got the socket timeout exception. Will be investigating this and will get back to you shortly. Thanks!

@AfsanehR-zz
Copy link
Contributor

AfsanehR-zz commented Nov 9, 2017

Hi @dave-r12. looking at this issue, I see that the driver is sending the attention packet to cancel the query, however, as you mentioned since it can't figure out that the connection is silently dropped, it gets stuck at that point until we get the connection timeout.

I am trying to push a fix, will update you here once done.

It also seems like that postgreSql also has a connection property for this.
according to this reference:
cancelSignalTimeout= int (since 9.4.1209) Cancel command is sent out of band over its own connection, so cancel message can itself get stuck. This property controls "connect timeout" and "socket timeout" used for cancel commands. The timeout is specified in seconds. Default value is 10 seconds.

@dave-r12
Copy link
Author

@v-afrafi cool, thanks for tracking this down.

@AfsanehR-zz
Copy link
Contributor

Hi @dave-r12 could you take a look at pr #560 and see if your issue is resolved? I would appreciate your thoughts on the fix. Thanks!

@AfsanehR-zz
Copy link
Contributor

AfsanehR-zz commented Dec 5, 2017

Hello @dave-r12 , did you get the chance to look at pr #560? Please let us know if you had any questions. Thanks!

@dave-r12
Copy link
Author

dave-r12 commented Dec 5, 2017

Hey @v-afrafi sorry not yet. I'll try to take a look by end of week.

@cheenamalhotra
Copy link
Member

Hi @dave-r12 It would be great if you can review the fix in PR #560 . We'd like to include that in one of our coming preview releases.

@dave-r12
Copy link
Author

Got it. Hoping to test this tomorrow morning.

@dave-r12
Copy link
Author

Alright, so I'm trying to test this. I checked out a version of that PR and built it locally and then ran my test program. I'm still seeing the connection hang even after setting the cancel query timeout. I'll continue investigating to make sure my setup is right. Here is the connection string I'm using:

jdbc:sqlserver://localhost;databaseName=mydb;cancelQueryTimeout=3

@dave-r12
Copy link
Author

So it seems this also requires setting the queryTimeout parameter? We had been setting this on the Statement itself:

statement.setQueryTimeout(1);

Any way to avoid setting it globally?

@dave-r12
Copy link
Author

I can confirm that setting both the queryTimeout and cancelQueryTimeout, it now throws an exception:

com.microsoft.sqlserver.jdbc.SQLServerException: Socket closed
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.terminate(SQLServerConnection.java:2705)
        at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1993)
        at com.microsoft.sqlserver.jdbc.TDSReader.readPacket(IOBuffer.java:6502)
        at com.microsoft.sqlserver.jdbc.TDSCommand.startResponse(IOBuffer.java:7767)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:855)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:757)
        at com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7373)
        at com.microsoft.sqlserver.jdbc.SQLServerConnection.executeCommand(SQLServerConnection.java:2748)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeCommand(SQLServerStatement.java:224)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.executeStatement(SQLServerStatement.java:204)
        at com.microsoft.sqlserver.jdbc.SQLServerStatement.execute(SQLServerStatement.java:734)
        at Test.main(Test.java:52)
Caused by: java.net.SocketException: Socket closed
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:171)
        at java.base/java.net.SocketInputStream.read(SocketInputStream.java:141)
        at com.microsoft.sqlserver.jdbc.TDSChannel.read(IOBuffer.java:1983)
        ... 10 more

For our use case, we want to use this to detect connections that have gone bad in a connection pool. I'm thinking when we borrow a connection from the pool we'd do this check first. I'm unsure what the consequences would be if we set these parameters globally. Ideally we'd only want this when setting the statement.setQueryTimeout.

@cheenamalhotra
Copy link
Member

Hi @dave-r12

These 2 properties queryTimeout and cancelQueryTimeout must both be set on connection only as TDSReader does not have reference of Statement being executed but just the active connection. And thus, these properties are read from connection to be able to trigger timer before starting to read packets, which if reaches timeout, throws exception as you mentioned.

statement.setQueryTimeout does not have any role here.

So, in case you want to fetch a connection from connection pool, the connection properties must be identified and reset if needed, before executing query to detect bad connections.

@cheenamalhotra
Copy link
Member

Hi @dave-r12

Do you have any questions before we decide to merge the PR?

@dave-r12
Copy link
Author

dave-r12 commented Apr 7, 2018

Hey, alright. I guess I'm not completely understanding the comment about statement.setQueryTimeout. If that has no role, why is the API there? Is there a limitation with the implementation?

@cheenamalhotra
Copy link
Member

Hi @dave-r12

I revisited my initial investigation and came up with a solution to pass the statement property instead.
PR #674 shall address this issue both from statement & connection levels. You can now avoid setting it globally and use only when needed on a Statement or set it globally to apply on all statements by default.

@dave-r12
Copy link
Author

@cheenamalhotra awesome!

cheenamalhotra added a commit to cheenamalhotra/mssql-jdbc that referenced this issue May 1, 2018
cheenamalhotra added a commit to cheenamalhotra/mssql-jdbc that referenced this issue May 2, 2018
cheenamalhotra added a commit to cheenamalhotra/mssql-jdbc that referenced this issue May 2, 2018
@cheenamalhotra
Copy link
Member

Closing the issue since PR #674 got merged to dev.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants