DeadLock in JDBC driver close() vs object finalizer #2265
Comments
We'll look into this, thank you.
I'm not able to see the issue; the area in question uses proper thread locking, so a deadlock situation does not make sense. There are two things I'm interested in, if you're able to provide the information:
Thanks!
Hi, I have enabled logging, and if the error occurs I will provide you with a log. At the same time, if the error appears, I will try the older JDBC version 11. Thank you
Using java.util.logging.FileHandler is very slow. Our application uses dozens of connections at the same time, and in the system overview you can see that all threads working with JDBC are blocked by a single other thread working with the database, precisely because of logging. The lock they are waiting for is the FileHandler.
Unfortunately, logging is so slow that it cannot be used during normal operation. Normally, our application reads data from the database at about 5,000 records per second, but with logging it drops to 166 records per second. Additionally, concurrent reading does not work because of logging, since all other threads are waiting for the logging lock to be released. If there is another deadlock, I will try JDBC version 11. Thank you for now
Thanks for trying out logging. We'll keep an eye on this to see if we can find any potential issues, but without anything evident from the code, and without any way to reproduce the issue, we'll have to put this issue on hold. Please let us know if the situation arises again.
Deadlock again this morning with the same call stacks:
ID 16 ABRABI - Database connection checker WAITING java.util.concurrent.locks.ReentrantLock$NonfairSync@7fb06e71 (Finalizer, ID 3, holds the lock) 0d 00:14:12,937
ID 3 Finalizer WAITING java.util.concurrent.locks.ReentrantLock$NonfairSync@8f1b8d0 (ABRABI - Database connection checker, ID 16, holds the lock) 0d 00:00:04,859
Thank you for the update. We'll try taking another look, but it's still unclear from our side why this deadlock is happening. Please let us know if you're able to implement the better logger, what its results are, and any results you get from using the older driver.
I have now finished a "faster" asynchronous implementation of the logging Handler (https://github.com/KarelHaruda/ExtendedFileHandler) and deployed it. It has been running for several days, but the error has not yet appeared; it is possible that logging affects it. If the error occurs, we should have roughly the last 2 GB of logs. Best regards
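For context, here is a rough sketch of the idea behind such an asynchronous handler. The class and field names (AsyncFileHandler, drainLoop, the queue size) are assumptions for illustration, not the actual ExtendedFileHandler code: records are queued in memory and written by a single background thread, so JDBC threads only pay for an enqueue instead of waiting on the FileHandler's lock.

```java
import java.io.IOException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.LogRecord;

// Illustrative async handler (hypothetical names): producers enqueue records,
// a single background thread does the slow file I/O, so JDBC worker threads
// never block on the FileHandler's internal lock.
public class AsyncFileHandler extends Handler {
    private final FileHandler delegate;
    private final BlockingQueue<LogRecord> queue = new LinkedBlockingQueue<>(100_000);
    private final Thread writer;
    private volatile boolean closed;

    public AsyncFileHandler(String pattern) throws IOException {
        this.delegate = new FileHandler(pattern, 50_000_000, 2, true);
        this.writer = new Thread(this::drainLoop, "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    @Override
    public void publish(LogRecord record) {
        if (!closed && isLoggable(record)) {
            queue.offer(record); // drop on overflow instead of blocking callers
        }
    }

    private void drainLoop() {
        try {
            while (!closed || !queue.isEmpty()) {
                delegate.publish(queue.take()); // slow file write happens only here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void flush() {
        delegate.flush();
    }

    @Override
    public void close() {
        closed = true;
        writer.interrupt();
        delegate.close();
    }
}
```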
There was a deadlock again. I am attaching complete logs from Apache Tomcat. It can be seen in stdout that the deadlock stopped the thread that takes care of terminating old connections, and since PostgreSQL is limited to a maximum of 100 connections, a "Too many connections" error occurred. But the deadlock occurred again in the mssql JDBC driver; see the previous call stacks. The JDBC logs are in the jdbc.0 and jdbc.1 files; jdbc.0 was active when the deadlock occurred. Link for zipped logs
Thank you for the additional logs. A question: is this call stack (posted on Dec 1) from when you used version 11.2.3 of the driver? Also, you said that jdbc.0 was active when the deadlock occurred? jdbc.0 is from Dec 8, and the only places I can see the deadlock occurring are on Dec 4 (from both the Tomcat and Catalina logs). Is there an additional log that was meant to be uploaded?
Hi, all available logs have been uploaded. I have no additional logs :( The Tomcat log was created on 4 Dec 2023 (topwet_tomcat9-stdout.2023-12-04.log), but Tomcat does not use daily rolling for this log file.
2023-12-08 14:22:07.216|ERROR|AppTimer-app-igate|abra.connections.trace|||||Connection create failed
This shows that the thread deadlocked a little earlier, because PostgreSQL exhausted all 100 possible connections just when this error occurred, and the thread terminating connections had not been working for some time (I estimate 10+ minutes there). That thread takes care of closing all connections to all SQL servers, and its stopping affects the PostgreSQL server first, because it allows a maximum of only 100 connections and the most operations are performed against it, so it is the first to be exhausted. Best regards
What is your workflow when the deadlock occurs; is there any correlation between what you are doing and when you see the deadlock? If you have a way to repro this issue, that would be best. Also, 11.2.3 has produced no deadlocks for you so far: how long has it been deployed, and when do you usually see the deadlock happening? Is it safe to say moving to 11.2.3 resolved the issue?
I don't know an exact procedure to simulate the error. Our system uses many database connections at the same time. The displayed web pages need up-to-date data from the database, and snapshot tables are built in parallel, for which it is not possible to read the data online, but it is enough to have data that is, for example, a few hours old. After use, a connection is put into a pool with a reserve of 5 connections per SQL database. Connections are then taken from there, and if none are free, new ones are created. If the pool is full and the connection has nowhere to go, it is closed. Within an hour, sometimes hundreds or thousands of connections cycle like this, and at some point during closing a deadlock suddenly occurs. Downgrading to 11.2.3 seems to have fixed the problem. Previously the deadlock mostly appeared within 2-3 days; currently the system has run for 6 days without a deadlock. It is also important that before the first deadlock appeared, the system ran on version 11 without any problems. We were recently going through and updating the libraries we use, and then the deadlock just started happening.
It's great that 11.2.3 has resolved the issue for you! We're still having trouble discerning what the issue is, though. My thinking was that since the only major change to that part of the driver happened between 11.2.3 and 12.1.0, 11.2.3 is a good place to check, and I would like to confirm that, if that's alright with you. If possible, can you try 12.1.0 as well? If the theory is correct, then 12.1.0 will once again cause issues, and the problem becomes clear. We can then work on resolving this for the next release. For the record, I don't plan on asking you to test every release from 11.2.3 onwards; that would be very time consuming. But testing 12.1.0 would help immensely. If not, we'll once again continue trying to reproduce this, but it's not clear whether we'll be able to reproduce the issue, much less resolve it. Thank you for the current information on how you arrived at this deadlock; we'll take it into consideration, along with the logs provided. If any further information becomes available regarding this deadlock, please let us know, as it will help us arrive at a solution.
OK, I will try it and post the result here :)
The deadlock also appeared on version 12.1.0.jre11-preview (from the Maven repo), about two and a half days after deployment (18 Dec 2023, at around 7:11, the first occurrence of the word BLOCKED in abrabisysmon.log). For JDBC logging, the jdbc.0 file was active. Link for zipped logs AllThreadDump.txt
This time the deadlock did not occur when closing the connection vs. the finalizer, but when calling the test query (vs. the finalizer) before the connection is returned to the pool.
We are also using Shenandoah GC, which works better for us than the standard GC and is part of OpenJDK: -XX:+UseShenandoahGC
Thanks for the additional logging. I'm still not able to see anything in jdbc.0, even with the provided timestamps, so for now the likeliest scenario is the changes from … Is it possible to try this custom 12.5 jar with the changes reverted? mssql-jdbc-12.5.0.jre11-preview.zip The above would confirm those changes are the culprit. If that turns out to be the case, I'm not sure at the moment what the next steps are, but we'll look into a solution that solves the deadlock issues while retaining Loom support.
Hi, Karel. |
We have deployed the version with JDBC 12.5.0-preview, so I will inform you about the progress |
With this 12.5.0-preview version, the system has been running for 6 days without an error. In most previous cases, the deadlock appeared within two days. Karel.
Thank you for testing this custom version. Earlier you stated:
Is this still the case? I understand it's the nature of the issue, but I'm still a bit hesitant to roll back changes when we're not fully able to reproduce the problem. Either way, we will discuss the findings you have provided and let you know what our next steps are regarding this issue.
Yes, that's right. If a connection cannot be returned to the pool, it is closed. Closing is done by a separate thread that takes care of deferred connections. The whole thing works like this: the system requests a connection from anywhere, and it either gets one from the pool or a new one is created for it. This means there can be more than 100 live connections. When the system no longer needs a connection, it does not test it and return it to the pool itself; it just puts it in a queue that is processed by another thread. That thread takes care of testing the connection and either putting it back in the pool or closing it when the pool is full. This is where a deadlock occurs, during testing or closing. Mostly the deadlock occurred during the connection close, once during the test SQL command "select 1", but it is always a deadlock against the finalizer thread. A rough sketch of the flow is shown below.
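As a minimal sketch of that deferred check-in flow, assuming hypothetical names (ConnectionChecker, defer, testConnection) rather than our actual pool code:

```java
import java.sql.Connection;
import java.sql.Statement;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the deferred-close workflow described above:
// worker threads hand used connections to a queue; a single checker
// thread validates them ("select 1") and either returns them to the
// pool or closes them when the pool is full. The reported deadlocks
// happen inside testConnection()/close(), against the Finalizer thread.
public class ConnectionChecker implements Runnable {
    private final BlockingQueue<Connection> returned = new LinkedBlockingQueue<>();
    private final BlockingQueue<Connection> pool = new LinkedBlockingQueue<>(5); // 5 spares per database

    public void defer(Connection c) {
        returned.offer(c); // called by worker threads; never blocks them
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                Connection c = returned.take();
                if (testConnection(c) && pool.offer(c)) {
                    continue;  // healthy and the pool had room: keep it
                }
                c.close();     // pool full or test failed: close it
            } catch (Exception e) {
                // log and keep the checker thread alive
            }
        }
    }

    private boolean testConnection(Connection c) {
        try (Statement st = c.createStatement()) {
            return st.execute("select 1");
        } catch (Exception e) {
            return false;
        }
    }
}
```

In this shape, only the checker thread ever runs the test query or calls Connection.close(), which matches the dumps where the "Database connection checker" thread is one side of the deadlock.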
Thank you. Between your findings and a more recent issue, we think we know how the locks may be causing the problem. We'll update this thread when we have progress on the fix.
OK, I will look forward to your solution. I'm quite curious what it is, as I've also looked through the source code but didn't see anything suspicious.
There were uses of … Our PR is in review. This (mssql-jdbc-12.5.0.jre11-withDeadlockFix.zip) is the fix. If you are able to test it, that will help us out immensely, as we're still not able to replicate the issue on our end; regardless, we will have this fix merged for the 12.6 GA, releasing on Jan 31.
OK, I will test it and report the result :)
@karel-haruda-abra Have you seen any failures in the last few days since deployment? |
No. The system has been running for 4 days and 17 hours (since deployment) without any problems :) |
Glad to hear that. We've merged the fix and it will be included in our Jan 31 release. Thank you again for all the help you've provided in debugging and testing around this issue! |
you're welcome 👍 |
Driver version
12.4.1.jre11
SQL Server version
Windows Server 2019 Standard (10.0)
Client Operating System
Windows Server 2019 Standard (10.0)
JAVA/JVM version
JAVA 11
Problem description
A deadlock occurs in the JDBC driver
Error message/stack trace
I only have the call stacks of the threads that are in deadlock
Thread 1
ID16 ABRABI - Database connection checker WAITING java.util.concurrent.locks.ReentrantLock$NonfairSync@444316f1 Finalizer ID3 0d 00:20:59,953
java.base@11.0.20.1/jdk.internal.misc.Unsafe.park(Native Method)
java.base@11.0.20.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
java.base@11.0.20.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.waitForClose(SSLSocketImpl.java:1817)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1795)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.shutdown(SSLSocketImpl.java:1750)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:800)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.duplexCloseInput(SSLSocketImpl.java:780)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:590)
com.microsoft.sqlserver.jdbc.TDSChannel.disableSSL(IOBuffer.java:817)
com.microsoft.sqlserver.jdbc.TDSChannel.close(IOBuffer.java:2204)
com.microsoft.sqlserver.jdbc.SQLServerConnection.clearConnectionResources(SQLServerConnection.java:4542)
com.microsoft.sqlserver.jdbc.SQLServerConnection.close(SQLServerConnection.java:4526)
eu.abra.bologic.DBConnection.closeConnection(DBConnection.java:304)
eu.abra.bologic.DBConnection$Checker.run(DBConnection.java:486)
Thread 2
ID3 Finalizer WAITING java.util.concurrent.locks.ReentrantLock$NonfairSync@422dbc49 ABRABI - Database connection checker ID16 0d 00:00:07,359
java.base@11.0.20.1/jdk.internal.misc.Unsafe.park(Native Method)
java.base@11.0.20.1/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
java.base@11.0.20.1/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
java.base@11.0.20.1/java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.readInternal(IOBuffer.java:1187)
com.microsoft.sqlserver.jdbc.TDSChannel$ProxyInputStream.read(IOBuffer.java:1183)
java.base@11.0.20.1/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:484)
java.base@11.0.20.1/sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:478)
java.base@11.0.20.1/sun.security.ssl.SSLSocketInputRecord.decode(SSLSocketInputRecord.java:160)
java.base@11.0.20.1/sun.security.ssl.SSLTransport.decode(SSLTransport.java:111)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.decode(SSLSocketImpl.java:1511)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.waitForClose(SSLSocketImpl.java:1821)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1795)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.shutdown(SSLSocketImpl.java:1750)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.bruteForceCloseInput(SSLSocketImpl.java:800)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:665)
java.base@11.0.20.1/sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:585)
java.base@11.0.20.1/sun.security.ssl.BaseSSLSocketImpl.finalize(BaseSSLSocketImpl.java:275)
java.base@11.0.20.1/java.lang.System$2.invokeFinalize(System.java:2138)
java.base@11.0.20.1/java.lang.ref.Finalizer.runFinalizer(Finalizer.java:87)
java.base@11.0.20.1/java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:171)
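For illustration, the two stacks suggest a lock-ordering conflict: the connection-checker thread appears to hold the driver's stream lock (taken on the TDSChannel.close path) while waiting for the SSL socket's internal lock in SSLSocketImpl.waitForClose, and the Finalizer holds the SSL socket's lock (entered via BaseSSLSocketImpl.finalize and close) while waiting for the stream lock in ProxyInputStream.readInternal. Below is a minimal standalone sketch of that pattern with invented lock names; it is not driver code, just the same two-lock shape:

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical two-lock illustration of the pattern in the stacks above
// (names invented; this is NOT the driver's code). One thread takes the
// "stream" lock then the "ssl" lock; the finalizer-like thread takes
// them in the opposite order, so each ends up waiting on the other.
public class LockOrderingDeadlockDemo {
    private static final ReentrantLock streamLock = new ReentrantLock(); // like ProxyInputStream's lock
    private static final ReentrantLock sslLock = new ReentrantLock();    // like SSLSocketImpl's internal lock

    public static void main(String[] args) {
        Thread checker = new Thread(() -> {
            streamLock.lock();               // TDSChannel.close() path
            try {
                sleep(100);
                sslLock.lock();              // SSLSocketImpl.waitForClose() path
                sslLock.unlock();
            } finally {
                streamLock.unlock();
            }
        }, "connection-checker");

        Thread finalizer = new Thread(() -> {
            sslLock.lock();                  // BaseSSLSocketImpl.finalize() -> close()
            try {
                sleep(100);
                streamLock.lock();           // ProxyInputStream.readInternal() path
                streamLock.unlock();
            } finally {
                sslLock.unlock();
            }
        }, "finalizer-like");

        checker.start();
        finalizer.start(); // with the sleeps, both threads will almost certainly park forever
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Run as-is, both threads end up parked on a ReentrantLock, which is the same WAITING state visible in the dumps above.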