Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace timeout handling to use SharedTimer #920

Merged
merged 14 commits into from
Jan 10, 2019

Conversation

sehrope
Copy link
Contributor

@sehrope sehrope commented Jan 5, 2019

PR to fix #909 following the reference counting model outlined in the comments. This is an alternative to #914. Was easier to put together a separate PR as the code changes were completely different.

This PR does not track the actual references or connection IDs, just a total count of how many times the SharedTimer has been requested (increment reference) and released (decrement reference). Connections that make use of the SharedTimer cache a reference to it and clean it up in Connection.close(). If the reference count drops to zero then the SharedTimer is shutdown and the core thread used for scheduling interrupt tasks is stopped.

Timeout that are actually triggered, i.e. the timer expired and the action hasn't completed yet, create a new thread on the fly to execute the interrupt action. This is to ensure that they do not block the core timer thread. Timeouts that are cancelled prior to expiring, i.e. query completed prior to timeout, do not create a new handler thread.

All current tests, including timeout tests, pass with this PR but there are no new tests yet to check if things are being cleaned up correctly. Manual testing via jconsole shows that it is but it'd be nice to eventually include something in the core test suite to cover that. Something similar to cheenamalhotra#20 should work though we'd have to deal with the issue of the rest of the test suite cleaning up all its resources prior to running (probably a good idea anyway!).

Alternatively we could have the thread leak tests be in their own test suite that is only run when explicitly enabled. Travis / Appveyor could then run them in isolation as a secondary step via: ENABLE_TIMEOUT_TESTS=true mvn -Dtest=com.microsoft.sqlserver.jdbc.timeouts.TimeoutThreadLeakTest test

I'll be taking a look at adding the tests when I have some more time but if someone else wants to do it in the meantime I can review.

Couple of notes on minor changes in behavior:

  • Timeout operations are scheduled via ScheduledThreadPoolExecutor which internally sleeps until the next operation will execute. Previously they were being manually polled every 1-second so it was possible for a timeout to not execute for 2-seconds (ex: poller is running at t=0, t=1, t=2, ... and timeout is scheduled at t=.01). They will now execute as soon as they need to run.
  • The clock for cancellation of bulk inserts now starts when the bulk command is going to execute as opposed to when it is created. The only blocking operation in between is waiting on an internal synchronization monitor, schedulerLock, so in practice it'd be microseconds of difference.

@codecov-io
Copy link

codecov-io commented Jan 5, 2019

Codecov Report

Merging #920 into dev will decrease coverage by 0.1%.
The diff coverage is 70%.

Impacted file tree graph

@@             Coverage Diff              @@
##                dev     #920      +/-   ##
============================================
- Coverage     50.46%   50.36%   -0.11%     
+ Complexity     2932     2929       -3     
============================================
  Files           120      120              
  Lines         28117    28100      -17     
  Branches       4696     4694       -2     
============================================
- Hits          14190    14152      -38     
- Misses        11629    11645      +16     
- Partials       2298     2303       +5
Flag Coverage Δ Complexity Δ
#JDBC42 49.92% <70%> (-0.01%) 2888 <16> (-3)
#JDBC43 50.28% <70%> (-0.14%) 2925 <16> (-5)
Impacted Files Coverage Δ Complexity Δ
...om/microsoft/sqlserver/jdbc/SQLServerBulkCopy.java 51.47% <50%> (-0.21%) 255 <0> (-2)
...a/com/microsoft/sqlserver/jdbc/TDSTimeoutTask.java 59.09% <59.09%> (ø) 4 <4> (?)
.../microsoft/sqlserver/jdbc/SQLServerConnection.java 46.51% <75%> (+0.07%) 342 <2> (+3) ⬆️
...java/com/microsoft/sqlserver/jdbc/SharedTimer.java 80% <80%> (ø) 10 <10> (?)
...in/java/com/microsoft/sqlserver/jdbc/IOBuffer.java 55.51% <90.9%> (-0.25%) 0 <0> (ø)
...a/com/microsoft/sqlserver/jdbc/PLPInputStream.java 58.33% <0%> (-1.2%) 33% <0%> (-1%)
...om/microsoft/sqlserver/jdbc/ReaderInputStream.java 43.95% <0%> (-1.1%) 15% <0%> (ø)
...rc/main/java/com/microsoft/sqlserver/jdbc/DDC.java 46.2% <0%> (-1.09%) 111% <0%> (-1%)
...c/main/java/com/microsoft/sqlserver/jdbc/Util.java 61.77% <0%> (-0.44%) 89% <0%> (-1%)
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57b504f...e3e188b. Read the comment docs.

@cheenamalhotra
Copy link
Member

Hi @sehrope

The implementation looks definitely cleaner although a few improvements can be done further to reduce code and achieve final design. I created PR sehrope#2 to your branch, which when merged will be final implementation for the timer thread that we would like to go ahead with.

Replaces the timeout handling for basic and bulk TDS commands to use a new SharedTimer
class. SharedTimer provides a static method for fetching an existing static object or
creating one on demand. Usage is tracked through reference counting and callers are
required to call removeRef() when they will no longer be using the SharedTimer. If the
SharedTimer does not have any more references then its internal ScheduledThreadPoolExecutor
will be shutdown.

The SharedTimer is cached at the Connection level so that repeated invocations do not
create new timers. Connections only create timers on first use so if no actions involve
a timeout then no timer is fetched or created. If a Connection does create a timer then
it will be released when the Connection closed.

Properly written JDBC applications that always close their Connection objects when
they are finished using them should not have any extra threads running after they are
all closed. Applications that do not use query timeouts will not have any extra threads
created as they are only done on demand.  Applications that use timeouts and use
a JDBC connection pool will have a single shared object across all JDBC connections as
long as there are some open connections in the pool with timeouts enabled.

Interrupt actions to handle a timeout are executed in their own thread. A handler thread is
created when the timeout occurs with the thread name matching the connection id of the
client connection that created the timeout. If the timeout is canceled prior to the interrupt
action being executed, say because the command finished, then no handler thread is created.

Note that the sharing of the timers happens across all Connections, not just Connections
with the same JDBC URL and properties.
Centralize checks for SharedTimer thread state in TimeoutTest to happen before and after
all test invocations in TimeoutTest via JUnit @before and @after annotations. Each test
now requires that the SharedTimer thread not be running before the test starts and after
the test completes.

Also expands the thread name prefix in SharedTimer to package private so that TimeoutTest
can reference it rather than having its own copy fo the string.
Centralizes constant timeout value used in TimeoutTest and reduces it
from ten seconds down to two seconds to speed up tests.
Centralizes the SQL command for WAIT FOR DELAY in TimeoutTest and changes
the format of the delay to use hour:minute:second for clarity.
Changes test in TimeoutTest to use assertThrows(...) rather than manually
checking for the thrown exception.

Also cleans up and adds internal helpers for creating a connection and changes
internal runQuery(...) to be void rather than returning a boolean.
Adds additional tests to verify creation and shutdown of SharedTimer core thread
when user commands specify a query timeout.
@sehrope sehrope force-pushed the shared-timer-for-timeouts branch from fd2f3d3 to e76e819 Compare January 10, 2019 16:13
@sehrope
Copy link
Contributor Author

sehrope commented Jan 10, 2019

Rebased atop latest dev branch and couple of additional commits for cleanup and tests added.

The first commit is the same as before (rebased on dev). The rest are broken down into small chunks to simplify review. Here's a summary:

  • Replace timeout handling to use SharedTimer (main work of this PR)
  • Normalize newlines in TimeoutTest - Replaces CRNL with just NL (no actual changes)
  • Move TimeoutTest to parent package so that it can access package private members (no actual changes)
  • Centralize SharedTimer thread checks in TimeoutTest - Adds @Before/@After methods to handle ensuring timer threads don't exist before and after tests run (rather than manually doing it in each test).
  • Centralize and reduce timeout value used in TimeoutTest - Pulls up the 10-second timeout used by the existing tests and changes it to two-seconds (test time goes down by 20 seconds, yay!)
  • Centralize SQL for WAIT FOR DELAY in TimeoutTest - Pulls up the WAITFOR SQL and formats it a bit to make it easier to read (00:01 always looks like 1-second to me whereas 00:01:00 is clearly 1-minute)
  • Clean up TimeoutTest to use assertThrows(...) - Changes existing tests to use assert
  • Add additional SharedTimer tests - Adds a couple tests to verify creation and destruction of core timer thread (similar to @cheenamalhotra's tests but split out into a couple separate ones).

I've left in the id field on SharedTimer as one of the tests uses it to verify that the same instance is being used by multiple timer retrievals. The comparison could be done on the actual reference instead but I think the id based one is better long term as getTimer() could be expanded to provide a proxy that only allows the original caller a single dereference call so it wouldn't be the exact same physical reference to each caller (only same logical reference).

Change waiting for SharedTimer thread to stop to happen in a loop with periodic
checks to see if the thread has stopped up to a max amount of waiting (10 seconds).
@sehrope
Copy link
Contributor Author

sehrope commented Jan 10, 2019

Looks like Travis passed fine but AppVeyor. I'm thinking it's just AppVeyor running a tad slower so it didn't have enough time to complete and clean up the thread.

I've pushed another commit that changes the wait logic to repeatedly check if the thread is stopped in a loop for up to 10-seconds (previously it was 500ms).

Copy link
Member

@cheenamalhotra cheenamalhotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few changes required for standardization (Reported by Static Code Analysis)

@cheenamalhotra cheenamalhotra merged commit 78b1399 into microsoft:dev Jan 10, 2019
@sehrope
Copy link
Contributor Author

sehrope commented Jan 11, 2019

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Single timeout thread never stops and could block
5 participants