Prevent Thread Pool Starvation in Synchronous Methods #62
This post describes a SynchronizationContext that could be used to run all of the tasks on the executing thread:
https://blogs.msdn.microsoft.com/pfxteam/2012/01/20/await-synchronizationcontext-and-console-apps/
I will take a shot at implementing this approach and measuring its performance.
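For reference, the core of the approach in that post can be sketched roughly as follows. This is a condensed sketch modeled on the post's AsyncPump and SingleThreadSynchronizationContext, not code from this library. A custom SynchronizationContext queues every posted continuation, and the blocked caller drains that queue itself, so awaits that capture the context resume on the calling thread instead of needing a thread pool thread. One stated assumption: it only helps if the awaited library code does not use ConfigureAwait(false), because such continuations bypass the context.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Queues posted continuations and runs them on whichever thread calls RunOnCurrentThread.
public sealed class SingleThreadSynchronizationContext : SynchronizationContext
{
    private readonly BlockingCollection<KeyValuePair<SendOrPostCallback, object?>> _queue = new();

    public override void Post(SendOrPostCallback d, object? state) =>
        _queue.Add(new KeyValuePair<SendOrPostCallback, object?>(d, state));

    public override void Send(SendOrPostCallback d, object? state) =>
        throw new NotSupportedException("Synchronous Send is not supported.");

    // Blocks, executing queued callbacks one at a time, until Complete() is called.
    public void RunOnCurrentThread()
    {
        foreach (var item in _queue.GetConsumingEnumerable())
            item.Key(item.Value);
    }

    public void Complete() => _queue.CompleteAdding();
}

public static class AsyncPump
{
    // Runs an async delegate to completion; continuations that capture the context run on this thread.
    public static T Run<T>(Func<Task<T>> func)
    {
        var previous = SynchronizationContext.Current;
        var context = new SingleThreadSynchronizationContext();
        SynchronizationContext.SetSynchronizationContext(context);
        try
        {
            Task<T> task = func();
            // Stop pumping once the task finishes. ExecuteSynchronously runs this inline on the
            // completing thread where possible, so stopping does not itself wait for a pool thread.
            task.ContinueWith(_ => context.Complete(), CancellationToken.None,
                TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
            context.RunOnCurrentThread();
            return task.GetAwaiter().GetResult();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }
}
```

A synchronous wrapper would then call AsyncPump.Run(() => SomeMethodAsync()) instead of SomeMethodAsync().GetAwaiter().GetResult(), where SomeMethodAsync is a placeholder for any Task-returning method.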
Fundamentally, this is the problem with synchronous I/O, and why async is encouraged. It's not worth addressing in this library because it's a scalability bug in the caller.
This is not true (if you mean that a call to
That's why scalable web server benchmarks don't use synchronous I/O.
What else would cause thread pool starvation to the point of a complete application lock-up at such a low request rate? (200 RPS isn't that much.) It has to be related to the
I agree that Async all the way is the correct approach, but we can't force it on every library that may want to depend on us. For example, Pomelo was hitting complete app lock-ups in Async performance tests when it was calling all Async code but using the synchronous transactions, before #60 was merged. Asynchronous transactions aren't in the official DbTransaction interface, so we need to find a way to gracefully support synchronous transactions.
I'll investigate further what is causing the app lock-ups in sync code and see if I can find a graceful workaround. There are probably a lot of apps out there that use synchronous code, and we don't want to cause them to lock up.
@bgrainger I believe I have found the issue. The
When Async methods hit this call, they stay on the same thread and complete, but we're not Async all the way through. When Sync methods (such as
I think the
When
I'm not immediately sure that I can debug this and verify that assumption. I should also be able to run the perf harness under IIS on Windows and drop in the (synchronous-only) MySql.Data library to see if it achieves similar (or much higher?) RPS.
Correct: remove MySqlConnector from project.json and add MySql.Data/7.0.5-IR21. It would also be interesting to see whether it locks up at a certain number of RPS or just gets slower.
Under IIS (with this library), I was able to sustain 825 req/s for 20 s (Xeon E5-1650 v3, local MySQL 5.7.11 for Windows). Thread count peaked at 60. With
I opened #63 to make the calls to
As for Sync performance: there was a bug in the
Here's my theory on what's happening. If I start 200 sync calls at once, I see .NET spawn 200 threads (at a rate of about 2 threads per second) and then respond to all of the requests at once, as soon as the 200th thread has been created.
I think that the managed thread pool is queuing requests for threads. When the flood of 200 requests hits, it allocates the entire thread pool (say, 10 threads) to those incoming requests. Then it queues the remaining 190 to get threads from the pool as they become available.
Next, the first sync request executes. It runs an async task that gets scheduled to run on a different managed thread pool thread. But there is a line to get an available thread, so it is queued 191st in line.
Now some managed thread pool algorithm kicks in and sees that all threads are locked up. It decides to add new threads to the pool, at a rate of about 2 threads per second. The new threads process work from the front of the queue, so this must happen 190 times before any async code gets executed.
Finally, after 95 seconds, 190 threads have been spawned. The 191st thread starts executing async code and flies through the backlog of async work. All of the outstanding requests complete very quickly.
I have no idea how to fix this besides making sync code use sync paths all the way down (a huge effort). Any ideas?
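The arithmetic in this theory can be checked with a toy FIFO model. It encodes only the assumptions stated in the theory (a fixed initial pool, a single queue served front-first, a steady thread injection rate) and is not the real .NET thread pool injection algorithm; StarvationModel and its parameters are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Toy model of the starvation theory (NOT the real .NET thread pool heuristics):
// - all initial pool threads pick up blocking sync requests;
// - each running sync request enqueues one async continuation and then blocks on it;
// - new threads are injected one at a time and always take the item at the front of the queue.
public static class StarvationModel
{
    // Returns the simulated time (seconds) at which the first async continuation gets a thread.
    public static double SecondsUntilFirstContinuation(int requests, int initialThreads, double threadsPerSecond)
    {
        // With idle threads left over (or no requests), nothing has to wait for injection.
        if (requests == 0 || initialThreads > requests) return 0;

        var queue = new Queue<string>();
        for (int i = 0; i < requests; i++) queue.Enqueue("sync request");

        // Each initial thread takes a sync request, which enqueues its continuation and blocks.
        for (int t = 0; t < initialThreads; t++)
        {
            queue.Dequeue();
            queue.Enqueue("async continuation");
        }

        // From here on, work only moves when the pool injects another thread.
        for (int injected = 1; ; injected++)
        {
            if (queue.Dequeue() == "async continuation")
                return injected / threadsPerSecond;

            // Another sync request: it also enqueues a continuation and blocks its new thread.
            queue.Enqueue("async continuation");
        }
    }
}
```

With 200 requests, 10 initial threads and 2 threads per second, SecondsUntilFirstContinuation(200, 10, 2) returns 95.5: the first continuation sits behind 190 queued sync requests, so it only runs on the 191st injected thread (191 / 2 = 95.5 s), in line with the roughly 95 s estimate in the theory.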
Makes sense; I've seen that before. (Pretty sure I've seen some good blog posts on it too, but can't find one right now.)
One thing about
However, if the synchronous sockets API still needs to schedule some code on a thread pool thread, it won't help.
This approach works. I've got some WIP where I've added an
I am able to run Sync at 500 RPS for 10 seconds using this WIP. Now, what I don't like about the WIP is that I've exposed an
What are your thoughts on putting the
Options that come to mind:
Neither of the options is particularly appealing, but I think the "Sync Mode" is worth trying.
Great, I'll work up a "Sync Mode" solution (unless you want to). It may take me a few days.
Another "con" for this is that it couldn't be used before a
Either way, we'd want to introduce more
But we can't in methods that use
@caleblloyd I decided to work on the "asyncIO parameter" approach locally, just to see how big the changes would be. It actually wasn't as bad as I was expecting, so I've pushed it here for discussion (and comparison with your WIP): ae30b7f
I haven't yet run a stress test to verify that work isn't blocked when the thread pool is exhausted.
This is going to be true for any approach, since we ultimately need to verify that both the complete sync and async code paths work correctly. The most critical test cases will probably be the ones where the code needs to loop to send/receive payloads larger than a single MySQL packet; I need to add variants of those tests for both sync and async.
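To make the idea concrete without reading ae30b7f, here is a minimal sketch of the "asyncIO parameter" pattern: the public sync and async methods share one async implementation, and only the lowest-level I/O call branches on an enum. The names (IOBehavior, PacketReader, ReadCoreAsync) are hypothetical, a Stream stands in for the socket, and this is not the code in that commit.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical enum telling shared code whether to do blocking or asynchronous I/O.
public enum IOBehavior
{
    Asynchronous,
    Synchronous,
}

// Hypothetical reader; a Stream stands in for the network socket.
public sealed class PacketReader
{
    private readonly Stream _stream;

    public PacketReader(Stream stream) => _stream = stream;

    // Synchronous public API: the Synchronous path never awaits an incomplete task,
    // so the returned task is already complete and GetResult() does not wait on another thread.
    public int Read(byte[] buffer) =>
        ReadCoreAsync(IOBehavior.Synchronous, buffer, CancellationToken.None).GetAwaiter().GetResult();

    // Asynchronous public API.
    public Task<int> ReadAsync(byte[] buffer, CancellationToken cancellationToken = default) =>
        ReadCoreAsync(IOBehavior.Asynchronous, buffer, cancellationToken);

    // Shared implementation: only the lowest-level I/O call branches on ioBehavior.
    private async Task<int> ReadCoreAsync(IOBehavior ioBehavior, byte[] buffer, CancellationToken cancellationToken)
    {
        int bytesRead = ioBehavior == IOBehavior.Asynchronous
            ? await _stream.ReadAsync(buffer, 0, buffer.Length, cancellationToken).ConfigureAwait(false)
            : _stream.Read(buffer, 0, buffer.Length);

        // Higher-level processing (e.g. parsing a packet header) would be shared here.
        return bytesRead;
    }
}
```

Because the Synchronous branch never awaits an incomplete task, the shared method runs to completion on the calling thread, so no thread pool thread is needed. The trade-off is that every method between the public API and the socket has to accept and pass along the extra parameter.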
@bgrainger I like it; the use of the enum makes things less confusing. I'm curious: will synchronous I/O requests that are in progress be interrupted by a CancellationToken? I was trying to figure out how to interrupt a synchronous
We should revert to the connection timeout logic that you wrote in:
The synchronous
It's probably useful to cancel socket I/O during
From the MySQL Driver docs:
It sounds like Oracle's implementation of
If a timeout is hit, can we reset the session and keep it in the pool? Also, in the case of
I'll open another issue for
Right now, synchronous methods are implemented as:
This causes the running thread to schedule a background thread to do the work, and block until the result is available. This can lead to thread pool starvation, especially in the case of a web server.
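A minimal sketch of this sync-over-async pattern follows, assuming a hypothetical ExampleConnection class with Task.Delay standing in for network I/O (this is not the library's actual code). The code after the await cannot run on the caller's thread, because that thread is blocked in GetResult(); it has to wait for another thread, normally a thread pool thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical connection class showing the "sync over async" pattern.
public sealed class ExampleConnection
{
    // Managed thread ID of the thread that ran the code after the await.
    public int ContinuationThreadId { get; private set; }

    public async Task OpenAsync()
    {
        // Task.Delay stands in for asynchronous network I/O.
        await Task.Delay(50).ConfigureAwait(false);

        // This continuation needs some other thread (normally a thread pool thread) to run on.
        ContinuationThreadId = Environment.CurrentManagedThreadId;
    }

    // Synchronous API implemented by blocking on the async one. The calling thread is stuck here
    // until the continuation above has run; if every pool thread is blocked the same way,
    // nothing is left to run it, and the app stalls until the pool injects more threads.
    public void Open() => OpenAsync().GetAwaiter().GetResult();
}
```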
For example, if the thread pool size is 25 and a web request comes in, there are 24 available threads. Then Connection.Open gets called synchronously, and that calls Connection.OpenAsync().GetAwaiter().GetResult(). The current thread blocks and starts a background thread to open the connection, so now there are 23 available threads.
If a flood of requests comes in, there can be a situation where there are no available threads on which to schedule the Connection.OpenAsync().GetAwaiter().GetResult() calls. At that point the app completely locks up.
I think we should look for a way to run these synchronous tasks on the currently executing thread if possible, or else schedule the async work to run on a reserved background thread. @bgrainger, what's your opinion of the best approach?
You can see the thread pool starvation happen if you run the performance stress test with synchronous targets. I get lock-ups at around 200+ requests per second.