
ThreadPool Blocking Mitigation #47366

Closed

Conversation

@benaadams (Member) commented Jan 23, 2021

Implementation of a technique, developed with @davidfowl, to prevent total ThreadPool starvation due to blocking Tasks on ThreadPool threads.

The main aim is to reduce the chance of the process becoming irrecoverably unresponsive from "sync-over-async" on the ThreadPool, rather than to make blocking on the ThreadPool a performant option.

Mechanism:

  1. The first time a ThreadPool thread blocks, switch on work item callsite monitoring.
  2. On subsequent blocks, record the callsite as potentially blocking.
  3. When queuing/executing a potentially blocking callsite, use a limited-concurrency queue*, which is processed as an IThreadPoolWorkItem.
  4. When the blocking has subsided, the marked callsites are unmarked and blocking mitigation is disabled.

*Concurrency is limited to ThreadCount - 1 so a ThreadPool thread is always available for non-blocking work.

This allows non-blocking items to continue executing on the ThreadPool rather than also being blocked waiting for the blocking items to complete.
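A minimal sketch of the limited-concurrency queue idea follows (this is not the PR's implementation; the type and member names are illustrative, and the real change lives inside the ThreadPool itself):

using System;
using System.Collections.Concurrent;
using System.Threading;

// Illustrative only: items are drained by at most (ThreadPool.ThreadCount - 1)
// concurrent workers, each worker itself being queued as an IThreadPoolWorkItem.
internal sealed class BlockingMitigationQueue : IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<IThreadPoolWorkItem> _items = new();
    private int _activeWorkers;

    private static int MaxConcurrency =>
        Math.Max(1, ThreadPool.ThreadCount - 1); // always leave one thread free for non-blocking work

    public void Enqueue(IThreadPoolWorkItem item)
    {
        _items.Enqueue(item);
        TryStartWorker();
    }

    private void TryStartWorker()
    {
        int count = Volatile.Read(ref _activeWorkers);
        while (count < MaxConcurrency)
        {
            int prev = Interlocked.CompareExchange(ref _activeWorkers, count + 1, count);
            if (prev == count)
            {
                // Each drain worker is just a regular ThreadPool work item.
                ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
                return;
            }
            count = prev;
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        try
        {
            while (_items.TryDequeue(out IThreadPoolWorkItem? item))
            {
                item.Execute();
            }
        }
        finally
        {
            Interlocked.Decrement(ref _activeWorkers);
            if (!_items.IsEmpty)
            {
                TryStartWorker(); // an item raced in after this worker stopped draining
            }
        }
    }
}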

Test gist (also added as a ThreadPool test):

Without mitigation -> Time: 1 hr 59 min 53.12 sec
With mitigation    -> Time:              0.56 sec

/cc @kouvel @stephentoub

@benaadams (Member, Author)

Test ASP.NET Core endpoints:

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    app.UseRouting();

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapGet("/blocktp", context =>
        {
            var random = new Random();
            Task.Delay(random.Next(50, 500)).Wait();
            context.Response.WriteAsync("Hello World!").Wait();
            return Task.CompletedTask;
        });
        endpoints.MapGet("/async", async context =>
        {
            var random = new Random();
            await Task.Delay(random.Next(50, 500));
            await context.Response.WriteAsync("Hello World!");
        });
    });
}

Without the mitigation

$ wrk -c 128 http://.../blocktp
Running 10s test @ http://.../blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

$ wrk -c 128 http://.../async
Running 10s test @ http://.../async
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

With the mitigation

$ wrk -c 128 http://.../blocktp
Running 10s test @ http://...blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.00s   565.36ms   1.94s    56.41%
    Req/Sec    13.54      8.87    50.00     83.58%
  219 requests in 10.02s, 26.31KB read
  Socket errors: connect 0, read 0, write 0, timeout 180
Requests/sec:     21.86
Transfer/sec:      2.63KB

$ wrk -c 128 http://.../async
Running 10s test @ http://.../async
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   284.69ms  129.63ms 518.02ms   57.58%
    Req/Sec   220.79     51.76   343.00     75.00%
  4404 requests in 10.02s, 529.00KB read
Requests/sec:    439.70
Transfer/sec:     52.82KB

@stephentoub (Member)

I'll take a look at the code on Monday, but just from your description, lots of warning bells going off in my head ;-)

@benaadams benaadams force-pushed the ThreadPool-Blocking-Mitigation branch 3 times, most recently from 3335360 to f504dbc Compare January 23, 2021 17:16
@benaadams (Member, Author)

lots of warning bells going off in my head

Not denying that 😅

It mostly reduces the chance of the process becoming irrecoverable, rather than making blocking on the ThreadPool a performant option.

e.g. currently, if 3 blocking tests are run in sequence they all fail, as does an async one run afterwards:

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

~$ wrk -c 128 http://...:5000/async
Running 10s test @ http://...:5000/async
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

Whereas with the mitigation, there are still timeouts for the blocking endpoint and it fails at the start (as all threads start blocked), but it then recovers and the async endpoint still runs:

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec     0.00      0.00     0.00      -nan%
  0 requests in 10.02s, 0.00B read
Requests/sec:      0.00
Transfer/sec:       0.00B

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.00us    0.00us   0.00us    -nan%
    Req/Sec    29.11     20.92    80.00     75.41%
  192 requests in 10.02s, 23.06KB read
  Socket errors: connect 0, read 0, write 0, timeout 192
Requests/sec:     19.16
Transfer/sec:      2.30KB

~$ wrk -c 128 http://...:5000/blocktp
Running 10s test @ http://...:5000/blocktp
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.82s   145.96ms   2.00s    85.07%
    Req/Sec    37.81     27.12   140.00     59.31%
  567 requests in 10.02s, 68.11KB read
  Socket errors: connect 0, read 0, write 0, timeout 232
Requests/sec:     56.61
Transfer/sec:      6.80KB

~$ wrk -c 128 http://...:5000/async
Running 10s test @ http://...:5000/async
  2 threads and 128 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   279.53ms  129.31ms 513.32ms   58.99%
    Req/Sec   224.29     48.89   340.00     72.50%
  4475 requests in 10.02s, 537.52KB read
Requests/sec:    446.51
Transfer/sec:     53.63KB

@KalleOlaviNiemitalo

Would the tracking of call sites interfere with AssemblyLoadContext.Unload?

@davidfowl (Member) commented Jan 23, 2021

This looks really promising. I think the reflection part needs to be solved, but the idea that we would attempt to educate the thread pool about blocking via calls that pass through the BCL itself would handle a large set of problems.

We may even put something like this behind an AppContext switch if others are uncomfortable with it being on by default.

@benaadams (Member, Author)

I think the reflection part needs to be solved

Solved... Forgot it was the runtime and could just add internal to some things 😅

private static object? t_currentWorkItem;

private static ThreadPoolBlockingQueue? s_processor;
private static HashSet<object>? s_blockingItems;
Member (inline review comment):

Is there any concern about how big this can get? I guess it's a function of the code since we're looking at types not work item instances.

Member Author:

It's only recording the "methods" (which include state machine Types for async methods) that initiate the blocking call stack, so hopefully it should stay small; if not, then the app has bigger issues...

Member:

Could OS scheduler hiccups or little bursts of activity end up adding innocent victims to this list? Could this lead to close to everything that the app runs being in this list eventually, once the app runs for a while?

Member Author:

The code will at some point need to explicitly make a call to Task.Wait(), Task.Result, Task.WaitAll(...), or .GetAwaiter().GetResult() (or the ValueTask variants) that doesn't yet have a result; so I'm not sure they are innocent 😉

Regular OS blocking/sync calls that don't block on a Task shouldn't affect it (e.g. .Read()); that's not so problematic anyway, as they will eventually unblock. "Sync-over-async", which this aims to address, needs a second thread to unblock; if there's no spare thread on the ThreadPool and threads are being blocked as fast as they are being injected, they will never unblock.

The main downside is that it marks the entry point from the last yield (as it doesn't break the stack), so anything branching off that same yield point will be affected until it itself yields, at which point it goes back to being a regular ThreadPool item.

However, it does mean mitigation patterns like:

await Task.Run(() => BlockingCall());

will actually do something, scoping the ThreadPool mitigation at the lambda level rather than further back; and adding a yield

await Task.Yield();
BlockingCall();

will again scope the mitigation to the async method rather than further back in the stack.

This could be, for example, calling Azure Key Vault async methods via the System.Security.Cryptography classes, which have no async APIs, so behind the scenes the call would need to be .GetAwaiter().GetResult().
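A rough, self-contained sketch of the two scoping patterns described above (LoadAsync and HiddenSyncOverAsyncCall are hypothetical stand-ins, e.g. for a System.Security.Cryptography call that does .GetAwaiter().GetResult() internally):

using System;
using System.Threading.Tasks;

static class BlockingScopeExamples
{
    // Hypothetical stand-ins: an async dependency, and a sync API hiding sync-over-async.
    static Task LoadAsync() => Task.Delay(10);
    static void HiddenSyncOverAsyncCall() => Task.Delay(100).GetAwaiter().GetResult();

    // Marks the callsite of the whole calling state machine (from its last yield point).
    public static async Task ProcessAsync()
    {
        await LoadAsync();
        HiddenSyncOverAsyncCall();
    }

    // Scopes the marking to the Task.Run lambda rather than the caller's state machine.
    public static async Task ProcessScopedAsync()
    {
        await LoadAsync();
        await Task.Run(() => HiddenSyncOverAsyncCall());
    }

    // Breaks the stack with a yield so only the code after the yield is marked.
    public static async Task ProcessYieldedAsync()
    {
        await LoadAsync();
        await Task.Yield();
        HiddenSyncOverAsyncCall();
    }
}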

Member:

I think we should think a bit about what we can do to make the fingerprinting a little better (can we do anything at JIT time or runtime?) and try to figure out whether we need a heuristic for removing these blocking call sites from the list. That can be improved over time (I can think of some crazy ideas...), but we just need to make sure that the idea is sound (I think it is).

It's already scoped to sync-over-async, which I think is the big thing we need to tackle.

Member:

I wonder if we can self-heal these blocking call sites by keeping some metrics of blocking call site execution and removing a call site from the list if multiple executions complete within some threshold (spitballing here) or don't make the same blocking call on re-execution.

@davidfowl (Member)

One problematic area is work items that batch other items, basically hiding this knowledge from the thread pool (like the IOQueue in Kestrel and the Linux sockets implementation). The one-to-many relationship means that one bad inner work item causes all of them to be put into the blocking list.

We'd need to detect these types of work items or give them a way to opt out, or to participate themselves in blocking detection.
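A simplified sketch of the kind of batching work item being described (loosely modeled on the shape of Kestrel's IOQueue; names are illustrative and the scheduling-flag handling is simplified):

using System;
using System.Collections.Concurrent;
using System.Threading;

// The ThreadPool only ever sees this one Execute callsite, so if any one of the
// batched inner actions blocks, the whole batcher would get marked as blocking.
internal sealed class BatchingQueue : IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<Action> _inner = new();
    private int _scheduled;

    public void Schedule(Action action)
    {
        _inner.Enqueue(action);
        if (Interlocked.CompareExchange(ref _scheduled, 1, 0) == 0)
        {
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        Volatile.Write(ref _scheduled, 0);
        while (_inner.TryDequeue(out Action? action))
        {
            action(); // one blocking inner action taints all of them
        }
    }
}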

@benaadams benaadams force-pushed the ThreadPool-Blocking-Mitigation branch 3 times, most recently from c619e36 to 31b5d5e Compare January 24, 2021 17:38
@benaadams (Member, Author)

Added test (from https://gist.github.com/benaadams/9290257de1c907b3ecafc750779e2621)

Without mitigation -> Time: 01:59:53.1175255
With mitigation    -> Time: 00:00:00.5634481

@stephentoub (Member)

Thanks for suggesting an approach here. I skimmed the solution. On the one hand, it’s clever, and it has a nice impact on the hand-crafted problematic cases cited in the PR. On the other hand, I have a variety of initial concerns:

  • This effectively works via a blocklist approach, where a particular call site is blocklisted. Blocklisted call sites incur significantly more overhead. But that can significantly negatively impact a call site that’s shared infrastructure and just happens to be used in one place that’s blocking. For example, in ASP.NET MVC I assume there’s an async method somewhere that has the job of invoking controller actions; if just one synchronous controller action somewhere in the system ends up just once blocking on something via a Task, doesn’t that end up tainting every other controller operation in the whole system?
  • If anything in the system ever blocks on a task just once on a thread pool thread (which I expect is super likely in a real app), every subsequent thread pool operation incurs the cost of multiple static field accesses plus a HashSet lookup. That is non trivial overhead.
  • It seems like this could actually introduce deadlocks where they wouldn’t otherwise happen. The code is forcing all work items associated with the blocklisted call sites into a single queue, and then relying on a subset of threads to process that queue. That subset is limited, much more so than the number of threads possible in the pool. If all of those threads end up getting blocked on something that requires that queue to be processed, it’ll deadlock, and as far as I can tell, neither the hill climbing nor the starvation prevention mechanisms will kick in to help, because they’re unaware of this blocked queue (if additional threads were added for other reasons, then one might help out here, but it wouldn’t be provoked to by this).
  • This increased the size of every work item by a reference, which in most cases is a 25% increase. If you instead change it to have a virtual call to avoid the extra field, it’s an extra virtual dispatch on every work item. If you instead change it to try to consolidate to a single field, it’s an extra cast on every work item (or non-trivial additional unsafe casting).
  • This adds significant complexity, duplicates key processing loops, incorporates somewhat arbitrary thresholds, and effectively adds a side pool. It would require serious testing on a myriad of workloads to feel confident that this is the right approach and right set of tradeoffs.

@davidfowl (Member)

This effectively works via a blocklist approach, where a particular call site is blocklisted. Blocklisted call sites incur significantly more overhead. But that can significantly negatively impact a call site that’s shared infrastructure and just happens to be used in one place that’s blocking. For example, in ASP.NET MVC I assume there’s an async method somewhere that has the job of invoking controller actions; if just one synchronous controller action somewhere in the system ends up just once blocking on something via a Task, doesn’t that end up tainting every other controller operation in the whole system?

See (#47366 (comment)). If the identification of the work item is too coarse-grained then it can be problematic. I see that as a problem, but maybe one we can come up with creative solutions to (or maybe we need to opt out in those cases).

If anything in the system ever blocks on a task just once on a thread pool thread (which I expect is super likely in a real app), every subsequent thread pool operation incurs the cost of multiple static field accesses plus a HashSet lookup. That is non trivial overhead.

This is problematic and concerns me as well. Ben and I spoke about this a little but haven't come up with anything as yet. I have some crazy ideas that would require @jkotas approval 😄.

It seems like this could actually introduce deadlocks where they wouldn’t otherwise happen. The code is forcing all work items associated with the blocklisted call sites into a single queue, and then relying on a subset of threads to process that queue. That subset is limited, much more so than the number of threads possible in the pool. If all of those threads end up getting blocked on something that requires that queue to be processed, it’ll deadlock, and as far as I can tell, neither the hill climbing nor the starvation prevention mechanisms will kick in to help, because they’re unaware of this blocked queue (if additional threads were added for other reasons, then one might help out here, but it wouldn’t be provoked to by this).

Temporary deadlocks, because the thread pool would grow anyway? This commit (f5702e8) requests a new thread, which should signal to the thread pool that more threads are needed.

This increased the size of every work item by a reference, which in most cases is a 25% increase. If you instead change it to have a virtual call to avoid the extra field, it’s an extra virtual dispatch on every work item. If you instead change it to try to consolidate to a single field, it’s an extra cast on every work item (or non-trivial additional unsafe casting).

I feel like if you can solve the hashset problem, you can potentially solve this as well. We need a fast way to identify if a workitem is potentially blocking. Are there any free bits in the object header 😄?

This adds significant complexity, duplicates key processing loops, incorporates somewhat arbitrary thresholds, and effectively adds a side pool.

Right, but it aims at dramatically improving a problem that our customers have and that isn't going away. Lots of code we have like this has these types of arbitrary thresholds (sometimes without knobs...).

It would require serious testing on a myriad of workloads to feel confident that this is the right approach and right set of tradeoffs.

👍🏾

@benaadams (Member, Author)

This increased the size of every work item by a reference, ...

Could drop QUWIs (QueueUserWorkItem) from it, as those are explicit calls and not necessarily in the common path; it would still pick up Task.Run.

@stephentoub (Member)

maybe we need to opt out in those cases

The current issue is one of education. Solving that by introducing a new problem that requires education isn't great.

because the thread pool would grow anyways

Why would it grow? The starvation detection mechanism isn't paying attention to this, is it? As far as it's concerned, the work items that were queued were picked up successfully. In fact it seems like this could actually decrease the number of threads that get injected if hill climbing sees the work items that check IsRequired and immediately exit as successful completions.

@davidfowl (Member)

Why would it grow? The starvation detection mechanism isn't paying attention to this, is it? As far as it's concerned, the work items that were queued were picked up successfully. In fact it seems like this could actually decrease the number of threads that get injected if hill climbing sees the work items that check IsRequired and immediately exit as successful completions.

I don't know this part of the code base well, but it seems like it could be changed to know, right?

@davidfowl (Member)

The current issue is one of education. Solving that by introducing a new problem that requires education isn't great.

I don't know how widespread this would need to be, but it would require a public API. This novel batching work item thing we do basically hides information from the thread pool in our own queue, because there's more overhead queuing directly to the ThreadPool. It's logical that something that helps detect blocking at the work item level is inadequate, because we're lying to the ThreadPool 😢

@benaadams (Member, Author) commented Jan 25, 2021

I assume there’s an async method somewhere that has the job of invoking controller actions; if just one synchronous controller action somewhere in the system ends up just once blocking on something via a Task, doesn’t that end up tainting every other controller operation in the whole system?

Yes; and currently, if the blocking took more than 500 ms, making 4 rps would DoS the server, as that's faster than the thread injection rate (increase the request rate if the blocking takes less time).

If the blocking can't be avoided (it's hidden behind an API), then adding await Task.Yield() before the call, or executing it via an await Task.Run() lambda, would scope the blocking just to that state machine. More advanced would be if the await Task.Yield() could be injected automatically to break the stack at the latest point, or similar.

every subsequent thread pool operation incurs the cost of multiple static field accesses plus a HashSet lookup...

Or tag the method itself in metadata rather than using a hash table?

... neither the hill climbing nor the starvation prevention mechanisms will kick in ...

They can be made more aware; e.g. periodically scanning whether the blocking queue is non-empty and checking if anything has completed recently?

This increased the size of every work item by a reference, ...

Removed that, as the common issue (from ASP.NET anyway) is probably on a Task/ValueTask root rather than QUWI.

duplicates key processing loops

Moved it to a method so both now use the same loop.

incorporates somewhat arbitrary thresholds

The only one is ThreadCount / 2? That's because if it were just ThreadCount, then when hill climbing injects an extra thread the blocking would just eat it and there wouldn't be a thread left to do the unblocking.

and effectively adds a side pool

Definitely does!

Which is a common approach in many runtimes (Rust, the JVM, Scala ZIO, etc.): 3 thread pools, 1 for I/O, 1 for CPU, 1 for blocking.

.NET already effectively has an I/O thread pool, and hill climbing mostly solves the issue with regular blocking OS calls being on the same thread pool as CPU items, as it will eventually inject another thread and those items unblock themselves.

The issue in .NET is with "sync-over-async" items, as they do not unblock themselves and need another thread, so they must be rate-limited below the rate of thread pool injection (assuming we don't get fancy with heap-allocated thread stacks that can be removed from their threads and reattached when ready via stack bending #4654, like Go, Java's Project Loom, or Windows' fibers).

The issue with a blocking thread pool elsewhere, whether spawn_blocking in Rust Tokio's scheduler or Scala ZIO's effectBlocking, is that you need to know upfront, before calling, that it's going to block, and then use that construct to queue the work to the blocking pool (much like you'd do in .NET with a concurrency-limited TaskScheduler).
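For comparison, a minimal sketch of that "know upfront" style in today's .NET, using a concurrency-limited scheduler (the cap of Environment.ProcessorCount is an arbitrary illustrative choice):

using System;
using System.Threading;
using System.Threading.Tasks;

static class BlockingPool
{
    // A scheduler capped at ProcessorCount concurrent tasks, layered over the default ThreadPool.
    private static readonly TaskScheduler s_blockingScheduler =
        new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, Environment.ProcessorCount)
            .ConcurrentScheduler;

    // The caller must know the work blocks and explicitly route it here; that upfront
    // knowledge is exactly what's hard to have for hidden sync-over-async.
    public static Task Run(Action blockingWork) =>
        Task.Factory.StartNew(
            blockingWork,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            s_blockingScheduler);
}

// Usage: await BlockingPool.Run(() => someTask.GetAwaiter().GetResult());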

However, it's hard to know ahead of time that there is a .GetAwaiter().GetResult() hidden behind what might seem an innocuous property, method, or Lazy evaluation, with the side effect that the entire process locks up.

The approach here is to automatically detect those problematic paths and then try to mitigate them by moving them to, effectively, a side pool.

Of course the implementation can probably be refined in various ways...

@jkotas (Member) commented Jan 25, 2021

I think that detecting sync-over-async waiting and doing something when it is detected has merit as an idea.

Blacklisting the currently executing ThreadPool work item sounds problematic, as @stephentoub suggested. Can we do something less problematic instead? For example, can we tell the ThreadPool to temporarily take the thread stuck on sync-over-async out of circulation so that it injects new threads more aggressively?

@benaadams (Member, Author)

For example, can we tell the threadpool to take the thread stuck on sync-over-async out of circulation temporarily to inject new threads more aggressively?

Effectively increase ThreadPool.MinThreads by one before the Wait and decrease it by one after? That does have the side effect of unbounded thread count growth, as there is no rate limiting/backpressure on the number of blocking items entering the ThreadPool.
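A rough sketch of that idea expressed with today's public API, purely to illustrate the shape (the real change would live inside the ThreadPool; this helper is not safe for heavily concurrent use):

using System;
using System.Threading;
using System.Threading.Tasks;

static class BlockingWaitHelper
{
    // Illustrative only: raise the worker-thread floor for the duration of a blocking wait
    // so the pool is willing to inject a replacement thread sooner.
    public static void WaitWithMinThreadBump(Task task)
    {
        ThreadPool.GetMinThreads(out int workerMin, out int ioMin);
        ThreadPool.SetMinThreads(workerMin + 1, ioMin);
        try
        {
            task.GetAwaiter().GetResult(); // the sync-over-async wait
        }
        finally
        {
            ThreadPool.GetMinThreads(out int currentWorkerMin, out int currentIoMin);
            ThreadPool.SetMinThreads(Math.Max(workerMin, currentWorkerMin - 1), currentIoMin);
        }
    }
}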

@GSPP commented Jan 27, 2021

Does this reduce the degree of parallelism for workloads that make use of blocking waits? That seems very undesirable. Blocking waits do occur in real-world apps and are, in some cases, a good solution. For example, to implement an interface it might be required to block. Or, blocking happens in an operation that does not consume many threads and you want to avoid infecting the entire call chain so you just block on a task. It would be unfortunate if such workloads were to suffer.

@davidfowl (Member)

Workloads like that, which block thread pool threads, already suffer. This technique tries to blame the right stack, which is hard to do, but I think it's extremely promising as it does allow you to manually break up the stack to work around things (Task.Run(...).Wait()).

@stephentoub (Member)

Workloads like that block thread pool threads already suffer.

This is generally true for scalable request/response workloads like you find in a typical ASP.NET app. There are app types where it's not true (single user, where blocking is used for simplicity, etc.), and the proposed solution will throttle such workloads artificially.

@davidfowl (Member) commented Jan 27, 2021

This is generally true for scalable request/response workloads like you find in a typical ASP.NET app. There are app types where it's not true, single user, where blocking is used for simplicity, etc, and the proposed solution will throttle such workloads artificially.

OK, so a single-user application might want to opt out of this so that it can block without being throttled by the limited concurrency. Just so I have a better picture of this, what does the code look like in a single-user app that's using blocking waits on thread pool threads for parallelism but has enough work to observe problems with limited concurrency?

@filipnavara (Member)

Just so I have a better picture of this, what does the code look like in a single-user app that's using blocking waits on thread pool threads for parallelism but has enough work to observe problems with limited concurrency?

I don't have a comprehensive answer for that, but we do have a desktop app that occasionally hits it. It's an email client that can have multiple accounts connected to different servers. For servers that communicate over the IMAP protocol we rely on pretty complex logic that involves System.IO.Pipelines, and occasionally, due to API constraints, we resort to some sync-over-async patterns (with cancellation tokens that would interrupt the operation if necessary). The number of such tasks scales with the number of accounts that are set up for synchronization in the application. Not all servers behave the same, so even though some of these operations can be performed really quickly on one server, they may block on a different one.

@davidfowl (Member)

Thanks for that. I don't think this approach is bulletproof by any means, but I think it gives some insight into what we might be able to accomplish, and lets us explore scenarios that end up being punished unnecessarily. I wonder if there's a staged approach here where there are thresholds for being put into the limited-concurrency queue (let's call it the sunken place) and another threshold for getting out of said place. I could see that mitigating some of the problems described here (though there's a slew of other problems), though it might lead to thrashing if your code ends up in and out of this list too frequently.

@benaadams benaadams force-pushed the ThreadPool-Blocking-Mitigation branch from 1d3d6ba to 71f6bdd Compare January 27, 2021 14:18
@benaadams (Member, Author)

Notifying the ThreadPool of executing work (which it didn't do originally) looks to allow ThreadCount - 1 rather than ThreadCount / 2, always leaving 1 thread free to execute non-blocking work but otherwise growing with the ThreadPool.

Also changed it to disengage when the blocking pressure has subsided, so the mechanism is now:

  1. The first time a ThreadPool thread blocks, switch on work item callsite monitoring.
  2. On subsequent blocks, record the callsite as potentially blocking.
  3. When queuing/executing a potentially blocking callsite, use a limited-concurrency queue*, which is processed as an IThreadPoolWorkItem.
  4. When the blocking has subsided, the marked callsites are unmarked and blocking mitigation is disabled.

*Concurrency is limited to ThreadCount - 1 so a ThreadPool thread is always available for non-blocking work.

@stephentoub (Member)

Just so I have a better picture of this, what does the code look like in a single-user app that's using blocking waits on thread pool threads for parallelism but has enough work to observe problems with limited concurrency?

Many workloads for which tasks were originally introduced, actually. Consider, for example, parallelized divide and conquer. A worker partitions its work into multiple pieces, queues identical workers to process N - 1 of those pieces, processes its own, and then waits for those other workers to complete. Also think about algorithms that require coordination between those pieces, such as various operations used by PLINQ (e.g. OrderBy) that need to block each of the workers until they reach a common point so that they can exchange data and all continue on, and what would happen if one of the workers was throttled indefinitely.

OK so a single user application might want to opt out of this

I think that's the wrong direction, at least for the foreseeable future. If we decided to go with an approach like the one proposed in this draft, I have trouble seeing it as anything other than opt-in, given all of the previously cited concerns, the potential for introducing deadlocks where they weren't previously, the potential for casting way too wide a net, the potential for negatively impacting the throughput of unrelated workloads, etc. If such a mitigation was added as opt-in, and if enough testing demonstrated it was a good decision for certain workloads, then it could potentially be automatically opted-in for those, e.g. if you felt ASP.NET should always turn this on. Even there, though, I'd be surprised if in the current configuration that was a wise choice.

@davidfowl (Member) commented Jan 27, 2021

Many workloads for which tasks were originally introduced, actually. Consider, for example, parallelized divide and conquer. A worker partitions its work into multiple pieces, queues identical workers to process N - 1 of those pieces, processes its own, and then waits for those other workers to complete. Also think about algorithms that require coordination between those pieces, such as various operations used by PLINQ (e.g. OrderBy) that needs to block each of the workers until they reach a common point so that they can exchange data and all continue on, and what would happen if one of the workers was throttled indefinitely

Today, do the implementations of those end up doing Task.Wait?

I think that's the wrong direction, at least for the foreseeable future. If we decided to go with an approach like the one proposed in this draft, I have trouble seeing it as anything other than opt-in, given all of the previously cited concerns, the potential for introducing deadlocks where they weren't previously, the potential for casting way too wide a net, the potential for negatively impacting the throughput of unrelated workloads, etc. If such a mitigation was added as opt-in, and if enough testing demonstrated it was a good decision for certain workloads, then it could potentially be automatically opted-in for those, e.g. if you felt ASP.NET should always turn this on. Even there, though, I'd be surprised if in the current configuration that was a wise choice.

Yes, I have similar fears, but I'm looking for solutions. Maybe this isn't the one, but there are aspects of it that seem interesting. Say we added a public API to enable this and it was opt-in; I could see the potential for adding a piece of middleware or a filter that enables this on the current work item before the blocking code executes. Something like this:

[HttpGet("/blocking")]
[EnableBlockingMitigation] // This name is terrible
public string BlockingLegacyCodez()
{
      return new WebClient().DownloadString(...); // I know we're never supposed to use this API
}

That filter could yield to the thread pool to get off the current work item and enable mitigation on a unique work item that's closer to this code. The equivalent of doing this:

[HttpGet("/blocking")]
public async Task<string> BlockingLegacyCodez()
{
      await Task.Yield(); // Make a new work item
      ThreadPool.EnableBlockingWorkitem();
      return new WebClient().DownloadString(...); // I know we're never supposed to use this API
}

Thoughts?

@stephentoub (Member)

Today do the implementation of those kick end up doing Task.Wait?

There are a variety of patterns folks employ. Sometimes it's:

Task t = Task.Run(() => ...);
... // do work
t.Wait();

Sometimes it's:

Task[] tasks = new Task[...];
for (int i = 0; i < tasks.Length; i++)
{
    tasks[i] = ...;
}
Task.WaitAll(tasks);

Sometimes it's:

Parallel.For(...); // or Invoke or ForEach

Sometimes it's:

var ce = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try { ... } finally { ce.Signal(); }
    });
}
ce.Wait();

Etc.

Both Parallel.* and PLINQ use and wait on Tasks internally.

new WebClient().DownloadString(...); // I know we're never supposed to use this API

FWIW, that will use the sync methods on HttpClient now.

[EnableBlockingMitigation]

From a "make sure we don't negatively affect unexpected things" perspective, I'd be happier with that, assuming it wouldn't require anything in the runtime to do reflection on everything to look for things that might have such an attribute. It of course still requires someone to know there's a problem, what the problem is, and what solutions there are for addressing it: at that point, we're limiting ourselves to the subset of problems where the developer is prevented from actually doing the "right thing" (e.g. they're forced to implement a sync interface and have only async tools at their disposal to do so) rather than the subset where they don't realize they're doing the "wrong thing". From a complexity and impact perspective, while I do understand the appeal of a detect-bad-actors-and-throttle-them approach, I still think we should first be looking at and testing out simpler approaches, like tweaks to thread injection strategy we've talked about previously (e.g. ramp up beyond ProcessorCount faster than we do today).

@davidfowl (Member)

From a complexity and impact perspective, while I do understand the appeal of a detect-bad-actors-and-throttle-them approach, I still think we should first be looking at and testing out simpler approaches, like tweaks to thread injection strategy we've talked about previously (e.g. ramp up beyond ProcessorCount faster than we do today).

I'm not against this approach, and it is much simpler, but I'm worried we'll have a different problem than we have today at the global thread pool scale if that's all we do. But maybe we can make it more clever by combining this with that strategy: record the blocking operation and decommission a thread pool thread from the accounting.

I do worry about increased contention on the global queue with this strategy, which will negatively impact performance as well (we already see this today on machines with lots of cores).

@davidfowl (Member)

we're limiting ourselves to the subset of problems where the developer is prevented from actually doing the "right thing" (e.g. they're forced to implement a sync interface and have only async tools at their disposal to do so) rather than the subset where they don't realize they're doing the "wrong thing".

I don't think that's true. I think we need some knobs that workloads can use to change behavior; right now, what we have works well when everyone plays ball but is quite catastrophic once there's a bad actor somewhere in the system. In that way, it might not be the developer that needs to make the choice; framework authors could potentially have more granular control over things.

@halter73 (Member) commented Jan 27, 2021

I'm thinking about how this could be used to only dispatch blocking ASP.NET route delegates to the blocking queue without ever needing to dispatch non-blocking route delegates. It's too bad we have to schedule a new work item to see if a given continuation is blocking in isolation. Adding a Task.Yield() to schedule a new work item right before blocking gets the right granularity, but requires the person doing the blocking to know to do this (which I feel is unlikely) or requires framework code to take the perf hit of yielding before fanning out to any potentially-blocking code.

What if we had an API that allows you to tell the ThreadPool to track whether there's any blocking after you call it for a given async continuation?[1] The API would take a key (maybe derived from the route pattern in ASP.NET's case). The next time you get to the same place (matched the same route or whatever), you could explicitly query the ThreadPool using the same key. If it blocked before, only then would you need to yield to the blocking queue, and you could do so explicitly.[2] This way, you don't need to yield after matching the route on every request.

The other upside of a more explicit API that allows framework code to query whether a given continuation is blocking is that it would give the framework more options than just scheduling the blocking work item to the blocking queue. Maybe, depending on the configuration, ASP.NET would skip the work item and respond with a 503 if a blocking-concurrent-request limit is hit. This is similar to what Microsoft.AspNetCore.ConcurrencyLimiter does. Maybe we could enable it by default for blocking routes.

1: This would require blocking detection not just for the current work item but through the entire async flow, until the blocking detection is explicitly stopped. In the case of a request delegate, ASP.NET could stop the detection once the request delegate's Task completes. It probably makes sense for this API to use IDisposable for this.
2: Again, yielding to the blocking queue would have to cause all subsequent awaits to yield to the blocking queue until explicitly stopped, since we don't track exactly which work item blocked with this API.
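A sketch of what that explicit, key-based API could look like from framework code; none of these members exist today, and BlockingTracker is a stub purely to show the proposed flow:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Stub standing in for the proposed (non-existent) ThreadPool API; it only records
// "this key blocked at some point" so the flow below compiles.
static class BlockingTracker
{
    private static readonly ConcurrentDictionary<string, bool> s_blockedKeys = new();

    public static bool HasBlocked(string key) => s_blockedKeys.ContainsKey(key);

    public static void MarkBlocked(string key) => s_blockedKeys[key] = true;

    // Real version: an IDisposable scope that asks the ThreadPool to watch the whole async flow.
    public static IDisposable StartTracking(string key) => new Scope();

    private sealed class Scope : IDisposable { public void Dispose() { } }
}

static class RouteInvoker
{
    // Hypothetical framework-side flow: only routes known to have blocked are yielded
    // (here to a fresh work item as a stand-in for the blocking queue); everything
    // else stays on the normal fast path, with no per-request yield.
    public static async Task InvokeAsync(string routePattern, Func<Task> routeDelegate)
    {
        if (BlockingTracker.HasBlocked(routePattern))
        {
            await Task.Yield(); // stand-in for "yield into the blocking queue"
        }

        using (BlockingTracker.StartTracking(routePattern))
        {
            await routeDelegate();
        }
    }
}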

@benaadams benaadams force-pushed the ThreadPool-Blocking-Mitigation branch from d3ff971 to 0079975 Compare January 30, 2021 02:12
Base automatically changed from master to main March 1, 2021 09:07
@ghost ghost closed this Mar 31, 2021
@ghost commented Mar 31, 2021

Draft Pull Request was automatically closed for inactivity. It can be manually reopened in the next 30 days if the work resumes.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 30, 2021
@karelz karelz added this to the 6.0.0 milestone May 20, 2021
This pull request was closed.