
ASP.NET Schedulers #84

Open · tmds opened this issue Apr 2, 2020 · 16 comments


@tmds (Owner) commented Apr 2, 2020

@davidfowl @halter73 we've added control over the schedulers that get used.

We can control these:

  • c: where async socket continuations run: inline/threadpool
  • i: input pipe reader scheduler: inline/threadpool
  • o: output pipe reader scheduler: inline/threadpool/ioqueue/iothread

The bold values are the default.
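
As a rough sketch of where these knobs land (these are not necessarily the repo's actual option names): i and o correspond to the PipeScheduler arguments a transport passes when it creates a connection's input and output pipes, while c is a transport-level switch for the socket completion callback itself.

```csharp
using System.IO.Pipelines;

// Sketch only: 'i' and 'o' map to the schedulers a transport passes to PipeOptions
// when creating a connection's input/output pipes.
var inputOptions = new PipeOptions(
    readerScheduler: PipeScheduler.ThreadPool, // 'i': where the app's ReadAsync continuations run
    writerScheduler: PipeScheduler.Inline,     // the transport writes received bytes inline
    useSynchronizationContext: false);

var outputOptions = new PipeOptions(
    readerScheduler: PipeScheduler.ThreadPool, // 'o': where the transport's send loop runs
                                               // (ioqueue/iothread would be transport-provided schedulers)
    writerScheduler: PipeScheduler.Inline,
    useSynchronizationContext: false);

// 'c' is separate: it controls whether the async Socket completion callback itself runs
// inline on the epoll thread or is first dispatched to the ThreadPool.
```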

When we inline all of these schedulers and use ProcessorCount threads, we measured 1M rps for middleware JSON on Citrine.

I wonder which of these would be safe to inline in a production app, where there may be potentially blocking code from the user?

I guess this would be c (because the application code is dispatched to the ThreadPool later anyway) and also o (because the application code runs on the output pipe writer scheduler). Is that correct?
What about i?

cc @antonfirsov @adamsitnik

@halter73 commented Apr 3, 2020

For c, we've found inline scheduling is best on linux with Kestrel's Socket transport, so that's already the default there. See aspnet/KestrelHttpServer#2573.

For i (and the output pipe writer scheduler) we've always used the threadpool as a safe default in case the app developer blocks. This used to be configurable via KestrelServerOptions.ApplicationSchedulingMode, but that option was removed when pipe creation was made a transport responsibility. Inline scheduling generally did result in faster benchmark runs, but we have not added similar transport-level options yet.
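
For reference, roughly how the removed option was used in the ASP.NET Core 2.x days; this is a sketch from memory, and the exact builder calls and namespace may differ.

```csharp
// Sketch from memory of the removed 2.x-era option; not current API.
var host = new WebHostBuilder()
    .UseKestrel(options =>
    {
        // SchedulingMode.Inline ran application continuations on the transport thread
        // instead of dispatching them to the ThreadPool.
        options.ApplicationSchedulingMode = SchedulingMode.Inline;
    })
    .UseStartup<Startup>()
    .Build();
```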

For o, it would certainly be safe to be inline (we basically do this per HTTP/2 stream in Http2OutputProducer in Kestrel), but the benchmarking I've done before with Kestrel's Socket transport has shown the ioqueue to be slightly faster.
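
For context, a simplified sketch of the batching idea behind Kestrel's internal IOQueue (not the actual implementation): scheduled callbacks are queued, and a single ThreadPool work item drains the whole batch, instead of queueing one ThreadPool work item per callback.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO.Pipelines;
using System.Threading;

public sealed class BatchingScheduler : PipeScheduler, IThreadPoolWorkItem
{
    private readonly ConcurrentQueue<(Action<object?> Action, object? State)> _queue = new();
    private int _doingWork;

    public override void Schedule(Action<object?> action, object? state)
    {
        _queue.Enqueue((action, state));

        // Only queue ourselves to the ThreadPool if a drain isn't already pending.
        if (Interlocked.CompareExchange(ref _doingWork, 1, 0) == 0)
        {
            ThreadPool.UnsafeQueueUserWorkItem(this, preferLocal: false);
        }
    }

    void IThreadPoolWorkItem.Execute()
    {
        while (true)
        {
            // Drain the current batch on this single ThreadPool work item.
            while (_queue.TryDequeue(out var item))
            {
                item.Action(item.State);
            }

            Volatile.Write(ref _doingWork, 0);

            // A Schedule() racing with the reset above may have enqueued work without
            // queueing a new drain; pick it up here, otherwise stop.
            if (_queue.IsEmpty || Interlocked.CompareExchange(ref _doingWork, 1, 0) != 0)
            {
                return;
            }
        }
    }
}
```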

@tmds (Owner, Author) commented Apr 3, 2020

For c, we've found inline scheduling is best on linux with Kestrel's Socket transport, so that's already the default there

This c is a different one. It controls whether the Socket continuations run on the threadpool or stay on the epoll thread. I assume it is safe to inline, because i defers to the ThreadPool later.

For i (and the output pipe write scheduler) we've always used the threadpool as a safe default in case the app developer blocks.

ok, I will consider it unsafe to use inline here.

There is another use of the ThreadPool not mentioned yet. The handler gets put on the ThreadPool by the KestrelConnection class (see the stack trace of the JSON middleware below). Though I assume that doesn't make it safe to set i=inline? Do you recall the rationale for running the handler on the ThreadPool? Is this also happening for the JSON platform benchmark?

For o, it would certainly be safe to be inline (we basically do this on a per-HTTP/2-stream in Http2OutputProducer in Kestrel), but the benchmarking I've done before with Kestrel's Socket transport has shown the ioqueue to be slightly faster.

Yes, benchmarks we ran showed the same thing, though it's hard to understand why.

   at web.JsonMiddleware.Invoke(HttpContext httpContext) in /home/tmds/repos/Tmds.LinuxAsync/test/web/JsonMiddleware.cs:line 20
   at web.PlaintextMiddleware.Invoke(HttpContext httpContext) in /home/tmds/repos/Tmds.LinuxAsync/test/web/PlaintextMiddleware.cs:line 25
   at Microsoft.AspNetCore.HostFiltering.HostFilteringMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Hosting.HostingApplication.ProcessRequestAsync(Context context)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequests[TContext](IHttpApplication`1 application)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.ProcessRequestsAsync[TContext](IHttpApplication`1 application)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
   at System.Runtime.CompilerServices.AsyncMethodBuilderCore.Start[TStateMachine](TStateMachine& stateMachine)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnection.ProcessRequestsAsync[TContext](IHttpApplication`1 httpApplication)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.HttpConnectionMiddleware`1.OnConnectionAsync(ConnectionContext connectionContext)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.ConnectionDispatcher.<>c__DisplayClass8_0.<StartAcceptingConnectionsCore>b__1(ConnectionContext c)
   at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync()
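
For reference, a hedged sketch of the dispatch pattern at the bottom of this stack trace (simplified; not Kestrel's actual types): the connection is wrapped in a thread pool work item, so ProcessRequestsAsync starts on a ThreadPool thread instead of on the accepting thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Simplified: the accepted connection is queued as an IThreadPoolWorkItem so the
// connection delegate (and the first request) does not run inline on the accepting thread.
sealed class ConnectionWorkItem : IThreadPoolWorkItem
{
    private readonly Func<Task> _executeConnection; // e.g. () => middleware.OnConnectionAsync(connection)

    public ConnectionWorkItem(Func<Task> executeConnection) => _executeConnection = executeConnection;

    public void Execute() => _ = _executeConnection();
}

// In the accept loop (simplified):
//   ThreadPool.UnsafeQueueUserWorkItem(
//       new ConnectionWorkItem(() => OnConnectionAsync(connection)), preferLocal: false);
```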

@tmds (Owner, Author) commented Apr 3, 2020

Though I assume that doesn't make it safe to set i=inline?

I figured out this isn't safe: the stack trace changes for successive requests on the same connection.
I still have these questions:

Do you recall the rationale for running the handler on ThreadPool?
Is this also happening for JSON platform benchmark?

@tmds (Owner, Author) commented Apr 3, 2020

Do you recall the rationale for running the handler on ThreadPool?

I think I got this too. If there is already a request in the input pipe, the request handling would start inline and stop Kestrel from accepting more connections until the handling goes async. That reduces Kestrel's accepted-connections-per-second performance.

Is this also happening for JSON platform benchmark?

I guess so. It doesn't matter much for the TE benchmarks, since it only affects the first request on a connection.
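
A sketch of the accept-loop shape this rationale refers to (simplified; listener and ProcessConnectionAsync are placeholders, not Kestrel's actual code):

```csharp
// If the handler ran inline and the first request were already buffered, it would
// execute synchronously until its first incomplete await, and the loop could not
// reach the next AcceptAsync() in the meantime.
while (true)
{
    var connection = await listener.AcceptAsync();

    // Dispatching to the ThreadPool keeps the accept loop free to accept the next connection.
    _ = Task.Run(() => ProcessConnectionAsync(connection));
}
```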

@tmds (Owner, Author) commented Apr 3, 2020

Adding some graphs from benchmarks.

Using threadpool

i=threadpool
c=threadpool

[graph: tp2]

(The x-axis is the number of epoll threads, from 0 to ProcessorCount; I cut off the values because the Excel web view renders them wrong in Chrome.)

Since epoll threads don't do much work, we don't need many of them.

Here IoQueue has better performance than Inline.
This raises the question: could the ThreadPool be improved so that Inline performs as well as IoQueue?

Using epoll threads

i=inline
c=inline

[graph: in2]

Since the epoll threads do a lot of work here, we need a lot of them.

Here IoQueue performs worse than Inline.

Other

@antonfirsov and I discussed running benchmarks also for this combination:

c=inline
i=threadpool

It would also be meaningful in light of the previous discussion, because this combination could be used safely by Kestrel even when users may have blocking handlers.

@davidfowl commented:

That would be an ideal option IMO for Kestrel, and it should be opt-in at the socket level.

@halter73 commented Apr 3, 2020

It looks like you answered all of your own questions @tmds 😄.

I agree an option to enable the inline scheduling of app code (i.e. ApplicationSchedulingMode) is something we should add back as a SocketTransportOption. I'll look into this.

@halter73 commented Apr 3, 2020

This c is a different one. It controls whether the Socket continuations are run on threadpool or stay on epoll thread.

That makes sense. IIRC, the reason we run Socket.ReceiveAsync continuations inline on linux, but not Windows, is because on linux the continuation is already dispatched to a threadpool thread, whereas on Windows we're still on an I/O thread.

I assume it is safe to inline, because i defers to ThreadPool later.

Agreed. As long as we're not calling into non-framework code it should be safe to inline.

@tmds (Owner, Author) commented Apr 3, 2020

As long as we're not calling into non-framework code it should be safe to inline.

@halter73 Is the combination of c=inline, i=threadpool safe to use with potentially blocking application code? I think so, since i=threadpool will put that code on the threadpool. Is that correct?

@halter73 commented Apr 3, 2020

Yes. Running continuations inline is safe if all they do is write to a PipeWriter and the reader scheduler is set to the threadpool.
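
A hedged sketch of that pattern: with c=inline the receive continuation runs on the epoll thread, but all it does is copy bytes into the connection's input PipeWriter; because the input pipe's readerScheduler is PipeScheduler.ThreadPool (i=threadpool), the application's ReadAsync continuation, which is where blocking user code may live, is dispatched to the ThreadPool.

```csharp
using System;
using System.IO.Pipelines;
using System.Net.Sockets;
using System.Threading.Tasks;

static async Task ReceiveLoopAsync(Socket socket, PipeWriter input)
{
    while (true)
    {
        Memory<byte> buffer = input.GetMemory(4096);

        // With c=inline, the continuation after ReceiveAsync runs on the epoll thread.
        int bytes = await socket.ReceiveAsync(buffer, SocketFlags.None);
        if (bytes == 0) break;

        input.Advance(bytes);

        // FlushAsync wakes the PipeReader; its continuation runs on the pipe's
        // readerScheduler (the ThreadPool), so user code never blocks the epoll thread.
        FlushResult result = await input.FlushAsync();
        if (result.IsCompleted) break;
    }

    await input.CompleteAsync();
}
```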

@antonfirsov (Collaborator) commented Apr 6, 2020

@tmds @halter73:
I did a comparison between the configurations discussed here.

[graph: comparison of the configurations discussed above]

Explanation of labels:

| Label        | Configuration                               |
| ------------ | ------------------------------------------- |
| Optimal      | c=inline i=inline a=true o=iothread         |
| c=inline     | c=inline i=threadpool a=true o=iothread     |
| c=threadpool | c=threadpool i=threadpool a=true o=iothread |

Note: I ran this on asp-citrine-lin, while last weekend's benchmarks were run on asp-citrine-db, which has a 5.5 kernel.

@tmds (Owner, Author) commented Apr 8, 2020

Combining with the results from #78 (comment), and taking the best results for each (c,i) combination:

|             | Config 1   | Config 2        | Config 3         | Config 4       |
| ----------- | ---------- | --------------- | ---------------- | -------------- |
| c           | threadpool | inline          | inline           | threadpool     |
| i           | threadpool | inline          | threadpool       | inline         |
| t           | 1          | ProcessorCount  | ProcessorCount/2 | 1              |
| o           | ioqueue    | iothread/inline | ioqueue/inline   | ioqueue/inline |
| rps, AIO    | 900k       | 1026k/1000k     | 900k/860k        | 870k/840k      |
| rps, no AIO | 887k       | 978k/861k       |                  |                |
| app safe    | yes        | no              | yes              | yes            |

Config 1 is most similar to the current behavior.
To do better (900k++), we need to move to config 2, which is not suitable for running user code that may block the epoll threads.

Some thoughts about o:

  • iothread requires the pipe to be able to use the epoll thread as a Scheduler. Personally, I find the API addition to Socket to get a PipeScheduler not worth the gain we've measured here (+2.6%). (A hypothetical sketch of such an API follows this list.)
  • ioqueue. I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.
  • inline. This is my preferred mode because it means doing the send as soon as possible. Any deferring (threadpool, ioqueue, iothread) is going to add some delay. There may be issues with Pipelines due to dotnet/runtime#25059 ("Add IsFlush to Pipelines ReadResult"). (And it may not be the best for TE benchmarks.)
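
Purely as an illustration of the kind of Socket API addition meant in the first bullet (nothing like this exists today; the extension below is a hypothetical stub), exposing the epoll thread as a PipeScheduler could look like:

```csharp
using System;
using System.IO.Pipelines;
using System.Net.Sockets;

static class SocketSchedulerExtensions
{
    // Hypothetical: would return a PipeScheduler that queues work onto the epoll thread
    // servicing this socket, so the output pipe's reader (the send loop) can run there.
    public static PipeScheduler GetIOThreadScheduler(this Socket socket) =>
        throw new NotImplementedException();
}

// Usage sketch for o=iothread:
//   var outputOptions = new PipeOptions(
//       readerScheduler: socket.GetIOThreadScheduler(),
//       writerScheduler: PipeScheduler.ThreadPool,
//       useSynchronizationContext: false);
```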

edit: added rps, no AIO row

@tmds (Owner, Author) commented Apr 8, 2020

To do better (900k++), we need to move to config 2

The gain we get in config 2 is thanks to using AIO.
With o=inline and AIO disabled, we get 860k.

@adamsitnik (Collaborator) commented:

I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.

@kouvel might be able to answer this question

@kouvel commented Apr 9, 2020

I wish we could get rid of ioqueue by making ThreadPool be smarter about this workload, but I don't know if this is feasible.

Yes, I think that would be feasible and a better option than having another pool of threads, but it may have some tradeoffs; we would have to try it out.

@kouvel commented Apr 14, 2020

I misunderstood and thought there was another pool of threads. I did some experiments relevant to using the thread pool directly instead of the IOQueue; so far I have not seen the thread pool performing better. It seems there is a benefit from batching and maybe other things. I have posted some more info in our Teams channel. Based on the info I have so far, it does not appear feasible to remove the IOQueue by using the thread pool instead.
