HTTP/2 has poor throughput with content larger than initial receive window size #43086
Comments
Tagging subscribers to this area: @dotnet/ncl
This is almost certainly due to our using a fixed-size receive window rather than growing it based on utilization/latency. I verified that our HTTP/2 bandwidth utilization is affected: when we receive something larger than the per-stream receive window, throughput drops as latency grows. Kestrel was affected by this too when I last checked, though it uses a larger buffer size and so is a little less affected. Regardless, we should try to share this window management code as part of the fix.
@scalablecory I ran my repro with tracing enabled for the Microsoft-System-Net-Http trace provider, and I see a ton of window increments being generated for very small sizes, typically the exact size of the previously received frame. Example:
Examining an HTTP/2 transaction with (fumbles about for any other implementation I can lay hands on, grabs Firefox), I noticed two things:
I can share the full log + decrypted Firefox capture should you be so interested.
There's code to defer window updates until they meet a minimum threshold. The threshold is 1/8th of the window size. For the connection window, the window size is large -- 64MB. For the stream window, it's 64KB, which explains why you see an update each time you receive 8KB. So it's really the same problem here -- the stream receive window is too small. See #1587. We either need to If Kestrel already has an implementation for (a) then we should just steal it.
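The arithmetic behind that comment can be checked with a short sketch. It assumes only the figures quoted above (a 64MB connection window, a 64KB stream window, and a threshold of 1/8th of the window); the function name `update_threshold` is hypothetical and is not taken from the .NET source.

```python
KIB = 1024
MIB = 1024 * KIB


def update_threshold(window_size: int, divisor: int = 8) -> int:
    """Bytes that must be consumed before a WINDOW_UPDATE is sent.

    Models the rule described in the thread: threshold = window / 8.
    """
    return window_size // divisor


# Figures quoted in the comment above.
stream_threshold = update_threshold(64 * KIB)
connection_threshold = update_threshold(64 * MIB)

print(f"stream threshold:     {stream_threshold // KIB} KiB")
print(f"connection threshold: {connection_threshold // MIB} MiB")
```

With a stream threshold of 8 KiB, every received chunk of about 8 KiB is enough to trigger a stream-level window update, which matches the pattern seen in the trace.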
One might look at compatibly-licensed MsQuic to find a good way to dynamically size the receive window.
Kestrel's connection and stream window sizes are not dynamic, but they are configurable. The default sizes are 128KB for the connection window and 96KB for the stream window. Kestrel does send fewer window updates, because its threshold is half of the window size instead of 1/8th, for both the connection and the stream. This hasn't been tuned. I do worry that sending window updates too infrequently causes stalls in high-latency scenarios. We should work together to improve this in Kestrel as well.
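To put rough numbers on "fewer window updates", here is a simplified sketch. It assumes one stream-level update is sent each time the consumed bytes reach the threshold, and it uses a hypothetical 26 MiB body; the window sizes and fractions are the ones quoted in this thread, and `window_updates` is an illustrative name, not an API of either stack.

```python
KIB = 1024
MIB = 1024 * KIB


def window_updates(body_bytes: int, window_bytes: int, threshold_fraction: float) -> int:
    """Approximate count of stream WINDOW_UPDATE frames for one body.

    Simplified model: an update is emitted every time the receiver has
    consumed `threshold_fraction` of the window.
    """
    threshold = int(window_bytes * threshold_fraction)
    return body_bytes // threshold


body = 26 * MIB  # hypothetical payload size, chosen only for illustration

# 64 KiB window, 1/8 threshold (the client-side figures quoted earlier).
client_updates = window_updates(body, 64 * KIB, 1 / 8)
# 96 KiB window, 1/2 threshold (the Kestrel stream defaults quoted above).
kestrel_updates = window_updates(body, 96 * KIB, 1 / 2)

print(f"64 KiB window, 1/8 threshold: {client_updates} updates")
print(f"96 KiB window, 1/2 threshold: {kestrel_updates} updates")
```

Under this model the larger threshold cuts the update count by roughly a factor of six. The trade-off raised above still applies: the sender waits longer for each credit refill.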
Any updates on this? This affects quite a bit of run-of-the-mill application logic that reaches out to third-party APIs. I haven't tested it with large downloads (which is another use case I have), but I assume they are affected considerably.
@kamronbatman the issue being discussed here will only affect downloads larger than 64KB. If you've got a perf case with content smaller than that, we'd be very interested -- please file a separate issue so we can keep each one focused on a single problem.
I have downloads ranging from 135 bytes to 600MB. So in general I am interested to know the status since it definitely affects my use cases one way or another. :)
Is there any workaround until the .NET 6 release? We have been facing this problem since .NET Core 3.
@DenSmoke My solution is basically this: if you know the response body will be larger than 64 KB, then use HTTP/1.1. You can do so like this:
It looks like the .NET team will improve things for .NET 6, but until then, this seems to be the best workaround.
This impacts gRPC clients that send/receive large messages. Unfortunately, switching to HTTP/1.1 isn't an option with gRPC.
+1 to @JamesNK's comment. Switching from the Google gRPC client to the .NET client in a system that does ~100MB downloads results in a download speed of 4.5MB/sec over WAN, whereas the Google client saturates the link (1Gbps, 125MB/sec theoretical maximum). If I park the .NET client on the same datacenter backbone as the server, it goes back to link saturation.
@Erikma I see very similar results.
We have plans to address the issue in .NET 6 -- we have a proposal for a proper fix, and last week we also started working on initial window update size settings to enable advanced custom configuration (see #49897).
That's pretty unfortunate, especially if large changes will be needed to go from .NET 5 to .NET 6. Damn.
Why would you expect large changes to go from 5 to 6?
I have no expectations that there will be, and I am also hoping there won't. :)
Fixes #43086 by introducing automatic scaling of the HTTP/2 stream receive window based on measuring RTT of PING frames.
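As a rough illustration of what RTT-based window scaling can look like, here is a minimal sketch. It is not the code from that PR: the class, its method names, the doubling rule, and the 16 MiB cap are all hypothetical. It assumes the receiver tracks the smallest PING round-trip time seen so far and doubles the stream window whenever the estimated bandwidth-delay product (delivery rate times RTT) exceeds the current window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamWindowScaler:
    """Hypothetical receive-window scaler driven by PING RTT samples."""

    window: int = 64 * 1024              # starting stream window (size quoted in this thread)
    max_window: int = 16 * 1024 * 1024   # hypothetical upper bound
    min_rtt: Optional[float] = None      # smallest RTT observed so far, in seconds

    def on_ping_rtt(self, rtt_seconds: float) -> None:
        # Keep the minimum RTT as the best estimate of path latency.
        if self.min_rtt is None or rtt_seconds < self.min_rtt:
            self.min_rtt = rtt_seconds

    def on_data_interval(self, bytes_received: int, elapsed_seconds: float) -> int:
        """Re-evaluate the window after receiving `bytes_received` in `elapsed_seconds`."""
        if self.min_rtt is None or elapsed_seconds <= 0:
            return self.window
        # Estimated bandwidth-delay product: delivery rate * RTT.
        bdp_estimate = bytes_received / elapsed_seconds * self.min_rtt
        if bdp_estimate > self.window:
            # The path can hold more data than the window allows: grow it.
            self.window = min(self.window * 2, self.max_window)
        return self.window


scaler = StreamWindowScaler()
scaler.on_ping_rtt(0.004)                               # 4 ms RTT sample
grown = scaler.on_data_interval(64 * 1024, 0.002)       # fast burst: window doubles
unchanged = scaler.on_data_interval(64 * 1024, 0.040)   # slow interval: window stays
print(grown, unchanged)
```

The design choice in this sketch is to grow only when the measured delivery rate shows the window is the limiting factor, so slow or idle streams do not reserve more buffer than they need.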
@antonfirsov @karelz Did this make it into net6-preview6 or do we need to wait for preview7 to retest?
It is part of Preview 7. You can try the daily builds, or early builds from the preview branch, to check it out.
Description
SocketsHttpHandler request performance is roughly 5X slower when using HTTP/2.0 than when using HTTP/1.1 to the same destination. There are no meaningful variations in flow control characteristics at the transport (TCP) protocol layers. It appears to be an issue in the HTTP/2.0 implementation itself.
It does not appear to be a server-side issue, as the equivalent HTTP/2.0 request issued using WinHttpHandler does not exhibit the performance reduction against the same HTTP/2.0 server.
Minimal repro:
Configuration
Windows 20H2. .NET Core 3.1 & .NET 5.0 were both tested with the same results.
Regression?
Not certain, I didn't test back before .NET Core 3.1.
Data
Example run of the above repro to my test server at 4ms RTT:
SocketsHttpHandler (Success: True) HTTP/1.1 in 622ms (42.000 MB/s)
SocketsHttpHandler (Success: True) HTTP/2.0 in 2220ms (11.000 MB/s)
WinHttpHandler (Success: True) HTTP/1.1 in 465ms (56.000 MB/s)
WinHttpHandler (Success: True) HTTP/2.0 in 489ms (53.000 MB/s)
Analysis
I took a packet capture while exercising the above repro and noted a number of things. First and foremost, SocketsHttpHandler (in the HTTP/1.1 case) and WinHttpHandler (in both the HTTP/1.1 and HTTP/2 cases) appear to be exercising the congestive limit of the network I am on. Additionally, my server is capable of extracting TCP ESTAT data for each request, and the results are unambiguous (and confirmed via packet capture) - congestive loss conspired to reduce the congestion window and cause slower throughput overall.
There does appear to be an issue in read rate in the SocketsHttpHandler HTTP/1.1 case that causes the receive window to grow more slowly (accounting for the very repeatable pullback of ~10MB/s at 4ms RTT between SocketsHttpHandler and WinHttpHandler). That's not the issue in this bug, though.
The real issue is the enormous pullback for HTTP/2 in SocketsHttpHandler. That transfer rate is not explained by any activity at the TCP layer. Receive window space is ample, no congestive loss is observed (and in fact at the server there is ample congestion window space available). TCP is mostly not sending data because it has not been provided data to send. This would indicate an issue at the server, except alternative HTTP/2 client implementations do not exhibit this behavior to this same server (e.g. WinHttpHandler, curl, etc.). Note that in the above dataset, WinHttpHandler using HTTP/2 can hit 53MB/s (again, the congestive limit of this network confirmed with server-side TCP ESTATS and a packet capture).
Interestingly, a packet capture at the client shows that the segments follow a transmission pattern where a small number of bytes are sent punctuated by an RTT-based delay. This indicates a buffering issue.
Given the lack of TCP receive window flow-control impact and the lack of congestive loss in the slow SocketsHttpHandler HTTP/2 case, I suspect an issue in the implementation of HTTP/2 flow control in SocketsHttpHandler is causing the server to starve the TCP connection for bytes.
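A back-of-the-envelope check supports this suspicion. With a fixed flow-control window, a sender can have at most one window of data in flight per round trip, so throughput is bounded by window / RTT. The sketch below assumes the 64 KB stream window mentioned in the discussion, the 4 ms RTT from the data above, and MB meaning 10^6 bytes; the function names are illustrative only.

```python
def window_limited_ceiling(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput (bytes/s) when one window is sent per RTT."""
    return window_bytes / rtt_seconds


def window_for_rate(rate_bytes_per_sec: float, rtt_seconds: float) -> float:
    """Window (bytes) needed to sustain a given rate: the bandwidth-delay product."""
    return rate_bytes_per_sec * rtt_seconds


RTT = 0.004  # 4 ms, from the example run above

ceiling = window_limited_ceiling(64 * 1024, RTT)
needed = window_for_rate(53e6, RTT)  # 53 MB/s: the WinHttpHandler HTTP/2 result

print(f"ceiling with a 64 KiB window: {ceiling / 1e6:.1f} MB/s")
print(f"window needed for 53 MB/s:    {needed / 1024:.0f} KiB")
```

Under these assumptions the ceiling is roughly 16.4 MB/s, so the observed 11 MB/s sits below it, while matching the 53 MB/s result would need a window of roughly 207 KiB.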