
Improvements to memory usage of HttpClient connections and requests #61223

Open
JamesNK opened this issue Nov 4, 2021 · 13 comments
Labels
area-System.Net.Http · enhancement (Product code improvement that does NOT require public API changes/additions) · tenet-performance (Performance related issue)
Milestone
Future
Comments

@JamesNK
Member

JamesNK commented Nov 4, 2021

gRPC supports bidirectional streaming over HTTP/2. One of the implications of streaming is an RPC call idling while reading data from the HTTP/2 response stream. This is the client waiting for the next message. It could arrive immediately or in 5 minutes.

For example, a frontend server could have many bidi streams open with a backend server. The gRPC client in the frontend server is waiting to read data from them and holding onto many buffers. The client app has higher memory usage and GC heap fragmentation.

Customer reported fragmentation hotspots (note: they are using .NET 5, some of these may have been addressed in .NET 6):

  • SslStream._internalBuffer
  • Http2Stream._responseBuffer
  • A byte array (address 0000021845447020) in HttpConnection (from an internal memory dump, ping me if you want a copy)
  • Also, there is an opportunity to allow zero-byte reads with responseStream.ReadAsync from user code. Consider HTTP/1.1, HTTP/2 and HTTP/3. This would allow the gRPC client (and other users of HttpClient) to wait until there is data available before renting a buffer (a minimal sketch follows this list).
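
To make that last bullet concrete, here is a minimal sketch of the pattern from the caller's side, assuming the response stream supports zero-byte reads (i.e. a zero-byte ReadAsync completes only once data is available). The helper name and buffer size are illustrative, not HttpClient or gRPC code.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class LazyBufferReader
{
    // Illustrative only: wait for data with a zero-byte read, then rent a buffer.
    // This relies on the stream treating a zero-byte ReadAsync as "wait until data
    // is available" rather than completing immediately.
    public static async Task<int> ReadWhenDataAvailableAsync(
        Stream responseStream, IBufferWriter<byte> destination, CancellationToken cancellationToken)
    {
        // Completes when data is available (or the stream ends) without consuming anything.
        await responseStream.ReadAsync(Memory<byte>.Empty, cancellationToken);

        byte[] buffer = ArrayPool<byte>.Shared.Rent(8 * 1024);
        try
        {
            int bytesRead = await responseStream.ReadAsync(buffer, cancellationToken);
            destination.Write(buffer.AsSpan(0, bytesRead));
            return bytesRead;
        }
        finally
        {
            // The buffer is only held while data is actually flowing.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```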

cc @geoffkizer

@JamesNK added the area-System.Net.Http and tenet-performance labels on Nov 4, 2021
The dotnet-issue-labeler bot added the untriaged label on Nov 4, 2021
@geoffkizer
Contributor

There are several semi-related issues we should consider addressing, all around reducing memory usage and (in some cases) avoiding long-lived pinned buffers:

(1) Zero-byte read in the HTTP2 connection receive logic. This would allow us to release the HTTP2 receive buffer back to the pool if we don't have any data to receive at the moment, and similarly for SslStream's receive buffer (SslStream._internalBuffer above). Also note that the SslStream buffer will be pinned on Windows, since it's performing a read on NetworkStream (in the common case). However, this could introduce overhead in the receive path, so we should measure the impact here and consider making this opt-in via a property on SocketsHttpHandler. (A rough sketch of this receive-loop pattern follows the list below.)
(2) Support zero-byte reads in the various response stream implementations -- HTTP/1.1, HTTP/2, HTTP/3. This would allow consumers like gRPC to defer buffer usage until data is actually available.
(3) Look at how HttpConnection could reduce memory usage for send and receive buffers. We don't use ArrayPool for this today; maybe we should? We do perform a zero-byte read on idle connections, so we are avoiding the SslStream buffer allocation and pinning here for idle connections; but we don't actually free the HttpConnection's send and receive buffers for idle connections. We might even consider freeing these buffers in other situations, like for a Content-Length-based response where we can just read from the underlying connection into the user's buffer.
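
A rough sketch of the receive-loop pattern from (1), under the assumption that the transport stream (e.g. SslStream over NetworkStream) supports zero-byte reads. This is not SocketsHttpHandler code; names and sizes are illustrative.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading.Tasks;

static class Http2ReceiveLoopSketch
{
    // Illustrative only: 'transport' stands in for the underlying SslStream/NetworkStream,
    // 'processFramesAsync' for the frame parser.
    public static async Task RunAsync(Stream transport, Func<ReadOnlyMemory<byte>, ValueTask> processFramesAsync)
    {
        while (true)
        {
            // Zero-byte read: while we wait here we hold no receive buffer, and a transport
            // that also supports zero-byte reads (e.g. SslStream) can release its buffer too.
            await transport.ReadAsync(Memory<byte>.Empty);

            byte[] buffer = ArrayPool<byte>.Shared.Rent(16 * 1024);
            try
            {
                int bytesRead = await transport.ReadAsync(buffer);
                if (bytesRead == 0)
                {
                    return; // connection closed
                }

                // Real framing code would carry partial frames over to the next iteration;
                // this sketch assumes each read is processed in full.
                await processFramesAsync(buffer.AsMemory(0, bytesRead));
            }
            finally
            {
                // Return the buffer so idle periods (the common case for long-lived
                // bidi streams) hold no pooled memory.
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }
}
```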

@scalablecory
Contributor

A PinnedArrayPool would avoid the issue of long pinning. Bonus points, we can avoid GCHandle allocation every time we do I/O. I believe ASP.NET Core has something like this already, perhaps @JamesNK can get numbers on how much it moved the needle.
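
A minimal sketch of what such a pool could look like, assuming .NET 5+ where GC.AllocateUninitializedArray can place arrays on the pinned object heap. PinnedArrayPool is not an existing BCL type, and the retention policy here is deliberately naive.

```csharp
using System;
using System.Collections.Concurrent;

// Sketch only: arrays come from the pinned object heap (POH), so they never move
// and native I/O needs no per-call GCHandle pinning.
public sealed class PinnedArrayPool
{
    private readonly ConcurrentBag<byte[]> _arrays = new();
    private readonly int _arrayLength;

    public PinnedArrayPool(int arrayLength) => _arrayLength = arrayLength;

    public byte[] Rent()
    {
        if (_arrays.TryTake(out var array))
        {
            return array;
        }

        // pinned: true allocates on the POH (available since .NET 5).
        return GC.AllocateUninitializedArray<byte>(_arrayLength, pinned: true);
    }

    public void Return(byte[] array) => _arrays.Add(array);
}
```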

@scalablecory
Contributor

I've also been wanting to try a ref-counted receive buffer. The idea is a bit involved and needs massive proving out, but it goes like this:

  • Use buffers of fixed size, like 4KB or something.
    • Keep a list of available buffers [bufA], [bufB], [bufC]
  • The connection reads into these buffers with scattered I/O.
  • We parse the frame header in the receive loop, ref count the buffer, and send the frame into essentially a Channel for the stream.
  • Once [bufA]'s ref count goes to 0 (it has been consumed entirely by the receive loop, and all streams have finished using it), it goes to the end of the list.
    • We can have some heuristic to decide when there are too many free buffers and return them to pool.

For long-running H2 streams, this would let us avoid making two I/Os while also helping memory usage.
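
A minimal sketch of just the ref-counting piece of this idea, simplified to return exhausted buffers to ArrayPool rather than to a private free list with trimming heuristics. The type and member names are illustrative.

```csharp
using System;
using System.Buffers;
using System.Threading;

// Sketch only: a fixed-size receive buffer shared between the receive loop and any
// streams whose frames still point into it. The last Release() returns it to the pool.
public sealed class RefCountedBuffer
{
    private readonly byte[] _buffer;
    private int _refCount = 1; // initially owned by the receive loop

    public RefCountedBuffer(int size = 4096) => _buffer = ArrayPool<byte>.Shared.Rent(size);

    public Memory<byte> Memory => _buffer;

    // Called once per frame handed off to a stream's channel.
    public void AddReference() => Interlocked.Increment(ref _refCount);

    // Called by the receive loop when it moves past this buffer, and by each stream
    // once it has consumed its frame(s).
    public void Release()
    {
        if (Interlocked.Decrement(ref _refCount) == 0)
        {
            ArrayPool<byte>.Shared.Return(_buffer);
        }
    }
}
```

In this sketch, the receive loop would call AddReference once per frame it hands off to a stream's channel and Release when it moves past the buffer; each stream calls Release after consuming its frame.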

@stephentoub
Member

A PinnedArrayPool would avoid the issue of long pinning.

It would. However, it would also require more due diligence on returning arrays to the pool and on the pool retaining such arrays more aggressively (which would in turn have an impact on how it's used). Such a pool would likely be backed by the pinned object heap, and the current loosey-goosey-ness around how non-aggressively we return arrays to the pool, as well as how liberally the pool is able to drop them, could result in the POH becoming more heavily fragmented. This is one of the reasons we've shied away from just making ArrayPool.Shared itself use the POH.

@brianrob has been doing some thinking around this.

@wfurt
Member

wfurt commented Nov 5, 2021

We may be able to eliminate SslStream._internalBuffer in some cases. I had a PR up, but it did not make it into 6.0.
Note that _internalBuffer comes from a pool and is returned whenever it is empty.

@geoffkizer
Contributor

@wfurt What did you have in mind specifically?

@wfurt
Member

wfurt commented Nov 5, 2021

If the user buffer is big enough, we can read into it and decrypt in place.

@geoffkizer
Contributor

We have also talked about allowing Stream users to get direct access to buffered data for Streams that have internal buffers like SslStream. This would allow the HTTP2 code to avoid allocating its own buffer in at least some cases.

See #58216 -- "Peeking".
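
For illustration only, one hypothetical shape such a "peeking" API could take; this is not the proposal tracked in #58216. The idea is that a consumer like the HTTP/2 code reads directly out of data the stream has already buffered instead of copying it into its own buffer first.

```csharp
using System;

// Hypothetical interface, for illustration only; see #58216 for the actual discussion.
public interface IBufferedDataAccessor
{
    // A view over data the stream has already buffered internally (e.g. decrypted TLS
    // data sitting in SslStream's internal buffer), without copying or consuming it.
    ReadOnlyMemory<byte> GetBufferedData();

    // Marks the first 'count' bytes of the buffered data as consumed.
    void Consume(int count);
}
```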

@geoffkizer
Contributor

I added an issue specifically to track response stream zero-byte read support (my bullet (2) above): #61475

@karelz added the enhancement label and removed the untriaged label on Nov 16, 2021
@karelz
Member

karelz commented Nov 16, 2021

Triage: zero-byte reads are known work, tracked separately in #61475.

The next step would be to analyze the E2E scenario to see if there is more we can do to help.

There may be an opportunity to save some allocations in SslStream, per @wfurt.

@MihaZupan
Member

MihaZupan commented Dec 8, 2021

@JamesNK #61913 added support for zero-byte reads on response content streams. gRPC should be able to take advantage of this. Since you're using a large copy buffer, this should account for most of your memory. YARP's StreamCopier may be used as a reference for similar copy loops.

Potential improvements for reducing the memory footprint (some were already mentioned above):

  1. HttpConnection allocates a read buffer and a write buffer (4 kB each). We should release and pool these buffers when possible
    1.1. We already issue zero-byte reads on idle connections to detect unexpected data/early EOFs. We can return the read buffer to the pool until that read completes. Overlaps with #60729 (SocketsHttpHandler should proactively handle EOF from server on idle HTTP/1.1 connections)
    1.2. When a connection is idle, there is no need for us to keep the write buffer around
    1.3. We can look into being even more aggressive about returning the buffer when writing the request content (e.g. in cases where we bypass the buffer and write directly to the underlying stream)
  2. On HTTP/2, we can issue a zero-byte read in the receive loop, allowing us to return our receive buffer and allowing the transport stream to do the same
    2.1. This may have a (rather small) perf impact on active connections. ASP.NET Core allows users to opt out by toggling a WaitForDataBeforeAllocatingBuffer flag. Do we care to make it configurable? A GitHub search shows virtually no use, and if the user cares a lot, they could provide a stream that no-ops on zero-byte reads.
  3. The improvements enabled by #61913 (Support zero-byte reads on HTTP response streams) account for the majority of potential memory savings, but they require that users proactively issue zero-byte reads. We should consider improving our CopyToAsync helpers to perform zero-byte reads on their behalf (a sketch of such a copy loop follows the savings list below). This would be the biggest win for the majority of users, without code changes on their part. Thoughts?

Potential savings:

  1. 8 kB per HTTP/1.x connection
  2. 80 kB per HTTP/2 connection (16 kB receive buffer + 32 kB write buffer + 32 kB SslStream buffer)
  3. 40 kB per active request during response content transfers (8 kB receive buffer + 32 kB SslStream buffer)
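
A minimal sketch of the copy loop suggested in point 3 above, in the spirit of YARP's StreamCopier: the helper issues the zero-byte read itself, so callers get the savings without code changes. For simplicity it returns the buffer after every read; a real implementation would likely keep the buffer while data keeps arriving and only drop it across idle gaps. Names and sizes are illustrative.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

static class ZeroByteReadCopier
{
    // Illustrative only: the copy buffer is rented only while data is actually flowing,
    // so a long-idle response stream (e.g. a gRPC bidi stream) holds no buffer.
    public static async Task CopyToAsync(Stream source, Stream destination, CancellationToken cancellationToken)
    {
        while (true)
        {
            // Wait for data without holding a buffer.
            await source.ReadAsync(Memory<byte>.Empty, cancellationToken);

            byte[] buffer = ArrayPool<byte>.Shared.Rent(64 * 1024);
            try
            {
                int bytesRead = await source.ReadAsync(buffer, cancellationToken);
                if (bytesRead == 0)
                {
                    return; // end of stream
                }

                await destination.WriteAsync(buffer.AsMemory(0, bytesRead), cancellationToken);
            }
            finally
            {
                ArrayPool<byte>.Shared.Return(buffer);
            }
        }
    }
}
```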

@karelz moved this from the 7.0.0 milestone to the Future milestone on Jul 21, 2022
@karelz
Member

karelz commented Jul 21, 2022

Triage: We improved the E2E with zero-byte reads in #61475. Further buffer adjustments (as suggested above by @MihaZupan) are in the Future milestone.

@MihaZupan changed the title from "[HTTP/2] Memory usage while reading from long running response streams" to "Improvements to memory usage of HttpClient connections and requests" on Dec 10, 2022