Reduce memory usage by SSLStream #49019
Tagging subscribers to this area: @dotnet/ncl, @vcsjones

Description

For long-running scenarios where reads are pending and data is only occasionally sent back and forth, the cost of using SslStream is high from a memory point of view. Today SslStream allocates an 8K buffer for the handshake and a 32K buffer per read. It optimizes for reading the TLS header and payload into the same buffer and in the end copies that to the user-specified buffer.

Regression?

No

Data

A demo of 5000 websocket connections connected to an ASP.NET Core server using Kestrel. Here's a heap dump sorted by inclusive size; most of what is being held onto is array pool buffers. The allocation profile of 5000 websocket connections shows the array pool has the most byte[] allocations. They're mostly in the handshake and reading (I'm not writing in this app), just sitting mostly idle with ping pongs.

Analysis

We support specifying the buffer size for SslStream, much like FileStream, so that the user-provided buffer can be used instead of the internal buffer to reduce memory usage. A bufferSize of 1 would signal to the SslStream that the internal buffer shouldn't be used. I don't think we need a handshake buffer size...

The problem is SslStream has 5 constructors...

```csharp
public class SslStream
{
+   public SslStream(Stream innerStream, bool leaveInnerStreamOpen, RemoteCertificateValidationCallback userCertificateValidationCallback, LocalCertificateSelectionCallback? userCertificateSelectionCallback, EncryptionPolicy encryptionPolicy, int bufferSize) : base(innerStream, leaveInnerStreamOpen)
}
```
We need the entire frame buffered in order to decrypt it. So we can't allow the user to dictate the buffer size.
I do wish we used the same buffer for the handshake and the app data.
I don't understand this. Can you elaborate? Is it because the user has to pass a minimum size buffer in to decrypt the frame?
BTW the handshake buffer should be returned when the handshake finishes, so it may not really matter. We could unify them, but the renegotiation path would need to be done carefully, e.g. if the handshake starts again during a read/write.
It feels like this could be driven by the buffer size passed to Read, right? If somebody passes a buffer size of 1, we probably do not need a huge buffer, while somebody passing in a large chunk suggests that a bulk read (e.g. multiple frames) may make sense. This would have the benefit that it can be related to HTTP data, e.g. be more flexible based on content rather than set once in the constructor. A note about the frames: I'm not sure we can always decrypt less than one TLS frame. While the API (SChannel/OpenSSL) is not about frames, the buffering may happen internally; we may not see managed buffers but the memory may be taken anyway. Also note that the buffers are transient and rented from the pool; they are not necessarily linked to the life cycle of the stream.
That's right, if the buffer passed to ReadAsync is
That sounds reasonable. It's really about controlling the memory usage of the internal buffer. FileStream has a similar optimization, so I took from prior art. If you feel like we could mostly avoid the internal buffer in SslStream without that, then I'll defer to you.
It's much better from a memory POV, which is why you should let the caller control it so that they can optimize for the scenario they care about. In ASP.NET Core, the Stream that SslStream wraps doesn't make syscalls; it's buffered, so that cost is low compared to wrapping a NetworkStream and making syscalls per read.
Take a look at the memory profile above. These reads are long running and the buffers will remain rented from the pool for as long as these reads are pending, which is a long time. The memory allocated today in a typical websocket scenario on the server side is (32K (SslStream) + 4K (Kestrel's buffer)) * number of websocket connections. We're extremely memory efficient without TLS for a number of reasons, but here's a working set comparison for the same workload: 5000 websocket connections without TLS - 200MB. This also exacerbates the fact that the Kestrel memory pool doesn't currently shrink (our problem, not yours). But this is less of an issue with non-TLS because we basically only use the memory pool once IO is ready.
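Taking the figures above at face value, the pending-read buffer cost works out roughly like this (a back-of-the-envelope sketch, not a measurement):

```csharp
using System;

// Buffers held per connection while a read is pending, per the numbers above.
const int SslStreamReadBuffer = 32 * 1024; // 32K rented by SslStream
const int KestrelBuffer = 4 * 1024;        // 4K rented by Kestrel
const int Connections = 5000;

long totalBytes = (long)(SslStreamReadBuffer + KestrelBuffer) * Connections;
Console.WriteLine($"{totalBytes / (1024 * 1024)} MB held in rented buffers");
```

Roughly 175 MB sits in rented buffers alone for 5000 idle connections, most of it the 32K SslStream read buffer, which is why the TLS working set dwarfs the ~200 MB non-TLS baseline.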
I will take a look @davidfowl. I'm remote this week and off-grid next week. But I think we should be able to look into this in 6.0.
We have to pass the entire frame to SCHANNEL. Otherwise it can't decrypt the frame and will give us an error like SEC_E_INCOMPLETE_MESSAGE.
I see, so we can read the frame header to get the length and determine whether we can fulfill the request using the user's buffer, or decide whether we need an internal buffer to complete the operation successfully.
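For reference, the frame header in question is the 5-byte TLS record header: content type (1 byte), protocol version (2 bytes), and payload length (2 bytes, big-endian). A minimal sketch of the length check (function names are mine, not SslStream internals; note the decrypted plaintext is slightly smaller than the record payload once the MAC/tag is removed, so this check is conservative):

```csharp
using System;

// The payload length is the last two bytes of the 5-byte record header,
// big-endian (record layout: content type (1), protocol version (2), length (2)).
static int PayloadLength(ReadOnlySpan<byte> header)
{
    if (header.Length < 5)
        throw new ArgumentException("need the full 5-byte record header");
    return (header[3] << 8) | header[4];
}

// Could the frame be delivered straight into the user's buffer,
// or does an internal buffer need to be rented?
static bool FitsInUserBuffer(ReadOnlySpan<byte> header, int userBufferLength)
    => PayloadLength(header) <= userBufferLength;

// Example: an application-data record (type 0x17) carrying 1234 bytes.
var header = new byte[] { 0x17, 0x03, 0x03, 0x04, 0xD2 };
Console.WriteLine(PayloadLength(header)); // prints 1234
```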
One thing we could potentially do here is release the SslStream read buffer back to the pool when it becomes empty. This would add a small cost of acquiring/releasing the buffer in scenarios where we are reading in a tight loop, but it's probably not that expensive; we should just measure to be sure. That said, I'm not sure how much this helps in practice. It only helps when the user's read code does something like this: It could be that WebSocket or something else that we care about would fit this pattern; not sure... thoughts?
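The read-loop example referred to above was not captured in this thread; presumably it is a drain-then-wait loop along these lines (my reconstruction, with a hypothetical process callback):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Reconstruction of the kind of read loop meant above: the consumer fully
// drains each read before issuing the next one, so nothing sits in the
// stream's internal buffer while the next ReadAsync is pending. That is
// the window where a pooled buffer could be returned and re-rented on demand.
static async Task ConsumeAsync(Stream stream, Action<ReadOnlyMemory<byte>> process)
{
    byte[] buffer = new byte[4096];
    while (true)
    {
        int read = await stream.ReadAsync(buffer);
        if (read == 0) break; // EOF
        process(buffer.AsMemory(0, read));
    }
}
```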
We want to read as much as we can in a single read. Syscalls are expensive, as you know :)
See the comment above:
That is a trade-off IMHO: read ahead and try to read as much as possible, at the cost of maybe using more memory. It probably does not belong in SslStream, but I have seen systems where the perf vs. memory trade-off changes depending on system state, e.g. fast by default, cheap if resources are low.
Stepping back for a minute: BCL components do have to make tradeoffs to be as generic as possible; they should also be as policy-free as they can be, and where they can't, let the caller choose. In this case, I'd like to trade off throughput for memory.
I agree, that's a reasonable thing to do; the trick is figuring out how to do this effectively.
I think it would help me if I understood better the scenarios where you would want to trade off throughput for memory. Is it for WebSockets that go idle? Something else?
Yep, that's the typical scenario: lots of concurrent connections with very low traffic, where a connected client/device is waiting for a server-initiated notification. That's the scenario I'm currently testing. We optimize for this scenario at multiple layers in the stack by doing zero-byte reads, but we need to do it in a couple more places to make it more optimal.
Yeah, I almost mentioned this... I wonder if some sort of zero-byte read support in SslStream would be useful here.
SslStream allocations are high on the list for SqlClient performance as well; improvements here would be welcome.
It already supports it. #37122 I think you're onto something, here's a strawman:
Didn't know that... awesome! I think we could do something even simpler here... When the user does a 0-byte read and there's nothing in the internal buffer, then we can: Thoughts? Edit: As you said above, we can use the 0-byte read as an "indicator" of user desire here... specifically that they don't want buffers tied up until there is data available, and they are willing to trade off at least a small amount of throughput for this.
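From the caller's side, the 0-byte-read "indicator" pattern looks roughly like this (a sketch; the helper name is mine, and SslStream already honors zero-length reads per #37122):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Wait for data with a 0-byte read, then perform the real read. While the
// 0-byte read is pending, no large buffer is tied up in the read call.
static async Task<int> ReadWhenAvailableAsync(Stream stream, byte[] buffer)
{
    await stream.ReadAsync(Memory<byte>.Empty); // completes when data is available
    return await stream.ReadAsync(buffer);      // now do the real read
}
```

Whether the 0-byte read actually waits for data depends on the wrapped stream, which is exactly the conformance question discussed in the rest of the thread.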
Keep in mind
Yes that works as well. I guess we'd end up allocating the buffer only until the entire frame is read.
Kestrel doesn't use a NetworkStream, so that wouldn't be viable. I discussed this with @stephentoub this morning and he thinks it might be possible to retcon the definition to allow 0-byte reads on Streams to not be an error. If we're not OK passing the 0-byte read on to the underlying stream, then we could go with the approach where we read the header; it's 5 bytes so it's within the range 😄
Special-case what? SslStream provides the "wait for data" semantics if the stream it's wrapping does as well. So as long as Kestrel uses such a stream, everything "just works". If someone wants to substitute their own custom stream, they would need to support 0-byte reads as well. If they're using a stream today that doesn't and Kestrel starts depending on it, yeah, they could be broken, but ASP.NET has taken much larger breaking changes :)
0-byte reads shouldn't be an error. Behavior is either:
There are basically three kinds of streams:
Almost every stream we have in category (1) supports 0-byte reads; I'm aware of one or two cases where that's not the case, but I think we can consider that a bug and just fix it (especially since in one case it's an inconsistency across platforms). (2) isn't relevant here. (3) is where I've seen the most inconsistency, but in most cases 0 isn't special-cased and it just exposes the underlying behavior. I think we should update guidance around Streams to clarify expected behavior for 0-byte reads.
and ensure our Streams map accordingly. But I don't think any of this really matters for the discussion at hand. Kestrel just needs to state that Streams it uses need to support 0-byte read behavior.
I'm looking at the code and I don't see where it's actually relying on a zero-byte read from the underlying stream. That is, SslStream is correctly implementing "wait for data" but it's doing it by simply always reading into its internal buffer. So I think the win we could harvest here is to do the following: If the user does a zero-byte read, and our internal buffer is empty (of both encrypted and decrypted data) and thus returned to the pool, then do a zero-byte read against the underlying stream and wait for it to return, before we allocate the internal buffer. This would mean that we wouldn't be holding on to the internal buffer during the zero byte read. Thoughts?
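In pseudocode, the proposed change inside the read path would look something like this (a sketch of the idea only; the field names are hypothetical, not the actual SslStream implementation):

```csharp
// Hypothetical sketch of SslStream.ReadAsync(Memory<byte> buffer, CancellationToken ct):
if (buffer.Length == 0 && _internalBufferedBytes == 0)
{
    // Internal buffer is empty (and already returned to the pool): forward
    // the zero-byte read so we wait for data without holding any memory.
    await _innerStream.ReadAsync(Memory<byte>.Empty, ct);
    return 0; // data may now be available; the caller retries with a real buffer
}
// Otherwise rent the internal buffer, read a full TLS frame, decrypt,
// and copy the plaintext into `buffer` as today.
```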
Exactly. SslStream is in my category (3). |
I think that's reasonable. We just need to be aware that we may be wrapping a stream that doesn't have the wait for data semantics. In practice I think what that really means is, whereas we might otherwise have chosen to do a synchronous read (since we know there's data available), we still need to do the subsequent read asynchronously, just in case there wasn't actually data available. |
And to be clear, what are you proposing we do for category (3) in general, in terms of guidance/conformance tests/etc? Are you saying these should do what I'm proposing for SslStream above? That is: When the user does a zero-byte read, and the stream needs more data from the underlying stream, then the wrapping stream should: I can imagine adding a test to the conformance tests for this. Edit: To be clear, a conformance test could test for (1) above; (2) is an implementation detail not directly visible to the user.
Yep, but as you say, the result here is benign and we would end up doing basically what we do today, just with the added cost of a zero-byte read that always succeeds immediately. |
Basically, don't add an upfront guard for
That's my bullet (1) above. I think suggesting (2) is useful as well, because that's what would actually help with the original scenario here.
I think it's more complicated than that in general; wrapping streams are typically managing internal buffers etc. But that's a detail. |
That's what I meant by "or the equivalent" :) |