improve buffer handling inside SslStream #49743
Tagging subscribers to this area: @dotnet/ncl, @vcsjones
@@ -41,6 +41,7 @@ private enum Framing
private const int ReadBufferSize = 4096 * 4 + FrameOverhead; // We read in 16K chunks + headers.
private const int InitialHandshakeBufferSize = 4096 + FrameOverhead; // try to fit at least 4K ServerCertificate
private ArrayBuffer _handshakeBuffer;
private bool receivedEOF;
nit: _receivedEOF
This gives the compile error "Field 'SslStream._rentedBuffer' is never assigned to, and will always have its default value null".
I'm looking. _rentedBuffer is assigned in ResetReadBuffer().
Task reader = Task.Run(() =>
{
    SHA256 hash = SHA256.Create();
Suggested change:
- SHA256 hash = SHA256.Create();
+ using SHA256 hash = SHA256.Create();
I thought we had statics for this now.
Ah yes, you can just do SHA256.HashData(...). No Create() and no dispose.
That works only when you have the whole chunk. Here we feed TransformFinalBlock one chunk at a time.
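To illustrate the point about chunked input, here is a minimal sketch of hashing data that arrives piece by piece. It uses IncrementalHash rather than the TransformBlock/TransformFinalBlock calls in the test under review, so it is a swapped-in technique and not the PR's code. The class name ChunkedHashSketch and the method HashInChunks are hypothetical.

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical helper, not part of the PR: hashes data that is only
// available one chunk at a time, without holding a SHA256 instance.
public static class ChunkedHashSketch
{
    public static byte[] HashInChunks(ReadOnlySpan<byte> data, int chunkSize)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        // IncrementalHash accumulates state across AppendData calls.
        using IncrementalHash hash = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        while (!data.IsEmpty)
        {
            int n = Math.Min(chunkSize, data.Length);
            hash.AppendData(data.Slice(0, n)); // feed one chunk
            data = data.Slice(n);              // advance past it
        }

        // Finalizes the digest over everything appended so far.
        return hash.GetHashAndReset();
    }
}
```

The result should match SHA256.HashData over the same bytes, which is what makes the one-shot static usable only when the whole input is in hand.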
public override long Position
{
    get
    {
        return _innerStream.Position;
    }

    set
    {
        _innerStream.Position = value;
    }
}
Suggested change:
- public override long Position
- {
-     get
-     {
-         return _innerStream.Position;
-     }
-     set
-     {
-         _innerStream.Position = value;
-     }
- }
+ public override long Position
+ {
+     get => _innerStream.Position;
+     set => _innerStream.Position = value;
+ }
byte[] dataToCopy = RandomNumberGenerator.GetBytes(bufferSize);
byte[] srcHash = SHA256.Create().ComputeHash(dataToCopy);

var clientOptions = new SslClientAuthenticationOptions() { TargetHost = "localhost", };
Suggested change:
- var clientOptions = new SslClientAuthenticationOptions() { TargetHost = "localhost", };
+ var clientOptions = new SslClientAuthenticationOptions() { TargetHost = "localhost" };
internal int _internalOffset; // This is offset where encrypted data starts.
internal int _internalBufferCount; // Length of encrypted data.
internal int _decryptedBytesOffset; // Offset of decrypted data relative to _internalBuffer.
internal int _decryptedBytesCount; // Number of already decrypted bytes. Really valid only ion combination with _rentedBuffer.
Suggested change:
- internal int _decryptedBytesCount; // Number of already decrypted bytes. Really valid only ion combination with _rentedBuffer.
+ internal int _decryptedBytesCount; // Number of already decrypted bytes. Really valid only in combination with _rentedBuffer.
It feels to me like this PR is mixing together several changes that we should instead implement and measure independently. In particular: I think it would be better to tackle each one of these independently (starting with (1)), validate the win in benchmarks or user scenarios etc, rinse and repeat. Thoughts? Also:
In particular for (2): I think we could have two functions at the PAL layer -- something like DecryptToBuffer and DecryptInPlace. On Linux, DecryptInPlace would just call DecryptToBuffer with the original buffer. Thoughts?
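A rough sketch of how that pair of PAL functions could look. The names DecryptToBuffer and DecryptInPlace come from the comment above; the signatures, the class name DecryptPalSketch, and the XOR stand-in for decryption are all assumptions made for illustration. Real TLS decryption produces fewer bytes than it consumes, which this toy does not model.

```csharp
using System;

// Hypothetical PAL surface; signatures are assumed, not taken from the PR.
public static class DecryptPalSketch
{
    // Stand-in "cipher": XOR with a fixed byte. NOT real cryptography,
    // it only gives the two entry points something observable to do.
    private const byte Key = 0x5A;

    // Decrypts input into a separate output buffer and returns the
    // number of plaintext bytes written.
    public static int DecryptToBuffer(ReadOnlySpan<byte> input, Span<byte> output)
    {
        if (output.Length < input.Length)
        {
            throw new ArgumentException("Output buffer is too small.", nameof(output));
        }

        for (int i = 0; i < input.Length; i++)
        {
            output[i] = (byte)(input[i] ^ Key);
        }

        return input.Length;
    }

    // The Linux-style shape described above: in-place decryption is just
    // DecryptToBuffer with the same buffer as source and destination.
    public static int DecryptInPlace(Span<byte> buffer)
    {
        return DecryptToBuffer(buffer, buffer);
    }
}
```

Exposing both names makes the platform difference visible at the call site, instead of hiding it behind one function that looks platform neutral.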
I also wonder how some of these (particularly 2 and 3) would relate to implementing a custom BIO, like we have discussed in the past. Is this something we are still considering? How would we prioritize it relative to other stuff here?
macOS, for example, does not use BIO (and we will need to rewrite it when we go for ALPN & TLS 1.3). (2) and (3) are essentially the same code. For the sake of this PR I can roll that back. We could split Decrypt*() into two parts, but Windows always decrypts in place and Unix always copies. I was originally thinking about hiding the difference in the Windows PAL, but we may end up with an unnecessary copy. I'll think about it more and we can discuss details on the PR.
Yeah, but we don't really care about perf for macOS. I mean if we can get a win on macOS too, then great, but it's not a priority.
As I understand it, we are copying all input data into the memory BIO today. A custom BIO would allow us to avoid this copy, and possibly have other benefits as well. It seems like the benefit of that would by definition be more significant than just (2) above. So if the BIO win is insignificant, then (2) is as well; conversely, if we think there may be a win here, then the BIO win seems like it would be larger than (2).
I don't understand this. (3) applies to both Linux and Windows, right? Whereas (2) is only a benefit for Linux.
Sounds good. In general we should strive to encapsulate the platform differences in the PAL. This is not always possible, but if it's not then at the least we need to expose different plat-specific functions from the PAL, as opposed to having one that looks like it's platform neutral but actually is not.
I suspect it might be causing a perf regression in the YARP HTTPS (HTTP 1.1) scenario. Let's investigate that before merging.
What's causing a regression?
This PR. I tried running YARP with a privately built System.Net.Security.dll and saw a perf drop along with some bad responses in the YARP HTTPS - HTTP (HTTP 1.1) scenario on Windows.
Draft Pull Request was automatically closed for inactivity. Please let us know if you'd like to reopen it.
😢
It is still coming, but slowly, @davidfowl
@wfurt Can we revive this?
Yes, it is on my to-do list for 8, @davidfowl. I think I need a better plan for Windows.
Is there an issue tracking it?
This is a follow-up to the discussion in #49000
This really has two separate changes plus some cleanup. I may be able to do them independently, but they are somewhat intertwined, so I did them and tested them together.
Instead of doing it in the PAL as in the original prototype, this does it in ReadAsyncInternal() in a (mostly) platform-independent way.
Two basic challenges complicated the matter.
The decrypted data are always smaller than the encrypted input (or a frame may produce 0 bytes), so when decrypted they would land in disjoint fragments. Using a single offset and count became problematic. To solve this, I turned the input into a separate input Span<byte>. That also avoids several checks and calculations for the offset. It also makes the code less confusing IMHO, as we no longer use the "output" variable as input. In general, once we read the frames there is no longer an async boundary, and using Span as much as possible is nice and easy.
While processing the extra frames we may reach the end of the stream, either by actually reaching EOF on the underlying stream or by reading and processing a TLS close notification. To solve this I added an extra bool so we can return the previous data and then return EOF in the next round.
Memory<byte>. That either points to a rented buffer or it can point to the user-provided buffer if that is large enough. That has three benefits:
This starts in ResetReadBuffer(), but with the Memory, the impact on the rest of the code is pretty small.
Testing:
The SchSendAux* tests depend on a particular read size and they guess that the AUX record is present (Windows only). That seems generally problematic, and since I did not find any good way to do better, I deleted the tests. I feel that is something the PAL is responsible for, and I don't see big value in the tests.
Running the existing tests (SSL & HTTP), I found it difficult to hit some of the code branches. As producers write to SslStream, we write the data to memory or a socket, and it shows up as a whole on the other side. I added RandomIOStream, which creates short reads and writes, e.g. it reads less than asked for and splits writes into multiple writes to the underlying stream. That helped me uncover interesting cases I might have missed otherwise.
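For readers who have not seen this kind of test helper, here is a minimal sketch of a stream wrapper that forces short reads and split writes. It is not the PR's RandomIOStream: the class name ShortIOStreamSketch, the seed parameter, and the choice to override only the synchronous paths are assumptions made to keep the example small.

```csharp
using System;
using System.IO;

// Hypothetical sketch of a wrapper that reads less than asked for and
// splits each write into several smaller writes to the inner stream.
public sealed class ShortIOStreamSketch : Stream
{
    private readonly Stream _inner;
    private readonly Random _random;

    public ShortIOStreamSketch(Stream inner, int seed = 0)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _random = new Random(seed); // fixed seed keeps test runs repeatable
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanWrite => _inner.CanWrite;
    public override bool CanSeek => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() => _inner.Flush();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (count <= 0)
        {
            return 0;
        }

        // Ask the inner stream for between 1 and count bytes.
        int shortCount = 1 + _random.Next(count);
        return _inner.Read(buffer, offset, shortCount);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        // Forward the data in randomly sized pieces of at least 1 byte.
        while (count > 0)
        {
            int chunk = 1 + _random.Next(count);
            _inner.Write(buffer, offset, chunk);
            offset += chunk;
            count -= chunk;
        }
    }
}
```

Wrapping the transport under an SslStream with something like this means a TLS frame rarely arrives in a single read, which exercises the partial-frame branches that whole-buffer transports tend to skip.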
I still plan to run more benchmarks but I would like to get some feedback before that is all completed.
fixes #49000
fixes #49086
contributes to (closed) #49019