-
Notifications
You must be signed in to change notification settings - Fork 4.9k
More perf improvements for managed WebSocket client on Unix #9412
More perf improvements for managed WebSocket client on Unix #9412
Conversation
"Binary", | ||
"Text", | ||
"CloseOutputAsync"); | ||
nameof(WebSocketMessageType.Close), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: is this going to be "Close" or "WebSocketMessageType.Close"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'll be "Close". Do you want it to be "WebSocketMessageType.Close"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Close" is the one we have in Desktop. Asked because I wasn't sure if this would change the exception text.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks.
While there are app-compat issues, we should be able to fix #4900 and provide an opt-out for apps that have app-compat issues with sync completions. I saw that at least some of the samples on MSDN for Sockets provide a good pattern where the operations must be checked for sync completion.
I think it may be useful to add the code for WSPerf.exe to GitHub (unless it's very similar to what we already have in ToF). Thanks @stephentoub! LGTM |
} | ||
finally | ||
{ | ||
if (releaseSemaphore) _sendFrameAsyncLock.Release(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: This line breaks rule number 1 :).
... A single line statement block can go without braces but the block must be properly indented on its own line...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Some nit-picky stuff; but otherwise LGTM. |
LGTM. Nice job getting the allocations down! |
6ac2004
to
433b34e
Compare
I could, though it's super simple, so I expect if we do have perf tests, it mirrors one of them. It's basically just a simple echo client: var sw = new Stopwatch();
while (true)
{
int gen0 = GC.CollectionCount(0);
sw.Restart();
for (int i = 0; i < iterations; i++)
{
for (int j = 0; j < messagesPerIteration; j++)
{
await cws.SendAsync(sendBuffer, WebSocketMessageType.Text, true, CancellationToken.None);
}
for (int j = 0; j < messagesPerIteration; j++)
{
while (!(await cws.ReceiveAsync(receiveBuffer, CancellationToken.None)).EndOfMessage) ;
}
}
sw.Stop();
Console.WriteLine(sw.Elapsed.TotalSeconds + ", " + (GC.CollectionCount(0) - gen0));
} |
Thanks, everyone, for the feedback. |
6cf16be
to
aa69412
Compare
We only need to support a single read and a single write at a time, which means we can create two SocketAsyncEventArgs instances and repeatedly reuse them for all read and write operations on the stream. This avoids a significant amount of the overhead involved in reading/writing with the network.
The previous commit was beneficial in avoiding one large set of overhead, but it also has a flaw that, due to a current behavior in Sockets, sending to a socket inevitably completes asynchronously, which means SendFrameAsync almost always hits an await that yields. Since this is a very important code path, until the Socket issue can be addressed, we can optimize the common case to minimize the costs involved.
It's an internal type, so there's no need for the base class or the virtual methods. Also, since WebSocket is public, we want to avoid folks being able to cast Task.AsyncState to WebSocket and call methods on this internal object directly, just as a precaution against strange misuse.
Avoids an extra async method (and its associated allocations) on the path where we typically end up yielding waiting for data to arrive.
Happened to notice these while doing perf work.
And delete a bit of dead code.
Two issues, one preexisting this change and one introduced by it. - Preexisting: TcpClient on Unix simulates socket.Connect(string host, int port) by creating a new Socket for each possible address we're attempting to connect to. As a result, if the TcpClient is Dispose'd before the connection is successful, we won't yet have stored the actual Socket into the TcpClient and we won't end up Dispose'ing the Socket. Then subsequent attempts to send/receive on that socket will continue, as they will be operating on a valid Socket. - New: Previously the Socket would end up being Dispose'd of when we Dispose'd of the TcpClient we were holding on to. now that Dispose happens via the NetworkStream. But we weren't passing ownsSocket: true to the NetworkStream, so when it was Dispose'd, it wasn't Dispose'ing of the Socket, resulting in similar hangs when trying to send/receive in abort scenarios.
- Our closing the socket may trigger our own pending receive to return successfully with 0 bytes read, as if the other end had closed the connection. We need to treat this as an ObjectDisposedException to match the Windows behavior. - If the Socket ends with an operation canceled code, we should throw an OperationCanceledException to match the Windows behavior. - When using wss (https), we use an SslStream, and if the underlying stream is disposed of, SslStream may throw an InvalidOperationException due to the connection no longer being authenticated. That needs to be translated into the appropriate exception type to match Windows behavior.
aa69412
to
2e59fab
Compare
Primary changes:
Other miscellaneous:
Results:
I ran a simple test with a local echo server, where I'd repeatedly send a message and receive its echo. As a baseline, with the Windows WinHTTP implementation sending and receiving 20,000 messages, I get the following. Each line is a run of the 20,000 messages, with the first number being the elapsed time and the second number being the number of gen 0 GCs.
Instead using the managed implementation on Windows, before these changes, I got:
After these changes, using the managed implementation on Windows, I get:
On Linux, I'm using a different local echo server, so the timing numbers aren't directly comparable to the ones on Windows, however before the changes I got:
and after the changes I got:
At this point I'm not planning to do any more proactive work on improving the send/receive performance, as the performance after these changes on Linux looks to be on par with the Windows implementation. There are likely more opportunities which can be addressed individually and reactively, including once #4900 is addressed. There may also be opportunities to improve scenarios other than lots of sending/receiving with the same websocket, e.g. minimizing the overhead associated with creating lots of websockets that each do relatively little work.
cc: @ericeil, @bartonjs, @davidsh, @CIPop
Also fixes #9408 and fixes #9407