This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

More perf improvements for managed WebSocket client on Unix #9412

Merged: stephentoub merged 10 commits into dotnet:master from websocket_networkstream on Jun 15, 2016

stephentoub
Member

@stephentoub stephentoub commented Jun 14, 2016

Primary changes:

  • NetworkStream's ReadAsync/WriteAsync simply use the underlying Socket's BeginReceive/EndReceive and BeginSend/EndSend methods, wrapped in tasks. While that could be optimized further, we have a more constrained scenario in WebSockets, in that we only support a single read at a time and a single write at a time. As a result, we can create, cache, and reuse a single SocketAsyncEventArgs instance for reading and another for writing, using the socket's SendAsync/ReceiveAsync methods with that cached args instance. This avoids a bunch of per-operation allocations.
  • Due to #4900 (and the behavior that was made to match on Unix), successful socket operations never complete synchronously. This means that code which awaits them typically ends up needing to yield, due to the callbacks being invoked asynchronously, even if the socket operation completed synchronously, which is very likely in the case of a send. Our SendFrameAsync method suffers from this, as it really just delegates to the socket's send method, albeit with a tiny amount of pre- and post-work. The real solution is to make it so that Socket.SendAsync can complete synchronously, after which SendFrameAsync will also typically complete synchronously and generally be allocation-free. Until then, though, since this is a hot path, we can optimize it for the common case to reduce the amount of allocation and general work involved when the socket operation does complete asynchronously.
  • For chatty cases where each send expects a receive, it's likely that each receive needs to read from the network. This means we end up going through several layers of async methods (ReceiveAsyncPrivate, EnsureBufferContainsHeaderAsync, EnsureBufferContainsAsync), which will all end up yielding and allocating when the read on the network doesn't complete synchronously. Since EnsureBufferContainsHeaderAsync is small and only used from one place, we can avoid all of the allocations associated with it by simply inlining it into its caller (ReceiveAsyncPrivate).
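
The first bullet above can be illustrated with a minimal sketch. It is not the code from this PR: `SingleReaderSocket` and `Demo` are hypothetical names, and for brevity the sketch still allocates a `TaskCompletionSource` per call, which a real hot path would want to avoid. It only shows one `SocketAsyncEventArgs` being created once and reused for every receive, assuming at most one receive is outstanding at a time.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper (not the PR's type): reuses one SocketAsyncEventArgs
// for all receives. Assumes only one receive is in flight at any time.
public sealed class SingleReaderSocket : IDisposable
{
    private readonly Socket _socket;
    private readonly SocketAsyncEventArgs _receiveArgs = new SocketAsyncEventArgs();
    private TaskCompletionSource<int> _pendingReceive;

    public SingleReaderSocket(Socket socket)
    {
        _socket = socket;
        // Hooked up once; raised only when an operation completes asynchronously.
        _receiveArgs.Completed += (sender, e) => Complete(e);
    }

    public Task<int> ReceiveAsync(byte[] buffer, int offset, int count)
    {
        // Simplification: a per-call TaskCompletionSource keeps the sketch short.
        var tcs = _pendingReceive = new TaskCompletionSource<int>();
        _receiveArgs.SetBuffer(buffer, offset, count);

        // A false return means the operation finished synchronously and the
        // Completed event will not be raised, so the result is handled inline.
        if (!_socket.ReceiveAsync(_receiveArgs))
        {
            Complete(_receiveArgs);
        }
        return tcs.Task;
    }

    private void Complete(SocketAsyncEventArgs e)
    {
        if (e.SocketError == SocketError.Success)
        {
            _pendingReceive.TrySetResult(e.BytesTransferred);
        }
        else
        {
            _pendingReceive.TrySetException(new SocketException((int)e.SocketError));
        }
    }

    public void Dispose() => _receiveArgs.Dispose();
}

public static class Demo
{
    // Sends three bytes over a loopback connection and returns how many the
    // reusable-args receiver read in its first receive.
    public static async Task<int> RunAsync()
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            using (var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                client.Connect(listener.LocalEndpoint);
                using (Socket server = listener.AcceptSocket())
                using (var reader = new SingleReaderSocket(server))
                {
                    client.Send(new byte[] { 1, 2, 3 });
                    var buffer = new byte[16];
                    return await reader.ReceiveAsync(buffer, 0, buffer.Length);
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

Calling `Demo.RunAsync()` exercises the helper over loopback. The same shape applies to sends, with a second cached args instance dedicated to writes.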

Other miscellaneous:

  • Removed the base class from the internal ManagedWebSocket. It's unnecessary and leads to virtual methods on the instance. More importantly, though, we end up potentially handing out this instance via a Task's AsyncState, and we don't want folks to be able to cast to the base class and call methods without going through the thin ClientWebSocket wrapper.
  • A bunch of resources weren't being used. I deleted them.
  • WebSocketReceiveResult had non-readonly backing fields for each of its properties, which had private setters purely to be able to set the properties in the constructor. Now that we're using C# 6, we can simply make these get-only properties, eliminating the setters and allowing the compiler to make the backing fields readonly.
  • Used nameof in a few more places.
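
As a small illustration of the get-only property change described in the list above, here is a sketch using a hypothetical `ReceiveResultSketch` type (a stand-in, not the real WebSocketReceiveResult). With C# 6, a get-only auto-property can be assigned in the constructor, and the compiler emits a readonly backing field for it.

```csharp
using System;

// Hypothetical stand-in for illustration only; not the real WebSocketReceiveResult.
public sealed class ReceiveResultSketch
{
    public ReceiveResultSketch(int count, bool endOfMessage)
    {
        // Get-only auto-properties may be assigned here (C# 6 and later);
        // no private setter is required.
        Count = count;
        EndOfMessage = endOfMessage;
    }

    // No setter at all: the compiler-generated backing fields are readonly.
    public int Count { get; }
    public bool EndOfMessage { get; }
}
```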

Results:
I ran a simple test with a local echo server, where I'd repeatedly send a message and receive its echo. As a baseline, with the Windows WinHTTP implementation sending and receiving 20,000 messages, I get the following. Each line is a run of the 20,000 messages, with the first number being the elapsed time and the second number being the number of gen 0 GCs.

> corerun WSPerf.exe 28745 20000 1
2.6906097, 5
2.7209642, 5
2.6802832, 5
2.7098154, 5
2.7061351, 6
2.6964876, 5
2.7943476, 5
2.7638309, 5
2.7918148, 5
2.744804, 6

Using the managed implementation on Windows instead, before these changes, I got:

> corerun WSPerf.exe 28745 20000 1
3.1177403, 22
3.2149994, 22
2.6798198, 23
2.8078613, 23
2.6670316, 24
2.6744515, 23
2.6619906, 24
2.7596847, 23
2.6485313, 24
2.6559615, 23

After these changes, using the managed implementation on Windows, I get:

> corerun WSPerf.exe 28745 20000 1
2.576682, 9
2.6461474, 9
2.5376302, 9
2.5192637, 9
2.5189807, 10
2.5227088, 9
2.5305238, 9
2.5696738, 9
2.5766855, 9
2.5433629, 9

On Linux, I'm using a different local echo server, so the timing numbers aren't directly comparable to the ones on Windows. However, before the changes I got:

$ ./corerun WSPerf.exe 7681 20000 1
1.6772411, 16
1.2316513, 11
1.1890394, 10
1.7308767, 19
1.1968002, 11
1.5696637, 19
1.5461821, 18
1.2686805, 12
1.1765528, 10
1.124586, 11

and after the changes I got:

$ ./corerun WSPerf.exe 7681 20000 1
1.2716779, 5
1.0913687, 5
1.0198689, 5
1.1038351, 4
1.0823826, 4
1.0074977, 4
1.0132344, 5
1.0704005, 4
1.1775556, 6
1.0341026, 6
1.0462667, 4

At this point I'm not planning to do any more proactive work on improving the send/receive performance, as the performance after these changes on Linux looks to be on par with the Windows implementation. There are likely more opportunities which can be addressed individually and reactively, including once #4900 is addressed. There may also be opportunities to improve scenarios other than lots of sending/receiving with the same websocket, e.g. minimizing the overhead associated with creating lots of websockets that each do relatively little work.

cc: @ericeil, @bartonjs, @davidsh, @CIPop
Also fixes #9408 and fixes #9407

"Binary",
"Text",
"CloseOutputAsync");
nameof(WebSocketMessageType.Close),
Member

Nit: is this going to be "Close" or "WebSocketMessageType.Close"?

Member Author

It'll be "Close". Do you want it to be "WebSocketMessageType.Close"?

Member

"Close" is the one we have in Desktop. Asked because I wasn't sure if this would change the exception text.

Member Author

Ok, thanks.
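
The point settled in the exchange above can be checked with a small sketch: `nameof` applied to a qualified member yields only the final identifier, so the string is "Close" rather than "WebSocketMessageType.Close". `NameofDemo` is a hypothetical name used only for this illustration.

```csharp
using System;
using System.Net.WebSockets;

public static class NameofDemo
{
    // nameof evaluates at compile time to the last identifier in the
    // expression, so the type qualifier is not part of the resulting string.
    public static string CloseName() => nameof(WebSocketMessageType.Close);
}
```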

@CIPop
Member

CIPop commented Jun 14, 2016

> Due to #4900 (and the behavior that was made to match on Unix)

While there are app-compat issues, we should be able to fix #4900 and provide an opt-out for apps that have app-compat issues with sync completions. I saw that at least some of the samples on MSDN for Sockets provide a good pattern where the operations must be checked for sync completion.

> corerun WSPerf.exe 28745 20000 1

I think it may be useful to add the code for WSPerf.exe to GitHub (unless it's very similar to what we already have in ToF).

Thanks @stephentoub!

LGTM

}
finally
{
if (releaseSemaphore) _sendFrameAsyncLock.Release();
Member

Style: This line breaks rule number 1 :).

... A single line statement block can go without braces but the block must be properly indented on its own line...

Member Author

Fixed.

@bartonjs
Member

Some nit-picky stuff; but otherwise LGTM.

@ericeil
Contributor

ericeil commented Jun 14, 2016

LGTM. Nice job getting the allocations down!

@stephentoub
Member Author

> I think it may be useful to add the code for WSPerf.exe to GitHub (unless it's very similar to what we already have in ToF).

I could, though it's super simple, so I expect if we do have perf tests, it mirrors one of them. It's basically just a simple echo client:

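// cws (a connected ClientWebSocket), sendBuffer, receiveBuffer, iterations, and
// messagesPerIteration are set up elsewhere in the test program.
var sw = new Stopwatch();
while (true)
{
    // Snapshot the gen 0 collection count so the delta for this run can be reported.
    int gen0 = GC.CollectionCount(0);
    sw.Restart();
    for (int i = 0; i < iterations; i++)
    {
        // Send a batch of messages...
        for (int j = 0; j < messagesPerIteration; j++)
        {
            await cws.SendAsync(sendBuffer, WebSocketMessageType.Text, true, CancellationToken.None);
        }
        // ...then read back each echo, looping until the whole message has arrived.
        for (int j = 0; j < messagesPerIteration; j++)
        {
            while (!(await cws.ReceiveAsync(receiveBuffer, CancellationToken.None)).EndOfMessage) ;
        }
    }
    sw.Stop();
    // Output: elapsed seconds, number of gen 0 GCs during the run.
    Console.WriteLine(sw.Elapsed.TotalSeconds + ", " + (GC.CollectionCount(0) - gen0));
}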

@stephentoub
Member Author

Thanks, everyone, for the feedback.

@stephentoub stephentoub force-pushed the websocket_networkstream branch 2 times, most recently from 6cf16be to aa69412 Compare June 15, 2016 13:08
We only need to support a single read and a single write at a time, which means we can create two SocketAsyncEventArgs instances and repeatedly reuse them for all read and write operations on the stream.  This avoids a significant amount of the overhead involved in reading/writing with the network.
The previous commit was beneficial in avoiding one large set of overhead, but it also has a flaw that, due to a current behavior in Sockets, sending to a socket inevitably completes asynchronously, which means SendFrameAsync almost always hits an await that yields.  Since this is a very important code path, until the Socket issue can be addressed, we can optimize the common case to minimize the costs involved.
It's an internal type, so there's no need for the base class or the virtual methods.  Also, since WebSocket is public, we want to avoid folks being able to cast Task.AsyncState to WebSocket and call methods on this internal object directly, just as a precaution against strange misuse.
Avoids an extra async method (and its associated allocations) on the path where we typically end up yielding waiting for data to arrive.
Happened to notice these while doing perf work.
And delete a bit of dead code.
Two issues, one preexisting this change and one introduced by it.
- Preexisting: TcpClient on Unix simulates socket.Connect(string host, int port) by creating a new Socket for each possible address we're attempting to connect to.  As a result, if the TcpClient is Dispose'd before the connection is successful, we won't yet have stored the actual Socket into the TcpClient and we won't end up Dispose'ing the Socket.  Then subsequent attempts to send/receive on that socket will continue, as they will be operating on a valid Socket.
- New: Previously the Socket would end up being Dispose'd of when we Dispose'd of the TcpClient we were holding on to.  Now that Dispose happens via the NetworkStream.  But we weren't passing ownsSocket: true to the NetworkStream, so when it was Dispose'd, it wasn't Dispose'ing of the Socket, resulting in similar hangs when trying to send/receive in abort scenarios.
- Our closing the socket may trigger our own pending receive to return successfully with 0 bytes read, as if the other end had closed the connection.  We need to treat this as an ObjectDisposedException to match the Windows behavior.
- If the Socket ends with an operation canceled code, we should throw an OperationCanceledException to match the Windows behavior.
- When using wss (https), we use an SslStream, and if the underlying stream is disposed of, SslStream may throw an InvalidOperationException due to the connection no longer being authenticated.  That needs to be translated into the appropriate exception type to match Windows behavior.
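
The ownsSocket point in the commit notes above can be demonstrated with a short sketch. `OwnsSocketDemo` is a hypothetical helper, not code from the PR. It connects a socket over loopback, wraps it in a `NetworkStream`, disposes the stream, and then probes whether the socket itself was disposed.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

public static class OwnsSocketDemo
{
    // Returns true if disposing the NetworkStream also disposed the Socket.
    public static bool StreamDisposeClosesSocket(bool ownsSocket)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            client.Connect(listener.LocalEndpoint);
            using (Socket server = listener.AcceptSocket())
            {
                // NetworkStream requires an already connected stream socket.
                var stream = new NetworkStream(client, ownsSocket);
                stream.Dispose();
                try
                {
                    // Accessing Available throws once the socket has been disposed.
                    int unused = client.Available;
                    client.Dispose();
                    return false;
                }
                catch (ObjectDisposedException)
                {
                    return true;
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }
}
```

With `ownsSocket: false`, the socket stays usable after the stream is disposed, so pending sends and receives are not torn down. That matches the hang the commit message describes.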
@stephentoub stephentoub merged commit 774f536 into dotnet:master Jun 15, 2016
@stephentoub stephentoub deleted the websocket_networkstream branch June 15, 2016 15:40
@karelz karelz modified the milestone: 1.1.0 Dec 3, 2016
@karelz karelz added os-linux Linux OS (any supported distro) and removed X-Plat labels Mar 8, 2017
Labels
area-System.Net os-linux Linux OS (any supported distro)
6 participants