-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix pinning in quic #52368
fix pinning in quic #52368
Conversation
Tagging subscribers to this area: @dotnet/ncl Issue DetailsThis should fix the native crashes and corrupted buffers. It would be great if you can give it try @CarnaViire To avoid constant pinning of SendQuicBuffers, I decided to allocate global memory since that is only passed to native msquic. I also added Pending state to indicate that we are actively writing and waiting for msquic callback. I stills see some timeouts on my 2 core VM. contributes to #52047
|
public SafeMsQuicBufferHandle(int count) | ||
: base(IntPtr.Zero, ownsHandle: true) | ||
{ | ||
IntPtr buffer = Marshal.AllocHGlobal(sizeof(QuicBuffer) * count); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't need to do this? What pattern is causing this to be required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean AllocHGlobal
or wrapping it in SafeMsQuicBufferHandle
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both. We do a similar thing in the ASP.NET Core servers and in sockets and we don't use this pattern AFAIK. What makes MSQuic unique here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the SafeHandle should be unnecessary here. The code has to be careful about managing the lifetimes of the async buffers. SafeHandle does not make it easier.
It is an interesting question whether it is better to use unmanaged memory or POH-allocated memory here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is another example from SslStream.
runtime/src/libraries/System.Net.Security/src/System/Net/Security/SslStreamPal.Windows.cs
Lines 262 to 268 in 7ea26e6
const int NumSecBuffers = 4; // header + data + trailer + empty | |
Interop.SspiCli.SecBuffer* unmanagedBuffer = stackalloc Interop.SspiCli.SecBuffer[NumSecBuffers]; | |
Interop.SspiCli.SecBufferDesc sdcInOut = new Interop.SspiCli.SecBufferDesc(NumSecBuffers) | |
{ | |
pBuffers = unmanagedBuffer | |
}; | |
fixed (byte* outputPtr = output) |
Because we need to only live through the one native call, we can allocate the buffer array on stack. For msquic we need to wait until HandleEventSendComplete()
is invoked. So we need to keep it alive past the sending function.
Anyway, we could use the pinning if we want to but as I mentioned this looked simpler and less work on each call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example you have to be careful about situation when somebody disposes the object while the operation is still in flight.
Yep, I'm aware and we also go back and forth on how useful SafeHandles are in certain places, but I'm surprised SafeHandle + native buffer is the solution here. Are we pooling these things?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no pool at the moment. The main goal was to stabilize the test runs and avoid corruption and crashes and to unblock @ManickaP and @CarnaViire. I did 700+ runs and I did not see any failure. I think it would be better to refactor and/or optimize once we have solid CI in place.
I think we will need more work toward #5262 and #32142. The SafeHandle primarily helps with p/invokes. For writes, we will need to preserve the memory longer unless we change the handout model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SafeHandle does not help you much here. You are directly managing the lifetime of the MemoryHandle
s for the individual buffer, so you can directly managed the lifetime for the array of buffer pointers as well. SafeHandle here is a lot of overhead for negligible benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use Pinned Object Heap instead of native allocation & safe handles? ( |
I've tried the current fix and it worked for me -- I haven't seen neither AVE nor buffer corruptions on multiple runs. |
One still has to be very careful about lifetime management and ensuring that the pinned buffer is not collected while the unmanaged callback still works on it. Using POH won't make the code simpler. Also, POH has a problematic performance characteristics today. It always allocates in Gen2 and thus using POH will unnecessarily put pressure on the expensive Gen2 GC collections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets get this merged to unblock other work. We have yet to do a full optimization pass, and I expect we will revisit this (likely removing the new safe handle type) at that point.
src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs
Outdated
Show resolved
Hide resolved
...System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/Interop/SafeMsQuicBufferHandle.cs
Outdated
Show resolved
Hide resolved
I do not see the point merging this with the useless safehandle handle. It is like 10 minute edit to delete it. |
and keep the native memory or go back to pinning? |
Keep the native memory. |
…ons/MsQuic/MsQuicStream.cs Co-authored-by: Cory Nelson <phrosty@gmail.com>
I removed the SafeMsQuicBufferHandle and did ~ 30 test runs on whole QUIC test suite. I will let the tests run in loop overnight. |
src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs
Show resolved
Hide resolved
src/libraries/System.Net.Quic/src/System/Net/Quic/Implementations/MsQuic/MsQuicStream.cs
Show resolved
Hide resolved
|
||
// Handle to pinned SendQuicBuffers. | ||
public GCHandle SendHandle; | ||
public IntPtr SendQuicBuffers = Marshal.AllocHGlobal(sizeof(QuicBuffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to allocate this only once it is needed, with the right size.
As written, SendQuicBuffers
maybe be allocated and never used; or SendQuicBuffers
may leak if the initialization fails mid-flight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I can move it to constructor at some point or add logic to SendReadOnlyMemoryAsync(). The old code always allocated array of size 1 to optimize for using ReadOnlyMemory e.g. use from HttpClient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just questions to understand the change better. Otherwise, LGTM.
@@ -781,15 +777,15 @@ private static uint HandleEventSendComplete(State state, ref StreamEvent evt) | |||
|
|||
private static void CleanupSendState(State state) | |||
{ | |||
if (state.SendHandle.IsAllocated) | |||
lock (state) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the lock here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We lock the state
in other place when we transition. So I think we should be either consists or avoid the lock completely via some other mechanism (like preventing duplicate operations)
@@ -798,6 +794,12 @@ private static void CleanupSendState(State state) | |||
ReadOnlyMemory<byte> buffer, | |||
QUIC_SEND_FLAGS flags) | |||
{ | |||
lock (_state) | |||
{ | |||
Debug.Assert(_state.SendState != SendState.Pending); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we want to prevent overlapping Sends. Shouldn't this then rather be a condition with throw QuicException...
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not meant as guard agains overlaying Sends. (I think that should be done much earlier)
I added this to make sure our internal state logic (and msquic) always moves to some other state.
_state.BufferArrays[0] = handle; | ||
if (_state.SendQuicBuffers == IntPtr.Zero) | ||
{ | ||
_state.SendQuicBuffers = Marshal.AllocHGlobal(sizeof(QuicBuffer)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the major difference here is that we allocate SendQuicBuffers
aka QuicBuffer*
directly in unmanaged memory. While originally we were using pinned managed memory GCHandle.Alloc(_state.SendQuicBuffers, GCHandleType.Pinned);
.
What was wrong with the original approach? Was it Marshal.UnsafeAddrOfPinnedArrayElement(_state.SendQuicBuffers, 0);
? Should that be _state.SendHandle.AddrOfPinnedObject();
?
This comment is just me trying to properly understand what's going on here. It has absolutely no influence on mergeability of this PR 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we could keep pinning. But then we do more operations on each send and we need to maintain the handle. So I felt this would be simpler as the SendQuicBuffers are really only consumed by native code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general point here is that since we only use this array as unmanaged memory, there's not really any benefit to allocating it as managed memory, and some nontrivial cost (overhead of pin/unpin, additional GC overhead, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how does this solve the problem with byte mixing and AVE? Do we know the exact root cause of that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have doubts that this change is actually fixing the root cause of the crash. It is probably just moving it around.
This type has number of subtle issues like #52048 that will lead to use-after-free and similar crashes. Somebody will need to do a focused pass over it to fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But how does this solve the problem with byte mixing and AVE? Do we know the exact root cause of that?
Yeah, you are right. Seems like this probably doesn't fix the root cause. It might make it less likely to happen, because the memory will never move; but I don't think we actually understand the root cause yet.
(I had assumed we were previously allocating the array on the stack, per discussion above -- but we weren't, so that wasn't the root problem.)
It would probably help to clear out the BufferArray entries when we dispose them, here: https://github.com/dotnet/runtime/pull/52368/files#diff-55ed6a1c110b1a90d4900bce3075ab49f1d6212c223e69b48f711f4084d264acR787
If we have a use-after-free issue, that would help find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems suspicious that we are calling CleanupSendState in Dispose -- how do we know that msquic isn't still holding our buffer array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This type has number of subtle issues like #52048 that will lead to use-after-free and similar crashes. Somebody will need to do a focused pass over it to fix them.
Yeah, I agree we need a pass here. Do you have specific concerns? Dispose in particular looks questionable to me, anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
anything else?
Handling of different failure modes. For example:
If AllocHGlobal
fails, we will think that we have space for old SendBufferMaxCount
, but SendQuicBuffers
is actually going to be null. This will lead to NullReferenceException
on retry.
If new MemoryHandle[array.Length]
fails, we will think that we have space for new SendBufferMaxCount
, but BufferArrays
is actually going to have only have space for old SendBufferMaxCount
. This will lead to IndexOutOfBoundsException
on retry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to write new tests and cleanup the shutdown more. I would generally expect that if allocation fails, the stream (or whole system) will not be in usable state. I can certainly split the allocations and assignment to make sure it is consistent.
This should fix the native crashes and corrupted buffers. It would be great if you can give it try @CarnaViire
To avoid constant pinning of SendQuicBuffers, I decided to allocate global memory since that is only passed to native msquic. I also added Pending state to indicate that we are actively writing and waiting for msquic callback.
I stills see some timeouts on my 2 core VM.
I will submit separate PR to make some adjustments and to avoid hardcoded values. (e.g. use PassingTestTimeout)
contributes to #52047