Remove global spinlock for EH stacktrace #103076

janvorli · 2024-06-05T15:09:24Z

The global spinlock that was used to ensure that stack trace and the associated dynamic methods array were updated and read atomically. However, for the new EH, it has shown to cause a high contention in case many threads were handling exceptions at the same time.

This change replaces the two arrays by one object member in the exception class. It contains reference to either the byte[] of the stack trace (when there are no dynamic methods on the stack trace) or an object[] where the first element contains the stack trace byte[] reference and the following elements contain what used to be in the dynamic array. That allows atomic updates and reads of the stack trace and dynamic method keepalive references without a need of a lock.

It improves the performance of exceptions being handled on multiple threads at the same time 3 fold. For example, a testing app running on my Windows x64 machine with 10 CPU cores that throws an exception on 10 threads in parallel 1000000 times over 10 stack frames took 30s of each thread's time before and 9 seconds after the change.

The original code was quite convoluted, it was difficult to reason about and it had some races in it that were hidden behind the global lock. So I have decided to rewrite the whole thing from scratch.

The way it ensures that it is race free is that whenever it updates the exception stack trace and the one that's on the exception was created by a different thread, it creates a deep copy of both the stack trace and the keepalive array. When making the copy, it also handles a case when a frame that needs a keepalive entry is on the stack trace part, but the keepalive array extracted from the exception is stale (the other thread needed to resize the keepalive array, but not the stack trace). In that case, the stack trace is trimmed at first such entry found.

Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety.

I have also merged the StackTraceInfo::AppendElement and the StackTraceInfo::SaveStackTrace into one, as they were always being called in pair for a single frame. Before, the first one needed to store the frame and the second would extract it from that storage and append it to the stack trace.

Finally, since with the previous iteration of this change, a bug in building the stack trace was found, I have added a coreclr test to verify stack trace for an exception matches the expectations.

The global spinlock that was used to ensure that stack trace and the associated dynamic methods array were updated and read atomically. However, for the new EH, it has shown to cause a high contention in case many threads were handling exceptions at the same time. This change replaces the two arrays by one object member in the exception class. It contains reference to either the byte[] of the stack trace (when there are no dynamic methods on the stack trace) or an object[] where the first element contains the stack trace byte[] reference and the following elements contain what used to be in the dynamic array. That allows atomic updates and reads of the stack trace and dynamic method keepalive references without a need of a lock. The original code was quite convoluted, it was difficult to reason about and it had some races in it that were hidden behind the global lock. So I have decided to rewrite the whole thing from scratch. The way it ensures that it is race free is that whenever it updates the exception stack trace and the one that's on the exception was created by a different thread, it creates a deep copy of both the stack trace and the keepalive array. When making the copy, it also handles a case when a frame that needs a keepalive entry is on the stack trace part, but the keepalive array extracted from the exception is stale (the other thread needed to resize the keepalive array, but not the stack trace). In that case, the stack trace is trimmed at first such entry found. Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety. I have also removed a dead code path from the StackTraceInfo::SaveStackTrace. Finally, since with the previous iteration of this change, a bug in building the stack trace was found, I have added a coreclr test to verify stack trace for an exception matches the expectations.

janvorli · 2024-06-05T19:52:34Z

There are some test errors, I am investigating them.

src/coreclr/vm/excep.cpp

src/coreclr/vm/object.cpp

src/coreclr/vm/clrex.h

src/coreclr/vm/comutilnative.cpp

jkotas · 2024-06-06T03:25:43Z

src/coreclr/vm/comutilnative.cpp

 // Given an exception object, this method will extract the stacktrace and dynamic method array and set them up for return to the caller.
-FCIMPL3(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe, Object **pDynamicMethodsUnsafe);
+FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe);


Suggested change

FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe);

FCIMPL1(Object *, ExceptionNative::FreezeStackTrace, Object* pStackTrace)

This can return the value now that there is just a single value to return

Updated the suggestion to FreezeStackTrace since the method does not actually perform a deep copy in typical case anymore.

src/coreclr/vm/object.cpp

jkotas · 2024-06-06T03:51:55Z

src/coreclr/vm/object.cpp

+                }
+                if (keepaliveObject == NULL)
+                {
+                    // Trim the stack trace at a point where a dynamic or collectible method is found without a corresponding keepalive object.


How is this possible?

If I understand the code correctly, the thread safety should be guaranteed by writing the size field last and reading it first. Is that right? (Whatever it is, it would be nice to have it documented in a comment somewhere.)

I do not think it should be possible to get methods items without corresponding keep alive items here. If it was possible, I suspect there may be situations where the method is not kept alive and the pMethod->IsLCGMethod() check above can crash.

Here is the case when it happens:

Thread A owns the stack trace and is adding a new entry that requires a keep alive object.

Thread B wants to update the stack trace too, so it fetches the stack trace array S1 and keep alive array K1 from the exception

Thread A finds the keep alive array is full, so it creates a larger clone of the previous one, let's call it K2

Thread A adds the new entry to keep alive array K2

Thread A adds the new entry to stack trace array S1

Thread A writes K2 to the exception

Now thread B creates clones of S1 and K1 (when it read it from the exception, the K2 was not there yet). But S1 already has the new entry added by thread A, which is not covered by K1. Since K1 won't change anymore, we solve this by trimming the S1 at the element where we've found a frame that needs keep alive object and was not covered by K1

However, I have realized you are right that in such case, calling IsLCGMethod on the MethodTable above could crash. For example, if the thread A's exception was collected together with its keep alive array K2 that can be the only thing holding the method alive and if that would happen before the thread B called the IsLCGMethod.
So I need to figure out some other way to handle the situation described above.

jkotas · 2024-06-06T04:23:33Z

Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety.

I agree that multiple threads throwing the same exception is a pathological case that can be slow. One exception being caught on one thread and rethrown on different threads using ExceptionDispatchInfo is not that uncommon in async code.

src/coreclr/vm/excep.cpp

* Missing calls to IsOverflow at few places * Added a flag on StackTraceElement to indicate that the element needs a keepalive entry. It removes the need to call IsLCGMethod / Collectible check on the method table stored in the element and eliminates a possible problem with the method being collected in one place. * Returned missing call to StackFrameInfo::Init to the x86 code path * Removed obsolete comment and code line

* Add keep alive items count to the stack trace header. * Implement the concept of frozen stack traces to eliminate copies in the ExceptionDispatchInfo storing / restoring exceptions.

src/coreclr/vm/object.h

In the StackTraceArray::Allocate

src/coreclr/System.Private.CoreLib/src/System/Exception.CoreCLR.cs

Plus a build break fix

Also rename GetStackTracesDeepCopy to GetFrozenStackTrace and move the return argument to return value.

src/coreclr/vm/comutilnative.cpp

src/coreclr/vm/clrex.h

jkotas · 2024-06-10T23:13:18Z

src/coreclr/vm/comutilnative.cpp

+
+    // There can be no GC after setting the frozenStackTrace until the Object is returned.


Suggested change

// There can be no GC after setting the frozenStackTrace until the Object is returned.

I am not sure what this comment is trying to say.

Ah, I've renamed the frozenStackTrace to result and forgotten to update the comment. It is trying to say that the gc.result is not protected after the HELPER_METHOD_FRAME_END

Ok, that's writing manually managed code 101.

(We will want to convert this method to QCALL in near future to get rid of the HELPER_METHOD_FRAME.)

src/coreclr/vm/comutilnative.cpp

src/coreclr/vm/excep.cpp

src/coreclr/vm/object.cpp

Plus an unused method removal and a little naming / contract cleanup

src/coreclr/vm/object.cpp

jkotas · 2024-06-11T16:12:49Z

src/coreclr/vm/object.cpp

-    m_array = (I1ARRAYREF) AllocatePrimitiveArray(ELEMENT_TYPE_I1, static_cast<DWORD>(src.Capacity()));
-
-    Volatile<size_t> size = src.Size();
+    uint32_t size = src.Size();


We should use volatile read for the Size. Otherwise, the C++ compiler is free to duplicate the read and use one size for value for the memcpy and return a different value from the method.

I have thought you were suggesting before that I get rid of the Volatile here, so I am bit confused.

I would:

Get rid of the volatile local variable. It does not make sense to tell the compiler to issue read/write barriers when reading/writing stack. It is deoptimization.

Change the read in Size() method to be volatile read, so that it is not reordered. This volatile read is counterpart of volatile write in StackTraceArray::Append (it is not actual volatile write, but MemoryBarrier + regular write that is equivalent in this situation). In general, volatile reads and volatile writes in lock-free algorithm have to come in pairs on producing/consuming sides.

Ok, I have added a commit with that change. I've modified the GetSize / SetSize and added such treatment to the keep alive count accessors too. And removed the MemoryBarrier from the StackTraceArray::Append where it is no longer needed after this change.

jkotas

LGTM otherwise. Thank you!

src/coreclr/inc/sospriv.idl

Also remove the memory barrier from the StackTraceArray::Append since it is not needed after that change.

janvorli · 2024-06-11T19:43:04Z

This PR needs to wait for merging until SOS with corresponding change is released.

I have also realized that when we need to trim, the keepAlive array is always fully populated, so we don't need to check for cases where there would be NULL in an entry of the array.

janvorli · 2024-06-13T08:09:57Z

/azp run runtime-coreclr gcstress0x3-gcstress0xc

azure-pipelines · 2024-06-13T08:10:14Z

Azure Pipelines successfully started running 1 pipeline(s).

janvorli · 2024-06-20T11:39:14Z

/azp run runtime-coreclr gcstress0x3-gcstress0xc

azure-pipelines · 2024-06-20T11:39:23Z

Azure Pipelines successfully started running 1 pipeline(s).

janvorli · 2024-07-08T12:53:22Z

/azp run runtime-coreclr gcstress0x3-gcstress0xc

azure-pipelines · 2024-07-08T12:53:31Z

Azure Pipelines successfully started running 1 pipeline(s).

janvorli added the area-ExceptionHandling-coreclr label Jun 5, 2024

janvorli added this to the 9.0.0 milestone Jun 5, 2024

janvorli requested a review from jkotas June 5, 2024 15:09

janvorli self-assigned this Jun 5, 2024

janvorli added 2 commits June 5, 2024 17:33

Fix MUSL build

96da965

Fix x86 build

fb8cc6b

This was referenced Jun 5, 2024

Failure in GC\API\NoGCRegion\Callback_Svr\Callback_Svr.cmd #100149

Closed

[x86] stress failure in RayTracer.GetNaturalColor with DOTNET_JitStress=2 #102590

Closed

jkotas reviewed Jun 5, 2024

View reviewed changes

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved

src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved

jkotas reviewed Jun 6, 2024

View reviewed changes

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved

janvorli added 3 commits June 6, 2024 18:50

Few changes based on feedback

8eccdbc

* Add keep alive items count to the stack trace header. * Implement the concept of frozen stack traces to eliminate copies in the ExceptionDispatchInfo storing / restoring exceptions.

Rename keepalive to keepAlive

f774c60