-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove global spinlock for EH stacktrace #103076
Remove global spinlock for EH stacktrace #103076
Conversation
The global spinlock that was used to ensure that stack trace and the associated dynamic methods array were updated and read atomically. However, for the new EH, it has shown to cause a high contention in case many threads were handling exceptions at the same time. This change replaces the two arrays by one object member in the exception class. It contains reference to either the byte[] of the stack trace (when there are no dynamic methods on the stack trace) or an object[] where the first element contains the stack trace byte[] reference and the following elements contain what used to be in the dynamic array. That allows atomic updates and reads of the stack trace and dynamic method keepalive references without a need of a lock. The original code was quite convoluted, it was difficult to reason about and it had some races in it that were hidden behind the global lock. So I have decided to rewrite the whole thing from scratch. The way it ensures that it is race free is that whenever it updates the exception stack trace and the one that's on the exception was created by a different thread, it creates a deep copy of both the stack trace and the keepalive array. When making the copy, it also handles a case when a frame that needs a keepalive entry is on the stack trace part, but the keepalive array extracted from the exception is stale (the other thread needed to resize the keepalive array, but not the stack trace). In that case, the stack trace is trimmed at first such entry found. Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety. I have also removed a dead code path from the StackTraceInfo::SaveStackTrace. Finally, since with the previous iteration of this change, a bug in building the stack trace was found, I have added a coreclr test to verify stack trace for an exception matches the expectations.
There are some test errors, I am investigating them. |
src/coreclr/vm/comutilnative.cpp
Outdated
// Given an exception object, this method will extract the stacktrace and dynamic method array and set them up for return to the caller. | ||
FCIMPL3(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe, Object **pDynamicMethodsUnsafe); | ||
FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe); | |
FCIMPL1(Object *, ExceptionNative::FreezeStackTrace, Object* pStackTrace) |
This can return the value now that there is just a single value to return
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the suggestion to FreezeStackTrace
since the method does not actually perform a deep copy in typical case anymore.
src/coreclr/vm/object.cpp
Outdated
} | ||
if (keepaliveObject == NULL) | ||
{ | ||
// Trim the stack trace at a point where a dynamic or collectible method is found without a corresponding keepalive object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this possible?
If I understand the code correctly, the thread safety should be guaranteed by writing the size
field last and reading it first. Is that right? (Whatever it is, it would be nice to have it documented in a comment somewhere.)
I do not think it should be possible to get methods items without corresponding keep alive items here. If it was possible, I suspect there may be situations where the method is not kept alive and the pMethod->IsLCGMethod()
check above can crash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the case when it happens:
- Thread A owns the stack trace and is adding a new entry that requires a keep alive object.
- Thread B wants to update the stack trace too, so it fetches the stack trace array S1 and keep alive array K1 from the exception
- Thread A finds the keep alive array is full, so it creates a larger clone of the previous one, let's call it K2
- Thread A adds the new entry to keep alive array K2
- Thread A adds the new entry to stack trace array S1
- Thread A writes K2 to the exception
- Now thread B creates clones of S1 and K1 (when it read it from the exception, the K2 was not there yet). But S1 already has the new entry added by thread A, which is not covered by K1. Since K1 won't change anymore, we solve this by trimming the S1 at the element where we've found a frame that needs keep alive object and was not covered by K1
However, I have realized you are right that in such case, calling IsLCGMethod on the MethodTable above could crash. For example, if the thread A's exception was collected together with its keep alive array K2 that can be the only thing holding the method alive and if that would happen before the thread B called the IsLCGMethod.
So I need to figure out some other way to handle the situation described above.
I agree that multiple threads throwing the same exception is a pathological case that can be slow. One exception being caught on one thread and rethrown on different threads using ExceptionDispatchInfo is not that uncommon in async code. |
* Missing calls to IsOverflow at few places * Added a flag on StackTraceElement to indicate that the element needs a keepalive entry. It removes the need to call IsLCGMethod / Collectible check on the method table stored in the element and eliminates a possible problem with the method being collected in one place. * Returned missing call to StackFrameInfo::Init to the x86 code path * Removed obsolete comment and code line
* Add keep alive items count to the stack trace header. * Implement the concept of frozen stack traces to eliminate copies in the ExceptionDispatchInfo storing / restoring exceptions.
In the StackTraceArray::Allocate
69d1e71
to
d056127
Compare
src/coreclr/System.Private.CoreLib/src/System/Exception.CoreCLR.cs
Outdated
Show resolved
Hide resolved
Plus a build break fix
Also rename GetStackTracesDeepCopy to GetFrozenStackTrace and move the return argument to return value.
src/coreclr/vm/comutilnative.cpp
Outdated
|
||
// There can be no GC after setting the frozenStackTrace until the Object is returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// There can be no GC after setting the frozenStackTrace until the Object is returned. |
I am not sure what this comment is trying to say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I've renamed the frozenStackTrace to result and forgotten to update the comment. It is trying to say that the gc.result is not protected after the HELPER_METHOD_FRAME_END
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, that's writing manually managed code 101.
(We will want to convert this method to QCALL in near future to get rid of the HELPER_METHOD_FRAME.)
Plus an unused method removal and a little naming / contract cleanup
m_array = (I1ARRAYREF) AllocatePrimitiveArray(ELEMENT_TYPE_I1, static_cast<DWORD>(src.Capacity())); | ||
|
||
Volatile<size_t> size = src.Size(); | ||
uint32_t size = src.Size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use volatile read for the Size. Otherwise, the C++ compiler is free to duplicate the read and use one size
for value for the memcpy and return a different value from the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought you were suggesting before that I get rid of the Volatile here, so I am bit confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would:
- Get rid of the volatile local variable. It does not make sense to tell the compiler to issue read/write barriers when reading/writing stack. It is deoptimization.
- Change the read in Size() method to be volatile read, so that it is not reordered. This volatile read is counterpart of volatile write in
StackTraceArray::Append
(it is not actual volatile write, but MemoryBarrier + regular write that is equivalent in this situation). In general, volatile reads and volatile writes in lock-free algorithm have to come in pairs on producing/consuming sides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I have added a commit with that change. I've modified the GetSize / SetSize and added such treatment to the keep alive count accessors too. And removed the MemoryBarrier from the StackTraceArray::Append where it is no longer needed after this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM otherwise. Thank you!
Also remove the memory barrier from the StackTraceArray::Append since it is not needed after that change.
This PR needs to wait for merging until SOS with corresponding change is released. |
I have also realized that when we need to trim, the keepAlive array is always fully populated, so we don't need to check for cases where there would be NULL in an entry of the array.
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
The global spinlock that was used to ensure that stack trace and the associated dynamic methods array were updated and read atomically. However, for the new EH, it has shown to cause a high contention in case many threads were handling exceptions at the same time.
This change replaces the two arrays by one object member in the exception class. It contains reference to either the
byte[]
of the stack trace (when there are no dynamic methods on the stack trace) or anobject[]
where the first element contains the stack tracebyte[]
reference and the following elements contain what used to be in the dynamic array. That allows atomic updates and reads of the stack trace and dynamic method keepalive references without a need of a lock.It improves the performance of exceptions being handled on multiple threads at the same time 3 fold. For example, a testing app running on my Windows x64 machine with 10 CPU cores that throws an exception on 10 threads in parallel 1000000 times over 10 stack frames took 30s of each thread's time before and 9 seconds after the change.
The original code was quite convoluted, it was difficult to reason about and it had some races in it that were hidden behind the global lock. So I have decided to rewrite the whole thing from scratch.
The way it ensures that it is race free is that whenever it updates the exception stack trace and the one that's on the exception was created by a different thread, it creates a deep copy of both the stack trace and the keepalive array. When making the copy, it also handles a case when a frame that needs a keepalive entry is on the stack trace part, but the keepalive array extracted from the exception is stale (the other thread needed to resize the keepalive array, but not the stack trace). In that case, the stack trace is trimmed at first such entry found.
Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety.
I have also merged the
StackTraceInfo::AppendElement
and theStackTraceInfo::SaveStackTrace
into one, as they were always being called in pair for a single frame. Before, the first one needed to store the frame and the second would extract it from that storage and append it to the stack trace.Finally, since with the previous iteration of this change, a bug in building the stack trace was found, I have added a coreclr test to verify stack trace for an exception matches the expectations.