Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove global spinlock for EH stacktrace #103076

Merged
merged 15 commits into from
Jul 8, 2024

Conversation

janvorli
Copy link
Member

@janvorli janvorli commented Jun 5, 2024

The global spinlock that was used to ensure that stack trace and the associated dynamic methods array were updated and read atomically. However, for the new EH, it has shown to cause a high contention in case many threads were handling exceptions at the same time.

This change replaces the two arrays by one object member in the exception class. It contains reference to either the byte[] of the stack trace (when there are no dynamic methods on the stack trace) or an object[] where the first element contains the stack trace byte[] reference and the following elements contain what used to be in the dynamic array. That allows atomic updates and reads of the stack trace and dynamic method keepalive references without a need of a lock.

It improves the performance of exceptions being handled on multiple threads at the same time 3 fold. For example, a testing app running on my Windows x64 machine with 10 CPU cores that throws an exception on 10 threads in parallel 1000000 times over 10 stack frames took 30s of each thread's time before and 9 seconds after the change.

The original code was quite convoluted, it was difficult to reason about and it had some races in it that were hidden behind the global lock. So I have decided to rewrite the whole thing from scratch.

The way it ensures that it is race free is that whenever it updates the exception stack trace and the one that's on the exception was created by a different thread, it creates a deep copy of both the stack trace and the keepalive array. When making the copy, it also handles a case when a frame that needs a keepalive entry is on the stack trace part, but the keepalive array extracted from the exception is stale (the other thread needed to resize the keepalive array, but not the stack trace). In that case, the stack trace is trimmed at first such entry found.

Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety.

I have also merged the StackTraceInfo::AppendElement and the StackTraceInfo::SaveStackTrace into one, as they were always being called in pair for a single frame. Before, the first one needed to store the frame and the second would extract it from that storage and append it to the stack trace.

Finally, since with the previous iteration of this change, a bug in building the stack trace was found, I have added a coreclr test to verify stack trace for an exception matches the expectations.

The global spinlock that was used to ensure that stack trace and the
associated dynamic methods array were updated and read atomically.
However, for the new EH, it has shown to cause a high contention in case
many threads were handling exceptions at the same time.

This change replaces the two arrays by one object member in the
exception class. It contains reference to either the byte[] of the
stack trace (when there are no dynamic methods on the stack trace) or an
object[] where the first element contains the stack trace byte[]
reference and the following elements contain what used to be in the
dynamic array. That allows atomic updates and reads of the stack trace
and dynamic method keepalive references without a need of a lock.

The original code was quite convoluted, it was difficult to reason about
and it had some races in it that were hidden behind the global lock. So
I have decided to rewrite the whole thing from scratch.

The way it ensures that it is race free is that whenever it updates the
exception stack trace and the one that's on the exception was created by
a different thread, it creates a deep copy of both the stack trace and
the keepalive array. When making the copy, it also handles a case when
a frame that needs a keepalive entry is on the stack trace part, but the
keepalive array extracted from the exception is stale (the other thread
needed to resize the keepalive array, but not the stack trace). In that
case, the stack trace is trimmed at first such entry found.

Since the case when multiple threads are throwing the same exception and
so they are modifying its stack trace in parallel is pathological
anyways, I believe the extra work spent on creating the clones of the arrays
is a good tradeoff for ensuring easy to reason about thread safety.

I have also removed a dead code path from the
StackTraceInfo::SaveStackTrace.

Finally, since with the previous iteration of this change, a bug in
building the stack trace was found, I have added a coreclr test to
verify stack trace for an exception matches the expectations.
@janvorli janvorli added this to the 9.0.0 milestone Jun 5, 2024
@janvorli janvorli requested a review from jkotas June 5, 2024 15:09
@janvorli janvorli self-assigned this Jun 5, 2024
@janvorli
Copy link
Member Author

janvorli commented Jun 5, 2024

There are some test errors, I am investigating them.

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/clrex.h Show resolved Hide resolved
src/coreclr/vm/clrex.h Outdated Show resolved Hide resolved
src/coreclr/vm/comutilnative.cpp Outdated Show resolved Hide resolved
// Given an exception object, this method will extract the stacktrace and dynamic method array and set them up for return to the caller.
FCIMPL3(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe, Object **pDynamicMethodsUnsafe);
FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe);
Copy link
Member

@jkotas jkotas Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
FCIMPL2(VOID, ExceptionNative::GetStackTracesDeepCopy, Object* pExceptionObjectUnsafe, Object **pStackTraceUnsafe);
FCIMPL1(Object *, ExceptionNative::FreezeStackTrace, Object* pStackTrace)

This can return the value now that there is just a single value to return

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the suggestion to FreezeStackTrace since the method does not actually perform a deep copy in typical case anymore.

src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved
}
if (keepaliveObject == NULL)
{
// Trim the stack trace at a point where a dynamic or collectible method is found without a corresponding keepalive object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this possible?

If I understand the code correctly, the thread safety should be guaranteed by writing the size field last and reading it first. Is that right? (Whatever it is, it would be nice to have it documented in a comment somewhere.)

I do not think it should be possible to get methods items without corresponding keep alive items here. If it was possible, I suspect there may be situations where the method is not kept alive and the pMethod->IsLCGMethod() check above can crash.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the case when it happens:

  • Thread A owns the stack trace and is adding a new entry that requires a keep alive object.
  • Thread B wants to update the stack trace too, so it fetches the stack trace array S1 and keep alive array K1 from the exception
  • Thread A finds the keep alive array is full, so it creates a larger clone of the previous one, let's call it K2
  • Thread A adds the new entry to keep alive array K2
  • Thread A adds the new entry to stack trace array S1
  • Thread A writes K2 to the exception
  • Now thread B creates clones of S1 and K1 (when it read it from the exception, the K2 was not there yet). But S1 already has the new entry added by thread A, which is not covered by K1. Since K1 won't change anymore, we solve this by trimming the S1 at the element where we've found a frame that needs keep alive object and was not covered by K1

However, I have realized you are right that in such case, calling IsLCGMethod on the MethodTable above could crash. For example, if the thread A's exception was collected together with its keep alive array K2 that can be the only thing holding the method alive and if that would happen before the thread B called the IsLCGMethod.
So I need to figure out some other way to handle the situation described above.

@jkotas
Copy link
Member

jkotas commented Jun 6, 2024

Since the case when multiple threads are throwing the same exception and so they are modifying its stack trace in parallel is pathological anyways, I believe the extra work spent on creating the clones of the arrays is a good tradeoff for ensuring easy to reason about thread safety.

I agree that multiple threads throwing the same exception is a pathological case that can be slow. One exception being caught on one thread and rethrown on different threads using ExceptionDispatchInfo is not that uncommon in async code.

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
* Missing calls to IsOverflow at few places
* Added a flag on StackTraceElement to indicate that the element needs a
  keepalive entry. It removes the need to call IsLCGMethod / Collectible
  check on the method table stored in the element and eliminates a
  possible problem with the method being collected in one place.
* Returned missing call to StackFrameInfo::Init to the x86 code path
* Removed obsolete comment and code line
* Add keep alive items count to the stack trace header.
* Implement the concept of frozen stack traces to eliminate copies in
  the ExceptionDispatchInfo storing / restoring exceptions.
src/coreclr/vm/object.h Outdated Show resolved Hide resolved
In the StackTraceArray::Allocate
@janvorli janvorli force-pushed the remove-eh-stacktrace-global-lock-new branch from 69d1e71 to d056127 Compare June 10, 2024 19:32
Also rename GetStackTracesDeepCopy to GetFrozenStackTrace and move the
return argument to return value.
src/coreclr/vm/clrex.h Outdated Show resolved Hide resolved
Comment on lines 161 to 162

// There can be no GC after setting the frozenStackTrace until the Object is returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// There can be no GC after setting the frozenStackTrace until the Object is returned.

I am not sure what this comment is trying to say.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I've renamed the frozenStackTrace to result and forgotten to update the comment. It is trying to say that the gc.result is not protected after the HELPER_METHOD_FRAME_END

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's writing manually managed code 101.

(We will want to convert this method to QCALL in near future to get rid of the HELPER_METHOD_FRAME.)

src/coreclr/vm/excep.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved
Plus an unused method removal and a little naming / contract cleanup
src/coreclr/vm/object.cpp Outdated Show resolved Hide resolved
m_array = (I1ARRAYREF) AllocatePrimitiveArray(ELEMENT_TYPE_I1, static_cast<DWORD>(src.Capacity()));

Volatile<size_t> size = src.Size();
uint32_t size = src.Size();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use volatile read for the Size. Otherwise, the C++ compiler is free to duplicate the read and use one size for value for the memcpy and return a different value from the method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought you were suggesting before that I get rid of the Volatile here, so I am bit confused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would:

  • Get rid of the volatile local variable. It does not make sense to tell the compiler to issue read/write barriers when reading/writing stack. It is deoptimization.
  • Change the read in Size() method to be volatile read, so that it is not reordered. This volatile read is counterpart of volatile write in StackTraceArray::Append (it is not actual volatile write, but MemoryBarrier + regular write that is equivalent in this situation). In general, volatile reads and volatile writes in lock-free algorithm have to come in pairs on producing/consuming sides.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have added a commit with that change. I've modified the GetSize / SetSize and added such treatment to the keep alive count accessors too. And removed the MemoryBarrier from the StackTraceArray::Append where it is no longer needed after this change.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM otherwise. Thank you!

Also remove the memory barrier from the StackTraceArray::Append since it
is not needed after that change.
@janvorli
Copy link
Member Author

This PR needs to wait for merging until SOS with corresponding change is released.

@janvorli janvorli added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jun 11, 2024
I have also realized that when we need to trim, the keepAlive array is
always fully populated, so we don't need to check for cases where there
would be NULL in an entry of the array.
@janvorli
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli
Copy link
Member Author

janvorli commented Jul 8, 2024

/azp run runtime-coreclr gcstress0x3-gcstress0xc

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@janvorli janvorli merged commit dd2120b into dotnet:main Jul 8, 2024
108 of 112 checks passed
@janvorli janvorli deleted the remove-eh-stacktrace-global-lock-new branch July 8, 2024 19:52
@janvorli janvorli removed the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Jul 24, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Aug 24, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants