Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[debugger] Fix crash during Async Break when APC and CET are enabled #111408

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

thaystg
Copy link
Member

@thaystg thaystg commented Jan 14, 2025

When we are running an APC Callback we are not allowed to SetIP because when APC Callback is resuming it will check if the IP is not changed if CET is enabled.
To avoid this problem we use the APC to suspend the thread, but then we enable the single step and continue the thread execution, this will exit the apc callback and pause in the single step, so we are allowed to SetIP to FuncEvalHijack to run FuncEvals.

Fixes #110552

src/coreclr/vm/threadsuspend.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/threadsuspend.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/threadsuspend.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/threadsuspend.cpp Outdated Show resolved Hide resolved
@@ -3722,7 +3722,7 @@ VOID
PALAPI
FlushProcessWriteBuffers();

typedef void (*PAL_ActivationFunction)(CONTEXT *context);
typedef void (*PAL_ActivationFunction)(CONTEXT *context, bool fromDebugger);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nit - can you please keep this function pointer unchanged and add the following function instead?

void HandleSuspensionForInterruptedThread(CONTEXT *interruptedContext)
{
    HandleSuspensionForInterruptedThread(interruptedContext, false);
}

@@ -5759,7 +5784,7 @@ BOOL CheckActivationSafePoint(SIZE_T ip)
// address to take the thread to the appropriate stub (based on the return
// type of the method) which will then handle preparing the thread for GC.
//
void HandleSuspensionForInterruptedThread(CONTEXT *interruptedContext)
void HandleSuspensionForInterruptedThread(CONTEXT *interruptedContext, bool suspendForDebugger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this and the call site, it might be nicer to pass in the ActivationReason instead of bool.

#ifdef FEATURE_SPECIAL_USER_MODE_APC
// It's not allowed to change the IP while paused in an APC Callback for security reasons if CET is turned on
// So we enable the single step in the thread that is running the APC Callback
// and then it will be paused using single step exception after exiting the APC callback
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand correctly, the pause will occur after executing forward one instruction in the jitted managed code? For example if the CONTEXT where the debugger initially suspended had IP = 0x7ffe1000, and code:
0x7ffe1000: mov ecx, 5
0x7ffe1005: mov rax, 0x27c3ad47000
0x7ffe100f: jmp qword ptr [rax]

Once we set the single step bit, resume, and get notified that the single step has completed will the IP now be 0x7ffe1005 and we've executed that first mov instruction? If so that could be problematic because executing that instruction may have changed what line of source we are on, changed the value of variables, or shifted the thread from a GC-safe-point to a non-GC-safe-point.

If that is the case, a couple potential alternatives:

  1. Perhaps we don't need to hijack at all and we could run the func-eval from inside the APC? The purpose of the hijack is to get control of the thread to run what code we wanted to run and the APC has already done that for us. To make this work we'd have to be sure that stack working works correctly across the APC frame and that it is legal to recursively nest APCs on the stack.
  2. Instead of a single-step maybe we could use a breakpoint to stop execution instead?
  3. In the worst case, we might need to disallow func-eval on threads that were suspended by APC but I'm hoping that wouldn't impact most func-evals that debuggers do. Debuggers commonly pick the thread which sent the debug event to run the func-eval and that thread shouldn't have needed the APC to transition because it was already getting sent into some exception handler or runtime helper that triggered the debug event.

Copy link
Member Author

@thaystg thaystg Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the problem of changing the line of source we are on, or change the value of the variables? When we ask for an APC we don't know exactly when it will be executed right? What is the problem in pausing not exactly where it was paused by the APC call, but on the next instruction?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The APC serves us for suspending the thread. It is delivered to it at "random" place. However, with the single step issue that Noah has raised, there can be a problem. When we get the APC callback (the ApcActivationCallback is called), we first check if the context instruction pointer is at activation safe point and we don't call the HandleSuspensionForInterruptedThread if it was not. The check for the activation safe point is here:

BOOL CheckActivationSafePoint(SIZE_T ip)
{
Thread *pThread = GetThreadNULLOk();
// The criteria for safe activation is to be running managed code.
// Also we are not interested in handling interruption if we are already in preemptive mode nor if we are single stepping
BOOL isActivationSafePoint = pThread != NULL &&
(pThread->m_StateNC & Thread::TSNC_DebuggerIsStepping) == 0 &&
pThread->PreemptiveGCDisabled() &&
ExecutionManager::IsManagedCode(ip);
if (!isActivationSafePoint)
{
pThread->m_hasPendingActivation = false;
}
return isActivationSafePoint;
}

Now with this single step mechanism, we could end up moving the instruction pointer to a location that's not activation safe in case the instruction was a call or jump to native code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crashes during debugging when pausing in VS2022/.NET9
3 participants