Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[release/5.0] Port Debugger data breakpoint deadlock fixes to .NET 5.0 #44563

Merged
merged 2 commits into from
Nov 24, 2020

Conversation

sdmaclea
Copy link
Contributor

@sdmaclea sdmaclea commented Nov 11, 2020

Backport of #44471 & #44549 to release/5.0

  • When suspending for the debugger is in progress (the debugger is waiting for some threads to reach a safe point for suspension), a thread that is not yet suspended may trigger another runtime suspension due to triggering a GC from allocation or for deleting call counters. Suspending the runtime at that point is currently not allowed because the order of operations conflicts with requirements to send GC events for managed data breakpoints to work correctly. Instead, the thread suspends for the debugger first, and after the runtime is resumed by the debugger, continues suspending for the other reason.
  • However, the thread may also be in a forbid-suspend-for-debugger region since it holds the slot backpatching lock and suspending for the debugger at that point could cause FuncEvals to deadlock
  • So the thread is not allowed to suspend for the debugger and not allowed to suspend for the other reason, and the runtime deadlocks
  • The plan is to change managed data breakpoints to pin objects instead of using GC events to track object relocation, and to deprecate the GC events APIs. This change adds an API to pin an object, breaks the deadlock in thread suspension, and fails an attempt to QI the interface containing the API to enable GC events, ensuring that it would not be used. An update to VS would follow, to use pinning for managed data breakpoints.

Customer Impact

  • The .NET 5.0 release contained a potential deadlock issue when the debugger was attached. The deadlock was rare, but permanently deadlocked the target process
  • When VS breaks a process for managed debugging at an arbitrary time, if it happens during a very short window of time, the target process deadlocks and the only resolution is to kill the process
  • A customer indicated that the deadlock occurs at random times, and roughly once every three hours of debugging in the customer's app, see Intermittent hang/deadlock in .net core 5.0 RC1 while debugging #42375 (comment)
  • A workaround is to disable tiering and profiler-rejit-on-attach when debugging. This has to be done when starting the process, so it would not be usable when attaching to an already-running process with those features enabled.

Regression?

  • Yes, from 3.1, which used a different solution for preventing FuncEvals from deadlocking on the slot backpatching lock. That solution was problematic for VS debugger tests and had worse diagnostic experience in more situations since FuncEvals would not be allowed to run.

Testing

  • Induced the problem by triggering a GC and suspending the runtime for deleting call counts, and breaking the process from VS in the short timing window where this issue would occur
  • Verified that the runtime no longer deadlocks after this change
  • Verified that trying to add a managed data breakpoint to a patched runtime with an unpatched VS shows an error indicating that the runtime does not support managed data breakpoints
  • @chuckries smoke tested the fix, and things look good for VS w.r.t. the runtime: We are able to create pinned handles, avoid turning on GC events, and everything appears to work as expected. I think the PR is good to complete.

Risk

This is a minimal risk patch.

  • The patch removes an COM interface which indicates support for managed data breakpoints to older VS debuggers.
  • It adds a new COM interface which will allow new VS debuggers to support managed data breakpoints.
  • Then simplifies the debugger suspension logic to fix the deadlock.
  • Includes updated precompiled MIDL (with whitespace changes introduced by the MIDL compiler).

CC @tommcdon @noahfalk @kouvel @chuckries @jeffschwMSFT

* Add ICorDebugHeapValue4
* Add EnableGCNotificationEvents deprecation comment
…otnet#44549)

* Stop providing IID_ICorDebugProcess10

Prevent older VS versions from setting managed data breakpoints.

* Simplify the thread collision logic to prevent deadlock

This is a simplification of dotnet#44539
as proposed by @kouvel
@ghost
Copy link

ghost commented Nov 11, 2020

Tagging subscribers to this area: @tommcdon
See info in area-owners.md if you want to be subscribed.


Issue meta data

Issue content: Backport of #44471 & #44549 to release/5.0

Customer Impact

The .NET 5.0 release contained a potential deadlock issue when the debugger was attached. The deadlock was rare, but killed the target process

Testing

@kouvel @chuckries Please update any testing that has occurred, change this from a draft and add the servicing consider label.

Risk

This is a minimal risk patch. The patch removes an COM interface which indicates support for managed data breakpoints to older VS debuggers. It adds a new COM interface which will allow new VS debuggers to support managed data breakpoints.

CC @tommcdon @noahfalk @kouvel @chuckries @jeffschwMSFT

Issue author: sdmaclea
Assignees: kouvel
Milestone: [object Object]

@sdmaclea sdmaclea marked this pull request as draft November 11, 2020 23:48
@sdmaclea
Copy link
Contributor Author

/azp run runtime

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@sdmaclea sdmaclea self-assigned this Nov 12, 2020
@danmoseley danmoseley changed the title Port Debugger data breakpoint deadlock fixes to .NET 5.0 [release/5.0] Port Debugger data breakpoint deadlock fixes to .NET 5.0 Nov 17, 2020
@sdmaclea sdmaclea marked this pull request as ready for review November 18, 2020 00:14
@sdmaclea sdmaclea requested a review from noahfalk November 18, 2020 00:45
Copy link
Member

@jeffschwMSFT jeffschwMSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved. Please get several code reviews as the change is on the larger side.

@jeffschwMSFT jeffschwMSFT added the Servicing-consider Issue for next servicing release review label Nov 18, 2020
@sdmaclea sdmaclea requested a review from mikem8361 November 18, 2020 00:51
@sdmaclea
Copy link
Contributor Author

Sure. I might have neglected to mention that a lot of this is machine generated core from the MIDL compiler. I left the whitespace as MIDL generated it. The change set looks substantially smaller with ignore whitespace.

@sdmaclea
Copy link
Contributor Author

@noahfalk @hoyosjs Can you also review this backport?

Copy link
Member

@noahfalk noahfalk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sdmaclea sdmaclea added Servicing-approved Approved for servicing release and removed Servicing-consider Issue for next servicing release review labels Nov 24, 2020
@sdmaclea
Copy link
Contributor Author

Marking servicing approved per tactics

@sdmaclea sdmaclea modified the milestone: 5.0.x Nov 24, 2020
@sdmaclea sdmaclea merged commit ca4d6d1 into dotnet:release/5.0 Nov 24, 2020
@sdmaclea sdmaclea deleted the deadlock5.0 branch November 24, 2020 22:56
@ghost ghost locked as resolved and limited conversation to collaborators Dec 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants