-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[release/8.0-staging] Fix a set of minor DAC bugs I encountered recently #104919
[release/8.0-staging] Fix a set of minor DAC bugs I encountered recently #104919
Conversation
In src/coreclr/debug/daccess/dacdbiimpl.cpp - If g_pDebugger is NULL, produce an error, or NULL result instead of an invalid memory access. - This caused problems when debugging with a debug version of the DAC In src/coreclr/debug/di/module.cpp - Avoid attempting to get the MapAddr if the number of entries in the map is 0, as it may not yet be fully initialized and attempting to call the GetMapAddr function may cause an ASSERT in debug/checked versions of the DAC In src/coreclr/utilcode/collections.cpp - If a CHashTable is not fully initialized, or is in an intermediate state, its possible for the linked list to have an infinite cycle. Tweak the iteration here under the DAC to stop - This is known to cause stack walking to loop infinitely in the debugger when debugging a runtime which has suspended at a non-safe point
Friendly reminder that today Monday July 15th is Code Complete day at 4pm PT, which is the deadline to get this included in the August Release. |
@@ -5650,10 +5650,13 @@ void DacDbiInterfaceImpl::LookupEnCVersions(Module* pModule, | |||
DebuggerJitInfo * pDJI = NULL; | |||
EX_TRY_ALLOW_DATATARGET_MISSING_MEMORY | |||
{ | |||
pDMI = g_pDebugger->GetOrCreateMethodInfo(pModule, mdMethod); | |||
if (pDMI != NULL) | |||
if (g_pDebugger != NULL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we get to this state? This gets allocated in EEStartup. Is the DBI getting called super early on? Feels like this is something that shouldn't happen. Data target missing memory like the catch gets should be the main reason to not have it (or init error)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good overall cleanup for defensive coding. To be clear for this PR, I did not encounter the dacdbiimpl.cpp change with a real world example. It would be fine with me to not backport this particular fix and just leave the fix shipping for v.next. It's only the collections.cpp
issue that I'd like to backport since I ran into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hoyosjs I hit those particular issues when debugging CLR startup scenarios. ... I spend a lot of time debugging the runtime startup path, and I regularly was hitting DAC crashes when trying to diagnose the startup path. Typically, the DAC isn't super useful at that point, but its really annoying to be unable to step through the various process creation paths.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, yes, the DBI is commonly called this early in the newer Windbg UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I withdraw my comment. :) If this happens in CLR startup I do recommend backporting this part to .net 8 as well. We deal with that often in servicing, watson and elsewhere. Even if most CLR devs will be on .Net 9, it would be good to get ahead of this issue for our long term servicing release.
Do we have an understanding of what the customer scenarios are affected by this issue and how this change might address them? |
This was an internal find (me). I encountered one of these issues in testing. This is specifically the issue listed fixed in src/coreclr/utilcode/collections.cpp. I would be fine with only fixing/changing that file if it reduces the risk overhead. The other two issues I did not encounter. A crash in the runtime (or just hitting break in a native debugger) can cause us to arbitrarily pause the process in an "inconsistent" state, where we are in the middle of updating some pointers. I ran into this particular issue where The problem here is particularly annoying for the debuggers calling into the DBI. WinDbg goes into an infinite loop. Visual Studio seems to hang for several minutes, but it did seem to recover in some cases and it hard-crashed in others. We should fix this since it could potentially affect our ability to debug problems. It may also cause headaches in Watson if this is hit regularly, as putting the debugger into an infinite loop is a really bad state to be in for a service. (That team hasn't reached out, only pointing out the severity when debugging at scale.) |
This is still missing Tactics approval. Friendly reminder that Code Complete for the October Release is September 9. If we want this fix to be included in that release, please merge this PR before that date. |
Friendly reminder that today 10/14 is Code Complete for the November Release. If this fix is intended to be included in that release, please make sure to get a Tactics approval and merge it before 4pm PT. Otherwise, it will have to wait until next month. |
Backport of #100031 to release/8.0-staging
/cc @davidwrighton
Customer Impact
This fix addresses hangs and crashes reported by the Windbg team diagnosing Watson dumps.
Lee encountered one of these issues in testing. This is specifically the issue listed fixed in src/coreclr/utilcode/collections.cpp. I would be fine with only fixing/changing that file if it reduces the risk overhead. The other two issues I did not encounter.
A crash in the runtime (or just hitting break in a native debugger) can cause us to arbitrarily pause the process in an "inconsistent" state, where we are in the middle of updating some pointers. I ran into this particular issue where (psEntry->iNext == psSrch->iNext) multiple times while debugging a particular issue.
The problem here is particularly annoying for the debuggers calling into the DBI. WinDbg goes into an infinite loop. Visual Studio seems to hang for several minutes, but it did seem to recover in some cases and it hard-crashed in others.
We should fix this since it could potentially affect our ability to debug problems. It may also cause headaches in Watson if this is hit regularly, as putting the debugger into an infinite loop is a really bad state to be in for a service. (That team hasn't reached out, only pointing out the severity when debugging at scale.)
Regression
Windbg behavior is now using different debugger API's which has caused an experience regression for customers.
Testing
Manual use over the last few months of windbg has experienced fewer problems. Manual testing (by @leculver) of a small repro case has shown that this fix addresses at least one case where this failure broke the managed debugger.
Risk
Low, this affects only the DAC, and will result in avoiding infinite loops and doing error checking instead of triggering access violations.
IMPORTANT: If this backport is for a servicing release, please verify that:
The PR target branch is
release/X.0-staging
, notrelease/X.0
.If the change touches code that ships in a NuGet package, you have added the necessary package authoring and gotten it explicitly reviewed.