Need a better user experience for crash dumps occurring in PRs #31820
Comments
cc @wfurt |
@steveharter I assume you are talking about this build? https://dnceng.visualstudio.com/public/_build/results?buildId=506909 There was a dump uploaded for the crash, see the attachments section in the failed workitems: https://dnceng.visualstudio.com/public/_build/results?buildId=506909&view=ms.vss-test-web.build-test-results-tab&runId=16085098&resultId=168576&paneView=attachments. console.b045ca6a.log: |
Getting the core is not that hard. (even if not obvious) Getting matching test bits is more difficult. |
I was under the assumption that a crash dump would be sufficient to diagnose such issues. I'm unsure what the action item here is. @steveharter can you please clarify? |
As @wfurt noted the test bits are necessary. It is not possible to debug a crash dump with The description lists 3 issues: no symbols, no instructions on how to get the symbols, and lldb instructions\SOS plugin not working. The latter may not be an "infrastructure" issue but someone should vet the OSX developer experience for dumps caused in a PR as the lldb instructions\SOS didn't work for me. |
The core may be sufficient if we published symbols to the symbol server, e.g. for official builds. I don't think we do for PRs. Otherwise, you need the bits and set |
Ideally, at least for the StackOverflow scenario I had, the test run information would include:
In rare cases where this information isn't enough to troubleshoot, having access to the symbols and\or runtime would be nice for debugging. In my case, a local build did not help troubleshoot, I assume due to different optimizations or local settings that didn't cause a StackOverflow. |
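A minimal sketch of how the crashing test could be pinpointed from verbose console output. It assumes the runner prints a `[STARTING]` line and a `[FINISHED]` line per test; that log format is an assumption and is not confirmed anywhere in this thread. `unfinished_tests` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: list tests that started but never finished.
# Assumed log format (not confirmed by this thread):
#     Some.Namespace.TestName [STARTING]
#     Some.Namespace.TestName [FINISHED] Time: 0.01s
unfinished_tests() {
    awk '
        / \[STARTING\]/ {
            name = $0
            sub(/^[ \t]+/, "", name)            # drop leading indentation
            sub(/ \[STARTING\].*$/, "", name)   # keep only the test name
            running[name] = 1
        }
        / \[FINISHED\]/ {
            name = $0
            sub(/^[ \t]+/, "", name)
            sub(/ \[FINISHED\].*$/, "", name)
            delete running[name]                # test completed normally
        }
        END {
            # Whatever is still marked as running was active at the crash.
            for (n in running) print n
        }
    ' "$1"
}
```

Calling `unfinished_tests console.log` on a log captured before the process died would print the tests that were still in flight, which for a StackOverflow crash is usually a short list.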
That's something that we would like to do when we switch to dotnet test (VSTest platform):
from https://docs.microsoft.com/en-us/dotnet/core/tools/dotnet-vstest?tabs=netcore21 |
This PR #32167 should add StackTrace to StackOverflowException. |
Tagging subscribers to this area: @safern, @ViktorHofer |
All 3 of these issues are addressed by the .md file that is generated next to the crash dump, with detailed instructions for how to download the crash dump, matching symbols and SOS. (The template is at https://github.com/dotnet/runtime/blob/main/eng/testing/debug-dump-template.md.) |
Currently if there is a crash in a PR there is no easy way to diagnose it since:
`lldb` did not work. This may be an issue with the SOS plugin on OSX.
Background: As part of #2259 there was a `StackOverflowException` on OSX during PR runs. Since the code was new, the crash only occurred in PR runs and runtime symbols are not public.
Steps taken:
`System.Test.Json.Tests` due to `StackOverflowException`.
If I had been able to see verbose console output of the tests (which displays the tests currently running) or the current state of `testsresults.xml`, I would have been able to debug the test that was causing the issue and wouldn't have needed to go through the additional steps below.
Optimally, I would see the failed test and the managed+native callstacks for the crash.
The test was only crashing on OSX, so from my MacBook I attempted to repro the environment locally (build release CLR and debug version of tests). However, I was not able to reproduce the StackOverflow.
On my MacBook, I downloaded the core dump from the PR test run attachments and through some searches discovered helpful instructions at https://github.com/dotnet/diagnostics/blob/master/documentation/debugging-coredump.md
On my MacBook I installed SOS and `dotnet-symbol` according to the instructions.
The instructions do not explain how to get the symbols for PR runs. After asking for help, I was able to do that through some low-level web requests and download the runtime files and associated symbol files. Ideally these would have been attached to the PR, like the core dump was.
The instructions state that `dotnet-symbol --host-only` will not work with local symbols and to copy the symbols to a temp directory, so I did that (actually copied all runtime files to the temp location).
Ran `lldb --core /tmp/dump/core.123 /tmp/dump/dotnet`. The instructions state "`<host-program>`" for the last parameter, so I used `/tmp/dump/dotnet` (also tried `libnethost.dylib`).
From `lldb` ran `setsymbolserver -directory /tmp/dump`.
Finally tried to see the stack. Ran `sos ClrStack` (and other sos commands later) and got an exception (from SOS SymbolReader.LoadNativeSymbols).
Since I could not use `lldb` to get a call stack, I tried to use the runtime downloaded from Helix against my local tests, and I was able to repro the exception and debug the tests.
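The `lldb` steps above can be sketched as a dry run that only prints the commands for a given dump directory. The `lldb`, `setsymbolserver` and `sos ClrStack` lines are taken from the steps in this issue. The tool-install lines are the usual way to get `dotnet-sos` and `dotnet-symbol` and are an addition, not something quoted from the issue. `print_dump_debug_steps` is a hypothetical helper name.

```shell
#!/bin/sh
# Dry-run sketch: print (do not execute) the dump-debugging commands.
# print_dump_debug_steps is a hypothetical helper, not part of any tool.
print_dump_debug_steps() {
    dump_dir=$1    # directory with the core dump plus copied runtime files and symbols
    core_file=$2   # core dump file name inside dump_dir
    host=$3        # host program passed to lldb, e.g. dotnet

    cat <<EOF
dotnet tool install -g dotnet-sos
dotnet-sos install
dotnet tool install -g dotnet-symbol
lldb --core $dump_dir/$core_file $dump_dir/$host
(lldb) setsymbolserver -directory $dump_dir
(lldb) sos ClrStack
EOF
}
```

For the paths used in this issue the call would be `print_dump_debug_steps /tmp/dump core.123 dotnet`. Printing instead of executing keeps the sketch safe to run on a machine without `lldb` or the .NET tools installed.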