dotnet dump analyze's gcroot -all <address> crashes on arm64 #3726

Closed
omajid opened this issue Mar 7, 2023 · 18 comments
Labels
bug Something isn't working

@omajid
Member

omajid commented Mar 7, 2023

Description

I created a hello-world ASP.NET Core application (literally dotnet new web; dotnet build; dotnet bin/Debug/net*/App.dll), then created a dump via dotnet-dump.

I then used dotnet-dump analyze to examine it:

$ dotnet dump analyze coredump.34662 --command dso --command exit                                                                              
Loading core dump: coredump.34662 ...                                                                                                          
OS Thread Id: 0x8766 (0)                     
SP/REG           Object           Name                                                  
x14              0000ffbf65437160 System.Object                                   
0000FFFFD8B0D480 0000ffbf65437160 System.Object                                         
0000FFFFD8B0D560 0000ffbf65437160 System.Object     
0000FFFFD8B0D700 0000ffbf65437160 System.Object                              
0000FFFFD8B0D708 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres
0000FFFFD8B0D770 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres       
0000FFFFD8B0D7A0 0000ffbf65437050 System.Threading.Tasks.Task+SetOnInvokeMres
0000FFFFD8B0D7A8 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D7C0 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D7D8 0000ffbf64806bb8 System.Threading.Tasks.TplEventSource
0000FFFFD8B0D800 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D878 0000ffbf65436f78 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[Microsoft.Extensions.Hosting.HostingAbstractionsHostExtensions+<RunAsync>d__4, Microsoft.Extensions.Hosting.Abstractions]]
0000FFFFD8B0D890 0000ffbf6609b808 Program+<>c                 
0000FFFFD8B0D898 0000ffbf6609b820 System.Func`1[[System.String, System.Private.CoreLib]]                 
0000FFFFD8B0D8A0 0000ffbf6609e200 Microsoft.AspNetCore.Builder.RouteHandlerBuilder
0000FFFFD8B0D8A8 0000ffbf6609b820 System.Func`1[[System.String, System.Private.CoreLib]]
0000FFFFD8B0D8B0 0000ffbf66010180 System.String    /
0000FFFFD8B0D8B8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8C8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8D0 0000ffbf660101d0 Microsoft.AspNetCore.Builder.WebApplicationBuilder
0000FFFFD8B0D8D8 0000ffbf660512e0 Microsoft.AspNetCore.Builder.WebApplication
0000FFFFD8B0D8E0 0000ffbf660101d0 Microsoft.AspNetCore.Builder.WebApplicationBuilder
0000FFFFD8B0D8E8 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DA00 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DBF8 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DC20 0000ffbf6600efa0 System.String[]
0000FFFFD8B0DD90 0000ffbf6600efa0 System.String[]
0000FFFFD8B0E068 0000ffbf6600efa0 System.String[]
$ dotnet dump analyze coredump.34662 --command 'gcroot -all 0000ffbf65437050' --command exit

This crashes.
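The repro procedure (run dso, take the address of the first object that is not a System.Object, then run gcroot -all on it) can be scripted. Below is a minimal Python sketch, not part of the original test script. It assumes the dso column layout shown above (stack slot or register, a 16-hex-digit object address, then the type name), and the helper names `first_non_object_address` and `gcroot_command` are hypothetical.

```python
import re
from typing import List, Optional

# Assumed layout of one `dso` row: stack slot or register, 16-hex-digit object address, type name.
_DSO_ROW = re.compile(r"^(\S+)\s+([0-9a-fA-F]{16})\s+(\S.*?)\s*$")


def first_non_object_address(dso_output: str) -> Optional[str]:
    """Return the address of the first object in `dso` output that is not System.Object."""
    for line in dso_output.splitlines():
        match = _DSO_ROW.match(line)
        if match is None:
            continue  # skips headers such as "OS Thread Id: ..." and "SP/REG Object Name"
        _slot, address, type_name = match.groups()
        if type_name != "System.Object":
            return address
    return None


def gcroot_command(dump: str, address: str) -> List[str]:
    """Build the non-interactive `gcroot -all` invocation used in this report."""
    return ["dotnet", "dump", "analyze", dump,
            "--command", f"gcroot -all {address}",
            "--command", "exit"]
```

Applied to the dso output above, this selects 0000ffbf65437050 (System.Threading.Tasks.Task+SetOnInvokeMres), the same address used in the crashing command.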

Backtrace from lldb:

Process 37613 stopped                                                  
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)
    frame #0: 0x0000ffbec3e4aa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64 libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS:
->  0xffbec3e4aa9c <+64>: ldr    x22, [x21]
    0xffbec3e4aaa0 <+68>: mov    x21, xzr
    0xffbec3e4aaa4 <+72>: tbnz   w19, #0x0, 0xffbec3e4aaf4 ; <+152>                                                                                          
    0xffbec3e4aaa8 <+76>: b      0xffbec3e4ab20            ; <+196>
(lldb) bt   
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)                                          
  * frame #0: 0x0000ffbec3e4aa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64 
    frame #1: 0x0000ffbec3e61150 libmscordaccore.so`GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*) + 4984
    frame #2: 0x0000ffbec3e37720 libmscordaccore.so`EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*, unsigned int) + 280
    frame #3: 0x0000ffbec3e4b3f4 libmscordaccore.so`DacStackReferenceWalker::Callback(CrawlFrame*, void*) + 352                                              
    frame #4: 0x0000ffbec3e31820 libmscordaccore.so`Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 372
    frame #5: 0x0000ffbec3e31b8c libmscordaccore.so`Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 13$
    frame #6: 0x0000ffbec3e4bd3c libmscordaccore.so`unsigned int DacStackReferenceWalker::WalkStack<unsigned int, _SOS_StackRefData>(unsigned int, _SOS_StackRefData*, void (*)(__DPtr<__DPtr<Object> >, ScanContext*, unsigned int), void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION)) + 212            
    frame #7: 0x0000ffbec3e4a6bc libmscordaccore.so`DacStackReferenceWalker::GetCount(unsigned int*) + 180                                                   
    frame #8: 0x0000ffbec8218900 libsos.so`___lldb_unnamed_symbol895 + 152
    frame #9: 0x0000ffbec81ce6a4 libsos.so`___lldb_unnamed_symbol391 + 88
    frame #10: 0x0000ffbec81cc384 libsos.so`___lldb_unnamed_symbol371 + 236
    frame #11: 0x0000ffbec81cbe5c libsos.so`___lldb_unnamed_symbol368 + 408
    frame #12: 0x0000ffbec81f6424 libsos.so`GCRoot + 432
    frame #13: 0x0000ffff73b12a2c
    frame #14: 0x0000ffff73b1286c
    frame #15: 0x0000ffff73b12728
    frame #16: 0x0000ffff73b123ac
    frame #17: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #18: 0x0000ffffb204d8f0 libcoreclr.so`CallDescrWorkerWithHandler(CallDescrData*, int) + 132                                                        
    frame #19: 0x0000ffffb20f2ff4 libcoreclr.so`RuntimeMethodHandle::InvokeMethod(Object*, void**, SignatureNative*, bool) + 1672                            
    frame #20: 0x0000ffff711291dc
    frame #21: 0x0000ffff7113e2f8
    frame #22: 0x0000ffff73b120ec
    frame #23: 0x0000ffff71b007b8
    frame #24: 0x0000ffff71b00700
    frame #25: 0x0000ffff71aec5ec
    frame #26: 0x0000ffff71aec0bc
    frame #27: 0x0000ffff71aebab8
    frame #28: 0x0000ffff71aeaf54
    frame #29: 0x0000ffff71ade610
    frame #30: 0x0000ffff71ac4b9c
    frame #31: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #32: 0x0000ffffb204d8f0 libcoreclr.so`CallDescrWorkerWithHandler(CallDescrData*, int) + 132
    frame #33: 0x0000ffffb20f2ff4 libcoreclr.so`RuntimeMethodHandle::InvokeMethod(Object*, void**, SignatureNative*, bool) + 1672
    frame #34: 0x0000ffff711291dc
    frame #35: 0x0000ffff7113e4b0
    frame #36: 0x0000ffff70f08d50
    frame #37: 0x0000ffff71ac0f8c
    frame #38: 0x0000ffff71ac0bfc
    frame #39: 0x0000ffff71ac0b48
    frame #40: 0x0000ffff71ac0ae4
    frame #41: 0x0000ffff71ac08c0
    frame #42: 0x0000ffff71ac06e4
    frame #43: 0x0000ffff71ac0630
    frame #44: 0x0000ffff71ac05d0
    frame #45: 0x0000ffff71abc0d4
    frame #46: 0x0000ffff71ac03a4
    frame #47: 0x0000ffff71ac01d4
    frame #48: 0x0000ffff71ac0120
    frame #49: 0x0000ffff71ac00c0
    frame #50: 0x0000ffff71abc0d4
    frame #51: 0x0000ffff71abfd88
    frame #52: 0x0000ffff71abfc14
    frame #53: 0x0000ffff71abfb60
    frame #54: 0x0000ffff71abfb00
    frame #55: 0x0000ffff71abc0d4
    frame #56: 0x0000ffff71abf694
    frame #57: 0x0000ffff71abf484
    frame #58: 0x0000ffff71abf3d0
    frame #59: 0x0000ffff71abf370
    frame #60: 0x0000ffff71abc0d4
    frame #61: 0x0000ffff71abf13c
    frame #62: 0x0000ffff71abeea4
    frame #63: 0x0000ffff71abedf0
    frame #64: 0x0000ffff71abed90
    frame #65: 0x0000ffff71abc0d4
    frame #66: 0x0000ffff71abeb90                             
    frame #67: 0x0000ffff71abe8dc                                              
    frame #68: 0x0000ffff71abe828                                                                                                                            
    frame #69: 0x0000ffff71abe7c8                        
    frame #70: 0x0000ffff71abc0d4                                   
    frame #71: 0x0000ffff71abe5d0   
    frame #72: 0x0000ffff71abe3f4
    frame #73: 0x0000ffff71abe340
    frame #74: 0x0000ffff71abe2e0
    frame #75: 0x0000ffff71abc0d4
    frame #76: 0x0000ffff71abe084
    frame #77: 0x0000ffff71abdcd4
    frame #78: 0x0000ffff71abdc20
    frame #79: 0x0000ffff71abdbc0
    frame #80: 0x0000ffff71abc0d4
    frame #81: 0x0000ffff71abce70
    frame #82: 0x0000ffff71abcb94
    frame #83: 0x0000ffff71abcae0
    frame #84: 0x0000ffff71abca80
    frame #85: 0x0000ffff71abc0d4
    frame #86: 0x0000ffff71abc888
    frame #87: 0x0000ffff71abc0d4
    frame #88: 0x0000ffff71abc3a8
    frame #89: 0x0000ffff71abc284
    frame #90: 0x0000ffff71abc1d0
    frame #91: 0x0000ffff71abc170
    frame #92: 0x0000ffff71abc0d4
    frame #93: 0x0000ffff71abbce4
    frame #94: 0x0000ffff71abbacc
    frame #95: 0x0000ffff71abba18
    frame #96: 0x0000ffff71abb9b8
    frame #97: 0x0000ffff71abb8dc
    frame #98: 0x0000ffff71abb8dc
    frame #99: 0x0000ffff71abb8dc
    frame #100: 0x0000ffff71abb8dc
    frame #101: 0x0000ffff71abb8dc
    frame #102: 0x0000ffff71abb8dc
    frame #103: 0x0000ffff71abb8dc
    frame #104: 0x0000ffff71abb8dc
    frame #105: 0x0000ffff71abb8dc
    frame #106: 0x0000ffff71abb8dc
    frame #107: 0x0000ffff71abb8dc
    frame #108: 0x0000ffff71aba1cc
    frame #109: 0x0000ffff71ab9f5c
    frame #110: 0x0000ffff71ab9ea8
    frame #111: 0x0000ffff71ab9e44
    frame #112: 0x0000ffff71ab9b7c
    frame #113: 0x0000ffff71ab9a3c
    frame #114: 0x0000ffff71ab9988
    frame #115: 0x0000ffff71ab9924
    frame #116: 0x0000ffff71aa9d40
    frame #117: 0x0000ffff71aa9c04
    frame #118: 0x0000ffff71aa9b50
    frame #119: 0x0000ffff71aa9ab8
    frame #120: 0x0000ffff71a90f48
    frame #121: 0x0000ffff71a90d78
    frame #122: 0x0000ffffb21f2d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #123: 0x0000ffffb204dfb8 libcoreclr.so`MethodDescCallSite::CallTargetWorker(unsigned long const*, unsigned long*, int) + 816                       
    frame #124: 0x0000ffffb1f3fc8c libcoreclr.so`RunMain(MethodDesc*, short, int*, PtrArray**) + 756                                                         
    frame #125: 0x0000ffffb1f3ff6c libcoreclr.so`Assembly::ExecuteMainMethod(PtrArray**, int) + 408                                                          
    frame #126: 0x0000ffffb1f6a110 libcoreclr.so`CorHost2::ExecuteAssembly(unsigned int, char16_t const*, int, char16_t const**, unsigned int*) + 636        
    frame #127: 0x0000ffffb1f2cbac libcoreclr.so`coreclr_execute_assembly + 240
    frame #128: 0x0000ffffb24a4d70 libhostpolicy.so`run_app_for_context(hostpolicy_context_t const&, int, char const**) + 1368                               
    frame #129: 0x0000ffffb24a5084 libhostpolicy.so`run_app(int, char const**) + 72                                                                          
    frame #130: 0x0000ffffb24a5a0c libhostpolicy.so`corehost_main + 200
    frame #131: 0x0000ffffb253002c libhostfxr.so`fx_muxer_t::handle_exec_host_command(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, host_startup_info_t const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<known_options, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, known_options_hash, std::equal_to<known_options>, std::allocator<std::pair<known_options const, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > > > const&, int, char const**, int, host_mode_t, bool, char*, int, int*) + 1256                                                             
    frame #132: 0x0000ffffb252f2a4 libhostfxr.so`fx_muxer_t::execute(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, char const**, host_startup_info_t const&, char*, int, int*) + 704
    frame #133: 0x0000ffffb252bab0 libhostfxr.so`hostfxr_main_startupinfo + 172
    frame #134: 0x0000aaaab08444a0 dotnet-dump`exe_start(int, char const**) + 1252                                                                           
    frame #135: 0x0000aaaab0844748 dotnet-dump`main + 144
    frame #136: 0x0000ffffb25b4384 libc.so.6`__libc_start_main + 220
    frame #137: 0x0000aaaab0837c14 dotnet-dump`_start + 52

The full test is here: https://github.com/redhat-developer/dotnet-regular-tests/blob/1b7774d6367751500e440a78909606cab0303a18/debugging-via-dotnet-dump/test.sh

Configuration

  • Is this related to a specific tool? dotnet dump analyze <corefile>, then dso, then gcroot -all <object>, where <object> is the address of the first non-System.Object entry in the dso output. In my case that is 0000ffbf65437050 (System.Threading.Tasks.Task+SetOnInvokeMres, in stack slot 0000FFFFD8B0D708).
  • What OS and version, and what distro if applicable? RHEL 8, arm64, .NET runtime built via source-build
  • What is the architecture (x64, x86, ARM, ARM64)? arm64
  • Do you know whether it is specific to that configuration? It doesn't happen on x64
  • Are you running in any particular type of environment? (e.g. Containers, a cloud scenario, app you are trying to target is a different user) Running this in a freshly provisioned VM. .NET was built from source using source-build.
  • Is it a self-contained published application? No
  • What's the output of dotnet --info?
$ dotnet --info
.NET SDK:
 Version:   7.0.103
 Commit:    6359034b09

Runtime Environment:
 OS Name:     rhel
 OS Version:  8
 OS Platform: Linux
 RID:         rhel.8-arm64
 Base Path:   /usr/lib64/dotnet/sdk/7.0.103/

Host:
  Version:      7.0.3
  Architecture: arm64
  Commit:       0a2bda10e8

.NET SDKs installed:
  7.0.103 [/usr/lib64/dotnet/sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.3 [/usr/lib64/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.3 [/usr/lib64/dotnet/shared/Microsoft.NETCore.App]

Other architectures found:
  None

Environment variables:
  DOTNET_ROOT       [/usr/lib64/dotnet]

global.json file:
  Not found

Learn more:
  https://aka.ms/dotnet/info

Download .NET:
  https://aka.ms/dotnet/download


Regression?

IIRC, this works on x64 without any issues and fails only on arm64. It also worked on arm64 in a previous .NET release, though I am not sure whether that was .NET 6 or .NET 7.

Other information

cc @tmds @aslicerh

@omajid omajid added the bug Something isn't working label Mar 7, 2023
@omajid
Member Author

omajid commented Mar 7, 2023

Similar crash with traverseheap -xml -verify full-heap on arm64:

$ lldb dotnet-dump                                                                              
(lldb) target create "dotnet-dump"                                             
Current executable set to 'dotnet-dump' (aarch64).                                                                                                           
(lldb) r analyze TestDir/coredump.42064 --command 'traverseheap -xml -verify full-heap' --command exit
analyze TestDir/coredump.42064 --command 'traverseheap -xml -verify full-heap' --command exit                                                       
Process 46322 launched: '/home/dotnet/.dotnet/tools/dotnet-dump' (aarch64)
Loading core dump: TestDir/coredump.42064 ...
Writing Xml format to file full-heap
Gathering types...
tracing roots...
Process 46322 stopped
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)
    frame #0: 0x0000ffbf114daa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64 libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS:
->  0xffbf114daa9c <+64>: ldr    x22, [x21]                                    
    0xffbf114daaa0 <+68>: mov    x21, xzr                                                                                                                    
    0xffbf114daaa4 <+72>: tbnz   w19, #0x0, 0xffbf114daaf4 ; <+152>
    0xffbf114daaa8 <+76>: b      0xffbf114dab20            ; <+196> 
(lldb) bt                                                 
* thread #1, name = 'dotnet-dump', stop reason = signal SIGSEGV: invalid address (fault address: 0xffffffffffffffff)
  * frame #0: 0x0000ffbf114daa9c libmscordaccore.so`DacStackReferenceWalker::GCEnumCallbackSOS(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION) + 64
    frame #1: 0x0000ffbf114f1150 libmscordaccore.so`GcInfoDecoder::EnumerateLiveSlots(REGDISPLAY*, bool, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*) + 4984
    frame #2: 0x0000ffbf114c7720 libmscordaccore.so`EECodeManager::EnumGcRefs(REGDISPLAY*, EECodeInfo*, unsigned int, void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION), void*, unsigned int) + 280
    frame #3: 0x0000ffbf114db3f4 libmscordaccore.so`DacStackReferenceWalker::Callback(CrawlFrame*, void*) + 352
    frame #4: 0x0000ffbf114c1820 libmscordaccore.so`Thread::StackWalkFramesEx(REGDISPLAY*, StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 372
    frame #5: 0x0000ffbf114c1b8c libmscordaccore.so`Thread::StackWalkFrames(StackWalkAction (*)(CrawlFrame*, void*), void*, unsigned int, __VPtr<Frame>) + 132
    frame #6: 0x0000ffbf114dbd3c libmscordaccore.so`unsigned int DacStackReferenceWalker::WalkStack<unsigned int, _SOS_StackRefData>(unsigned int, _SOS_StackRefData*, void (*)(__DPtr<__DPtr<Object> >, ScanContext*, unsigned int), void (*)(void*, __DPtr<Object>*, unsigned int, _DAC_SLOT_LOCATION)) + 212
    frame #7: 0x0000ffbf114da6bc libmscordaccore.so`DacStackReferenceWalker::GetCount(unsigned int*) + 180
    frame #8: 0x0000ffbf11738900 libsos.so`___lldb_unnamed_symbol895 + 152
    frame #9: 0x0000ffbf116f0f70 libsos.so`___lldb_unnamed_symbol421 + 232
    frame #10: 0x0000ffbf116f0aa4 libsos.so`___lldb_unnamed_symbol415 + 224
    frame #11: 0x0000ffbf11706fa8 libsos.so`TraverseHeap + 520
    frame #12: 0x0000ffffb8f4537c
    frame #13: 0x0000ffffb8f451bc
    frame #14: 0x0000ffffb8f45078
    frame #15: 0x0000ffffb8f44cfc
    frame #16: 0x0000fffff7722d74 libcoreclr.so`CallDescrWorkerInternal + 132
    frame #17: 0x0000fffff757d8f0 libcoreclr.so`CallDescrWorkerWithHandler(CallDescrData*, int) + 132
    frame #18: 0x0000fffff7622ff4 libcoreclr.so`RuntimeMethodHandle::InvokeMethod(Object*, void**, SignatureNative*, bool) + 1672

@mikem8361 mikem8361 self-assigned this Mar 7, 2023
@mikem8361 mikem8361 added this to the 8.0.0 milestone Mar 7, 2023
@tmds
Member

tmds commented Mar 8, 2023

Do you know if it crashes when you run the same commands from lldb with the sos plugin?
We have a test for that as well, right?

@tmds
Member

tmds commented Mar 15, 2023

@omajid do you know the above? Or are we not testing that?

@omajid
Member Author

omajid commented Mar 15, 2023

Or are we not testing that?

Mostly this. The sos version of the test is much less exhaustive than the dotnet-dump version of the test.

@leculver
Contributor

Related: #485

@leculver
Contributor

@omajid Are you able to find which line of code in DacStackReferenceWalker::GCEnumCallbackSOS is faulting, or a likely range of lines?
https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/daccess/daccess.cpp#L7929-L7985

That would go a long way towards narrowing it down. As an odd coincidence, I'll be working on that function for a perf issue next week. If I knew what was crashing here I might be able to take a look at this issue while working on the perf problem.

Thanks!

@leculver leculver assigned leculver and unassigned mikem8361 Mar 21, 2023
@leculver
Contributor

So the big question here is why isn't the catch block handling this? DacStackReferenceWalker::GetCount is wrapped in an SOS enter/leave:

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/daccess/daccess.cpp#L7859-L7873

Which is defined here:

https://github.com/dotnet/runtime/blob/main/src/coreclr/debug/daccess/dacimpl.h#L3984-L4004

We should be hitting that EX_END_CATCH(SwallowAllExceptions), and not bringing down the debugger.

Obviously it's not good that something is causing the underlying crash (I'm working on a fix in .NET 8), but regardless, this should have manifested as a failed function call rather than bringing down the debugger.

@leculver
Contributor

Ok, after digging in deeper here, it looks like this is unrelated to the underlying issues in DacStackReferenceWalker/DacHandleWalker.

@tommcdon: Who is the right person to assign this issue to? I believe this is an arm64 DAC-ization issue. Since it's a DAC-ization/stackwalking issue on arm64 (rather than an issue in the DacStackReferenceWalker), I'm not going to tackle this one.

Also, I got to the bottom of this:

So the big question here is why isn't the catch block handling this?

Unlike on Windows, we don't attempt to swallow SIGSEGV here. So the debugger crashing on the SIGSEGV is expected, but the underlying SIGSEGV itself isn't expected, and is likely a DAC-ization problem.
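Because the fault is not swallowed, it surfaces as the dotnet-dump process being killed by a signal rather than as a failed SOS command. A test harness (such as the scripts linked in this thread) can tell these apart from the exit status. The following is a minimal Python sketch, assuming dotnet-dump is installed as a global tool on PATH; `run_analyze` and `crash_signal` are hypothetical helper names. Python's subprocess module reports "killed by signal N" as -N on POSIX, while shells report it as 128 + N, so the second branch is only a heuristic.

```python
import signal
import subprocess
from typing import List, Optional


def run_analyze(dump: str, commands: List[str]) -> subprocess.CompletedProcess:
    """Run `dotnet-dump analyze` non-interactively with the given SOS commands."""
    args = ["dotnet-dump", "analyze", dump]
    for command in commands:
        args += ["--command", command]
    # End with "exit" so the analyze REPL does not wait for more input.
    args += ["--command", "exit"]
    return subprocess.run(args, capture_output=True, text=True)


def crash_signal(returncode: int) -> Optional[str]:
    """Return the signal name if the exit status means "killed by a signal", else None."""
    if returncode < 0:
        # subprocess reports termination by signal N as -N (POSIX only).
        signum = -returncode
    elif returncode > 128:
        # Shell convention: termination by signal N is reported as 128 + N (heuristic).
        signum = returncode - 128
    else:
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        return None  # not a known signal number
```

For example, `crash_signal(run_analyze("coredump.34662", ["gcroot -all 0000ffbf65437050"]).returncode)` would be expected to yield "SIGSEGV" on an affected build and None on a fixed one.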

@leculver leculver removed their assignment Mar 23, 2023
@mikem8361 mikem8361 self-assigned this Mar 23, 2023
@leculver
Contributor

Let's check this again when .NET 8 Preview 4 ships. It's possible that some of our previous fixes in this area have resolved the issue.

@tmds
Member

tmds commented Jun 19, 2023

There are also crashes when running gcroot with SOS in lldb on x64, using the latest dotnet-dump and the latest .NET 8 bits, on RHEL 8, RHEL 9, and Fedora 38.

@hoyosjs
Member

hoyosjs commented Jul 21, 2023

@tmds, what issues are you seeing? Is there a repro that can be shared? Is it the same shell script that was shared before?

@tmds
Member

tmds commented Jul 29, 2023

Running this script should reproduce the gcroot issue: https://github.com/redhat-developer/dotnet-regular-tests/blob/main/debugging-sos-lldb-via-core/test.sh.

@leecow
Member

leecow commented Aug 9, 2023

@hoyosjs - any additional thoughts based on Tom's repro script?

@hoyosjs
Member

hoyosjs commented Aug 11, 2023

I haven't looked at this particular one yet, but I have 2 bugs in the .NET 8 queue that might explain it.

@mikem8361
Member

I'm currently investigating.

mikem8361 added a commit to mikem8361/runtime that referenced this issue Aug 15, 2023
Faulted in DAC because the HelperMethodFrame's REGDISPLAY CurrentContextPointers were not initialized correctly.

Fixes issue dotnet/diagnostics#3726
mikem8361 added a commit to dotnet/runtime that referenced this issue Aug 16, 2023
Faulted in DAC because the HelperMethodFrame's REGDISPLAY CurrentContextPointers were not initialized correctly.

Fixes issue dotnet/diagnostics#3726
github-actions bot pushed a commit to dotnet/runtime that referenced this issue Aug 16, 2023
Faulted in DAC because the HelperMethodFrame's REGDISPLAY CurrentContextPointers were not initialized correctly.

Fixes issue dotnet/diagnostics#3726
@mikem8361
Member

Fixed in .NET 8 RC2 (September's release).

@tmds
Member

tmds commented Aug 16, 2023

@mikem8361 the PR mentions arm64. There is also a crash on x64. Do you expect it to be fixed as well?

@tmds
Member

tmds commented Aug 16, 2023

Ok, saw your new comment on the PR. x64 should be fixed by another change.

Thanks!

carlossanlop pushed a commit to dotnet/runtime that referenced this issue Aug 16, 2023
Faulted in DAC because the HelperMethodFrame's REGDISPLAY CurrentContextPointers were not initialized correctly.

Fixes issue dotnet/diagnostics#3726

Co-authored-by: Mike McLaughlin <mikem@microsoft.com>
@ghost ghost locked as resolved and limited conversation to collaborators Sep 15, 2023

6 participants