
Test failure tracing\\eventpipe\\pauseonstart\\pauseonstart\\pauseonstart.cmd #54469

Closed · VincentBu opened this issue Jun 21, 2021 · 20 comments

@VincentBu (Contributor)

Run: runtime-coreclr outerloop 20210618.10

Failed test:

CoreCLR windows x86 Checked no_tiered_compilation @ Windows.10.Amd64.Open

- tracing\\eventpipe\\pauseonstart\\pauseonstart\\pauseonstart.cmd

Error message:

Return code:      1
Raw output file:      C:\h\w\B1050997\w\A68508EA\uploads\Reports\tracing.eventpipe\pauseonstart\pauseonstart\pauseonstart.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\B1050997\p\corerun.exe"  pauseonstart.dll
0.0s: ::== Running test: TEST_RuntimeResumesExecutionWithCommand
0.1s: Server name is 'DOTNET_TRACE_TESTS_40e5ocet.e4a'
0.9s: running sub-process: C:\h\w\B1050997\p\corerun.exe C:\h\w\B1050997\w\A68508EA\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
2.7s: subprocess started: True
2.7s: subprocess PID: 7708
4.6s: { Magic=System.Byte[]; ClrInstanceId=b355e344-977d-4ad4-9912-0ddff2c8f280; ProcessId=7708; Unused=0; }
4.6s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
4.7s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
4.7s: Sending 'exit' to subprocess stdin
5.8s: ================= Subprocess Exited =================
5.8s: SubProcess exited - Exit code: 0
5.8s: ----------------------------------------
5.8s: Subprocess stdout:
4.8s: Subprocess started!  Waiting for input...
4.8s: Received 'exit'.  Exiting...
5.8s: Subprocess stderr:
5.8s: ----------------------------------------
5.9s: Test passed: True
5.9s:
5.9s: ::== Running test: TEST_TracesHaveRelevantEvents
6.0s: Server name is 'DOTNET_TRACE_TESTS_g4xdact4.0zp'
6.0s: running sub-process: C:\h\w\B1050997\p\corerun.exe C:\h\w\B1050997\w\A68508EA\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
6.0s: subprocess started: True
6.0s: subprocess PID: 7436
6.0s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
6.2s: { Magic=System.Byte[]; ClrInstanceId=2eaf57d3-57a4-4111-b9dd-b6fc312daf82; ProcessId=7436; Unused=0; }
6.2s: Starting EventPipeSession over standard connection
6.4s: Started EventPipeSession over standard connection with session id: 0x3412608
6.5s: Send ResumeRuntime Diagnostics IPC Command
6.5s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
6.5s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
8.5s: Stopping EventPipeSession over standard connection
9.4s: Stopped EventPipeSession over standard connection
9.4s: Sending 'exit' to subprocess stdin
9.5s: ================= Subprocess Exited =================
9.5s: SubProcess exited - Exit code: 0
9.5s: ----------------------------------------
9.5s: Subprocess stdout:
1.5s: Subprocess started!  Waiting for input...
3.5s: Received 'exit'.  Exiting...
9.5s: Subprocess stderr:
9.5s: ----------------------------------------
10.5s: isStartupEventPresent: True
10.6s: Test passed: True
10.6s:
10.6s: ::== Running test: TEST_MultipleSessionsCanBeStartedWhilepaused
10.6s: Server name is 'DOTNET_TRACE_TESTS_xwv5m345.qzm'
10.6s: running sub-process: C:\h\w\B1050997\p\corerun.exe C:\h\w\B1050997\w\A68508EA\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
10.6s: subprocess started: True
10.6s: subprocess PID: 3760
10.6s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
10.8s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
11.1s: { Magic=System.Byte[]; ClrInstanceId=16260ea4-6d84-403f-b37d-01f47bf81d6c; ProcessId=3760; Unused=0; }
11.1s: Starting EventPipeSession over standard connection
11.1s: Started EventPipeSession over standard connection with session id: 0x2c923f8
11.1s: Starting EventPipeSession over standard connection
11.1s: Started EventPipeSession over standard connection with session id: 0x2c922f0
11.1s: Starting EventPipeSession over standard connection
11.1s: Started EventPipeSession over standard connection with session id: 0x2d62aa0
11.1s: Send


Stack trace
   at tracing_eventpipe._pauseonstart_pauseonstart_pauseonstart_._pauseonstart_pauseonstart_pauseonstart_cmd() in tracing.eventpipe.XUnitWrapper.dll:token 0x6000016+0x284
dotnet-issue-labeler bot added the area-Tracing-coreclr and untriaged labels on Jun 21, 2021
tommcdon removed the untriaged label on Jun 21, 2021
tommcdon added this to the 6.0.0 milestone on Jun 21, 2021
@tommcdon (Member)

@josalem

@josalem (Contributor) commented Jun 21, 2021

Here is the actual error message:

 18.1s: Subprocess stdout: 
  1.4s: Subprocess started!  Waiting for input...
 18.1s: Subprocess stderr: 
  6.2s: Assert failure(PID 3760 [0x00000eb0], Thread: 4660 [0x1234]): Consistency check failed: AV in clr at this callstack:
  6.2s: ------
  6.2s: CORECLR! AllocMemTracker::~AllocMemTracker + 0xFE (0x70b35ed3)
  6.2s: CORECLR! IBCLoggerAwareAllocMemTracker::~IBCLoggerAwareAllocMemTracker + 0xDC (0x708ae031)
  6.2s: CORECLR! ClassLoader::DoIncrementalLoad + 0x437 (0x70676dba)
  6.2s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey_Body + 0x894 (0x7067f870)
  6.2s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey + 0x19A (0x7067eab4)
  6.2s: CORECLR! ClassLoader::LoadConstructedTypeThrowing + 0x63D (0x7067b684)
  6.2s: CORECLR! ClassLoader::LoadCanonicalGenericInstantiation + 0x347 (0x706f2ded)
  6.2s: CORECLR! ClassLoader::CreateTypeHandleForNonCanonicalGenericInstantiation + 0x978 (0x706f0f67)
  6.2s: CORECLR! ClassLoader::CreateTypeHandleForTypeKey + 0x30E (0x70676022)
  6.2s: CORECLR! ClassLoader::DoIncrementalLoad + 0x376 (0x70676cf9)
  6.2s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey_Body + 0x894 (0x7067f870)
  6.2s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey + 0x19A (0x7067eab4)
  6.2s: CORECLR! ClassLoader::LoadConstructedTypeThrowing + 0x63D (0x7067b684)
  6.2s: CORECLR! ClassLoader::LoadGenericInstantiationThrowing + 0x4DF (0x7067c02d)
  6.2s: CORECLR! SigPointer::GetTypeHandleThrowing + 0xE1F (0x707ad27a)
  6.2s: CORECLR! ZapSig::DecodeType + 0xC9 (0x7080a600)
  6.2s: CORECLR! LoadDynamicInfoEntry + 0x67B (0x7070d03b)
  6.2s: CORECLR! Module::FixupNativeEntry + 0x1A0 (0x706466e0)
  6.2s: CORECLR! Module::FixupDelayListAux<Module *,int (__thiscall Module::*)(CORCOMPILE_IMPORT_SECTION *,unsigned long,unsigned long *)> + 0x34A (0x7080b7e9)
  6.2s: -----
  6.2s: .AV on tid=0x1234 (4660), cxr=04A9BF58, exr=04A9BF08
  6.2s: FAILED: false
  6.2s: CORECLR! CHECK::Trigger + 0x310 (0x70b3387c)
  6.2s: CORECLR! CLRVectoredExceptionHandlerPhase3 + 0x252 (0x706c4a2f)
  6.2s: CORECLR! CLRVectoredExceptionHandlerPhase2 + 0x4E (0x706c46aa)
  6.2s: CORECLR! CLRVectoredExceptionHandler + 0x13C (0x706c4650)
  6.2s: CORECLR! CLRVectoredExceptionHandlerShim + 0x172 (0x706c4c72)
  6.2s: NTDLL! LdrSetAppCompatDllRedirectionCallback + 0x1C21A (0x776af94a)
  6.2s: NTDLL! RtlUnwind + 0x1BA (0x7767011a)
  6.2s: NTDLL! KiUserExceptionDispatcher + 0xF (0x77680dbf)
  6.2s: CORECLR! IBCLoggerAwareAllocMemTracker::~IBCLoggerAwareAllocMemTracker + 0xDC (0x708ae031)
  6.2s: CORECLR! ClassLoader::DoIncrementalLoad + 0x437 (0x70676dba)
  6.2s:     File: D:\workspace\_work\1\s\src\coreclr\vm\excep.cpp Line: 7502
  6.2s:     Image: C:\h\w\B1050997\p\corerun.exe

This looks to be some kind of AV in the class loader?

@tommcdon (Member)

This looks to be some kind of AV in the class loader?

Adding @janvorli

@janvorli (Member)

This looks like some kind of memory corruption. The crash happens in the following loop:

AllocMemTrackerBlock *pBlock = m_pFirstBlock;
while (pBlock != &m_FirstBlock)
{
    AllocMemTrackerBlock *pNext = pBlock->m_pNext;
    delete pBlock;
    pBlock = pNext;
}

The pBlock at the time of the crash is NULL, but it is never supposed to be NULL: the last element in the list should have its m_pNext pointing to the embedded m_FirstBlock. Whenever a new block is added, its m_pNext is assigned the current m_pFirstBlock and m_pFirstBlock is then set to point to the newly added block, so there is no legitimate way for that pointer to become NULL.

Unfortunately it is not possible to see what the linked list looked like before the crash, as the loop we are crashing in destroys the blocks one by one, so the previous blocks leading to the one with the NULL m_pNext are already gone.
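To make the invariant concrete, here is a minimal sketch of how such a sentinel-terminated block list behaves, assuming a simplified AllocMemTrackerBlock with only the link field. Everything beyond the quoted destructor loop and the push behavior described above is an illustration, not the actual CoreCLR implementation.

// Hypothetical, simplified model of the block list described above.
struct AllocMemTrackerBlock
{
    AllocMemTrackerBlock *m_pNext;
    // the real block also carries the tracked allocation records
};

struct AllocMemTracker
{
    AllocMemTrackerBlock  m_FirstBlock;   // embedded sentinel, never freed
    AllocMemTrackerBlock *m_pFirstBlock;  // head of the chain

    AllocMemTracker()
    {
        // Empty tracker: the head points at the embedded block, so the
        // destructor loop terminates immediately.
        m_pFirstBlock = &m_FirstBlock;
    }

    void AddBlock(AllocMemTrackerBlock *pNew)
    {
        // New blocks are pushed at the front: the new block links to the
        // current head and then becomes the head. Every m_pNext reachable
        // from m_pFirstBlock therefore ends at &m_FirstBlock, never NULL.
        pNew->m_pNext = m_pFirstBlock;
        m_pFirstBlock = pNew;
    }

    ~AllocMemTracker()
    {
        // The loop from the crash: it relies entirely on the invariant
        // above. If some block's m_pNext has been overwritten with NULL,
        // the termination test never matches and we dereference NULL.
        AllocMemTrackerBlock *pBlock = m_pFirstBlock;
        while (pBlock != &m_FirstBlock)
        {
            AllocMemTrackerBlock *pNext = pBlock->m_pNext;
            delete pBlock;
            pBlock = pNext;
        }
    }
};

Under that model, a NULL pBlock in the destructor can only mean a block's memory was overwritten after it was linked in, which is consistent with the memory-corruption theory.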

@josalem (Contributor) commented Jun 21, 2021

Interesting. Thanks, Jan.

Looks like the Helix run has dumps for this failure, so I'll try to look when I can get access to a Windows machine.

As far as I can tell, this has only happened in this run (according to the AzDO history).

@VincentBu (Contributor, Author)

Failed again in runtime-coreclr outerloop 20210628.5

Failed test:

CoreCLR windows x86 Checked no_tiered_compilation @ Windows.10.Amd64.Open

- tracing\\eventpipe\\pauseonstart\\pauseonstart\\pauseonstart.cmd

Error message:

Return code:      1
Raw output file:      C:\h\w\A1A108BA\w\B19109D8\uploads\Reports\tracing.eventpipe\pauseonstart\pauseonstart\pauseonstart.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\A1A108BA\p\corerun.exe"  pauseonstart.dll
0.0s: ::== Running test: TEST_RuntimeResumesExecutionWithCommand
0.1s: Server name is 'DOTNET_TRACE_TESTS_sgyju0hw.jrx'
0.9s: running sub-process: C:\h\w\A1A108BA\p\corerun.exe C:\h\w\A1A108BA\w\B19109D8\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
1.7s: subprocess started: True
1.7s: subprocess PID: 3464
3.8s: { Magic=System.Byte[]; ClrInstanceId=5fcc55d8-e2c4-41a5-9a8d-05bcc370f502; ProcessId=3464; Unused=0; }
3.8s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
3.8s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
3.8s: Sending 'exit' to subprocess stdin
4.8s: ================= Subprocess Exited =================
4.8s: SubProcess exited - Exit code: 0
4.8s: ----------------------------------------
4.8s: Subprocess stdout:
3.8s: Subprocess started!  Waiting for input...
3.8s: Received 'exit'.  Exiting...
4.8s: Subprocess stderr:
4.8s: ----------------------------------------
4.9s: Test passed: True
4.9s:
4.9s: ::== Running test: TEST_TracesHaveRelevantEvents
5.0s: Server name is 'DOTNET_TRACE_TESTS_wfcjbjtb.yk1'
5.0s: running sub-process: C:\h\w\A1A108BA\p\corerun.exe C:\h\w\A1A108BA\w\B19109D8\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
5.0s: subprocess started: True
5.0s: subprocess PID: 6184
5.0s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
5.2s: { Magic=System.Byte[]; ClrInstanceId=2715c04d-d5a3-432b-9d96-65b5d521f8ef; ProcessId=6184; Unused=0; }
5.2s: Starting EventPipeSession over standard connection
5.3s: Started EventPipeSession over standard connection with session id: 0x3102390
5.4s: Send ResumeRuntime Diagnostics IPC Command
5.4s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
5.4s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
7.4s: Stopping EventPipeSession over standard connection
8.4s: Stopped EventPipeSession over standard connection
8.4s: Sending 'exit' to subprocess stdin
8.4s: ================= Subprocess Exited =================
8.4s: SubProcess exited - Exit code: 0
8.4s: ----------------------------------------
8.4s: Subprocess stdout:
1.4s: Subprocess started!  Waiting for input...
3.4s: Received 'exit'.  Exiting...
8.4s: Subprocess stderr:
8.4s: ----------------------------------------
9.4s: isStartupEventPresent: True
9.4s: Test passed: True
9.4s:
9.4s: ::== Running test: TEST_MultipleSessionsCanBeStartedWhilepaused
9.4s: Server name is 'DOTNET_TRACE_TESTS_jbahk3ll.pki'
9.4s: running sub-process: C:\h\w\A1A108BA\p\corerun.exe C:\h\w\A1A108BA\w\B19109D8\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
9.4s: subprocess started: True
9.4s: subprocess PID: 7720
9.5s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
9.7s: { Magic=System.Byte[]; ClrInstanceId=dc42f3f8-4df8-4b46-97ba-609e9227518a; ProcessId=7720; Unused=0; }
9.7s: Starting EventPipeSession over standard connection
9.7s: Started EventPipeSession over standard connection with session id: 0x3492128
9.7s: Starting EventPipeSession over standard connection
9.7s: Started EventPipeSession over standard connection with session id: 0x34922e0
9.7s: Starting EventPipeSession over standard connection
9.7s: Started EventPipeSession over standard connection with session id: 0x3563c88
9.7s: Send ResumeRuntime Diagnostics IPC Command
9.7s: Sent: { Header={ Magic=System.Byte[]; S


Stack trace
   at tracing_eventpipe._pauseonstart_pauseonstart_pauseonstart_._pauseonstart_pauseonstart_pauseonstart_cmd() in tracing.eventpipe.XUnitWrapper.dll:token 0x6000016+0x284

@noahfalk (Member)

I have this test looping on my machine and maybe we'll get a repro. If not, my plan is to add some extra instrumentation so we can catch the failure before it has deleted the linked list, and then wait for CI to hit it again.
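Purely as an illustration of "catching the failure before it has deleted the linked list" (this is not the contents of the instrumentation PR referenced later in the thread, just a hypothetical sketch built on the simplified types from the earlier comment): a validation pass could walk the chain read-only and fail fast on a NULL link, so the dump still contains the not-yet-deleted blocks.

#include <cassert>

// Hypothetical pre-destruction check, not the actual CoreCLR change.
// Walks the chain without freeing anything and asserts on a NULL link,
// so the evidence is still in memory when the process stops.
void ValidateBlockList(const AllocMemTracker &tracker)
{
    const AllocMemTrackerBlock *pBlock = tracker.m_pFirstBlock;
    while (pBlock != &tracker.m_FirstBlock)
    {
        // A NULL link means some block was overwritten; stop here instead
        // of letting the destructor dereference it after the earlier
        // blocks have already been deleted.
        assert(pBlock->m_pNext != nullptr);
        pBlock = pBlock->m_pNext;
    }
}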

@hoyosjs (Member) commented Jul 30, 2021

@noahfalk I looked over the past 30 days and found no such failing test

@noahfalk (Member)

Thanks for taking a look @hoyosjs! I've still got the test looping with no repro yet. I'm going to leave it running overnight, and assuming it still hasn't repro'd by tomorrow I'll PR in a bit of instrumentation and resolve the bug for now. We may never see it again, but if it does show up in the future we'll have a few more clues to work with.

noahfalk added a commit to noahfalk/runtime that referenced this issue on Jul 30, 2021:
We are seeing some likely memory corruption in dotnet#54469 and these changes hopefully will help better diagnose it
@noahfalk (Member)

My overnight run of this test in a loop did not repro the problem.
I've added some instrumentation in #56648. No idea if the issue will ever repro again in CI but if it does we'll learn a little more. Once that PR is merged I plan to close this issue and we can re-open if we ever get another repro.

@VincentBu (Contributor, Author)

Failed again in runtime-coreclr jitstress-isas-x86 20210731.1

Failed test:

CoreCLR windows x86 Checked jitstress_isas_nohwintrinsic_nosimd @ Windows.10.Amd64.Open

- tracing\\eventpipe\\pauseonstart\\pauseonstart\\pauseonstart.cmd

Error message:

Return code:      1
Raw output file:      C:\h\w\ACF2094E\w\ABF20946\uploads\Reports\tracing.eventpipe\pauseonstart\pauseonstart\pauseonstart.output.txt
Raw output:
BEGIN EXECUTION
"C:\h\w\ACF2094E\p\corerun.exe"  pauseonstart.dll
0.0s: ::== Running test: TEST_RuntimeResumesExecutionWithCommand
0.1s: Server name is 'DOTNET_TRACE_TESTS_alv4kxir.dui'
1.3s: running sub-process: C:\h\w\ACF2094E\p\corerun.exe C:\h\w\ACF2094E\w\ABF20946\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
1.9s: subprocess started: True
1.9s: subprocess PID: 6644
4.0s: { Magic=System.Byte[]; ClrInstanceId=4fc16898-e08c-4a7d-9bfc-dbb45114af48; ProcessId=6644; Unused=0; }
4.0s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
4.0s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
4.1s: Sending 'exit' to subprocess stdin
5.7s: ================= Subprocess Exited =================
5.7s: SubProcess exited - Exit code: 0
5.7s: ----------------------------------------
5.7s: Subprocess stdout:
4.3s: Subprocess started!  Waiting for input...
4.3s: Received 'exit'.  Exiting...
5.7s: Subprocess stderr:
5.7s: ----------------------------------------
5.8s: Test passed: True
5.8s:
5.8s: ::== Running test: TEST_TracesHaveRelevantEvents
5.9s: Server name is 'DOTNET_TRACE_TESTS_5iton4la.jud'
5.9s: running sub-process: C:\h\w\ACF2094E\p\corerun.exe C:\h\w\ACF2094E\w\ABF20946\e\tracing\eventpipe\pauseonstart\pauseonstart\pauseonstart.dll 0
5.9s: subprocess started: True
5.9s: subprocess PID: 6444
5.9s: Standard Diagnostics Server connection not created yet -> try again in 100 ms
6.1s: { Magic=System.Byte[]; ClrInstanceId=a3e805ea-d8a1-4b8c-830d-5e0516072196; ProcessId=6444; Unused=0; }
6.1s: Starting EventPipeSession over standard connection
6.3s: Started EventPipeSession over standard connection with session id: 0x31a1a08
6.4s: Send ResumeRuntime Diagnostics IPC Command
6.4s: Sent: { Header={ Magic=System.Byte[]; Size=20; CommandSet=4; CommandId=1; Reserved=0 }; }
6.4s: received: { Header={ Magic=System.Byte[]; Size=24; CommandSet=255; CommandId=0; Reserved=0 }; Payload=[ 0x00 0x00 0x00 0x00  ]}
8.4s: Stopping EventPipeSession over standard connection
14.7s: ================= Subprocess Exited =================
14.7s: - Exit code: -1073740286
14.7s: Subprocess stdout:
2.1s: Subprocess started!  Waiting for input...
14.7s: Subprocess stderr:
5.8s: Assert failure(PID 6444 [0x0000192c], Thread: 8884 [0x22b4]): Consistency check failed: AV in clr at this callstack:
5.8s: ------
5.8s: CORECLR! AllocMemTracker::~AllocMemTracker + 0xFE (0x70d7ca53)
5.8s: CORECLR! IBCLoggerAwareAllocMemTracker::~IBCLoggerAwareAllocMemTracker + 0xDC (0x70af08aa)
5.8s: CORECLR! ClassLoader::DoIncrementalLoad + 0x437 (0x708b6fca)
5.8s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey_Body + 0x894 (0x708bfd1a)
5.8s: CORECLR! ClassLoader::LoadTypeHandleForTypeKey + 0x19A (0x708bef5e)
5.8s: CORECLR! ClassLoader::LoadTypeDefThrowing + 0x8A5 (0x708bebde)
5.8s: CORECLR! MemberLoader::GetMethodDescFromMethodDef + 0x199 (0x709892b6)
5.8s: CORECLR! ExternalMethodFixupWorker + 0x591 (0x709dd028)
5.8s: CORECLR! DelayLoad_MethodCall + 0x23 (0x70d65018)
5.8s: SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x7010e5c4)
5.8s: SYSTEM.PRIVATE.CORELIB! <no symbol> + 0x0 (0x7010dc4c)
5.8s: <no module>! <no symbol> + 0x0 (0x08ec659b)
5.8s: CORECLR! ep_rt_provider_invoke_callback + 0x97 (0x70e9c09e)
5.8s: CORECLR! provider_invoke_callback + 0x10C (0x70e9ffb1)
5.8s: CORECLR! disable_helper + 0xB5 (0x70e95dcc)
5.8s: CORECLR! ep_disable + 0x77 (0x70e9875a)
5.8s: CORECLR! eventpipe_protocol_helper_stop_tracing + 0x3B (0x70ea5739)
5.8s: CORECLR! ds_eventpipe_protocol_helper_handle_ipc_message + 0x40 (0x70ea1d54)
5.8s:


Stack trace
   at tracing_eventpipe._pauseonstart_pauseonstart_pauseonstart_._pauseonstart_pauseonstart_pauseonstart_cmd() in tracing.eventpipe.XUnitWrapper.dll:token 0x6000016+0x284

@noahfalk (Member) commented Aug 2, 2021

Link to dump
Link to binaries
The new failure looks similar to the previous one: pBlock is NULL unexpectedly. This time we got a full callstack from windbg:

 # ChildEBP RetAddr  
00 0519e510 70d987af coreclr!TerminateOnAssert+0x17
01 0519e510 70d97de4 coreclr!_DbgBreakCheck+0x411
02 0519e580 70d9810c coreclr!_DbgBreakCheckNoThrow+0x51
03 0519e61c 70d7a3fd coreclr!DbgAssertDialog+0x20b
04 0519e6c0 7090519d coreclr!CHECK::Trigger+0x310
05 0519e88c 70904e18 coreclr!CLRVectoredExceptionHandlerPhase3+0x252
06 0519e904 70904dbe coreclr!CLRVectoredExceptionHandlerPhase2+0x4e
07 0519e9b8 709053e2 coreclr!CLRVectoredExceptionHandler+0x13c
08 0519ea1c 77abf94a coreclr!CLRVectoredExceptionHandlerShim+0x172
09 0519ea64 77a8011a ntdll!RtlpCallVectoredHandlers+0x39b99
0a (Inline) -------- ntdll!RtlCallVectoredExceptionHandlers+0x8
0b 0519eaf8 77a90dbf ntdll!RtlDispatchException+0x7c
0c 0519f034 70d7ca53 ntdll!KiUserExceptionDispatcher+0xf
0d 0519f034 70af08aa coreclr!AllocMemTracker::~AllocMemTracker+0xfe
0e 0519f0a8 708b6fca coreclr!IBCLoggerAwareAllocMemTracker::~IBCLoggerAwareAllocMemTracker+0xdc
0f 0519f1ac 708bfd1a coreclr!ClassLoader::DoIncrementalLoad+0x437
10 0519f318 708bef5e coreclr!ClassLoader::LoadTypeHandleForTypeKey_Body+0x894
11 0519f404 708bebde coreclr!ClassLoader::LoadTypeHandleForTypeKey+0x19a
12 0519f5d0 709892b6 coreclr!ClassLoader::LoadTypeDefThrowing+0x8a5
13 0519f68c 709dd028 coreclr!MemberLoader::GetMethodDescFromMethodDef+0x199
14 0519f84c 70d65018 coreclr!ExternalMethodFixupWorker+0x591
15 0519f878 7010e5c4 coreclr!DelayLoad_MethodCall+0x23
16 0519f8a8 7010dc4c System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.GetDataFromController(Int32, EVENT_FILTER_DESCRIPTOR*, System.Diagnostics.Tracing.ControllerCommand ByRef, Byte[] ByRef, Int32 ByRef)+0x84
17 0519f96c 08ec659b System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.EtwEnableCallBack(System.Guid ByRef, Int32, Byte, Int64, Int64, EVENT_FILTER_DESCRIPTOR*, Void*)+0x19c
18 0519f9b8 70e9c09e System_Private_CoreLib!ILStubClass.IL_STUB_ReversePInvoke(IntPtr, Int32, Byte, Int64, Int64, Int32, Int32)+0x53
19 0519fa48 70e9ffb1 coreclr!ep_rt_provider_invoke_callback+0x97
1a 0519fab4 70e95dcc coreclr!provider_invoke_callback+0x10c
1b 0519fb4c 70e9875a coreclr!disable_helper+0xb5
1c 0519fb60 70ea5739 coreclr!ep_disable+0x77
1d 0519fb7c 70ea1d54 coreclr!eventpipe_protocol_helper_stop_tracing+0x3b
1e 0519fb8c 70ea6cdd coreclr!ds_eventpipe_protocol_helper_handle_ipc_message+0x40
1f 0519fbc8 745762c4 coreclr!server_thread+0x17d
20 0519fbdc 77a81b69 kernel32!BaseThreadInitThunk+0x24
21 0519fc24 77a81b34 ntdll!__RtlUserThreadStart+0x2f
22 0519fc34 00000000 ntdll!_RtlUserThreadStart+0x1b

We are trying to do ExternalMethodFixup on System.Number.UInt32ToDecStr(UInt32), MethodDef=0x6001513

0:002> dx Debugger.Sessions[0].Processes[6444].Threads[8884].Stack.Frames[19].SwitchTo();dv /t /v
Debugger.Sessions[0].Processes[6444].Threads[8884].Stack.Frames[19].SwitchTo()
@edi              class Module * pModule = 0x02df4000
0519f698          unsigned int MethodDef = 0x6001513
...
0:002> !DumpModule 0x02df4000
Name: C:\h\w\ACF2094E\p\System.Private.CoreLib.dll
...
0:002> !Token2EE System.Private.CoreLib.dll 0x6001513
Module:      02df4000
Assembly:    System.Private.CoreLib.dll
Token:       06001513
MethodDesc:  08fa7904
Name:        System.Number.UInt32ToDecStr(UInt32)
Not JITTED yet. Use !bpmd -md 08FA7904 to break on run.

Loading System.Number, TypeDef 0x2000175

0:002> dx Debugger.Sessions[0].Processes[6444].Threads[8884].Stack.Frames[18].SwitchTo();dv /t /v
Debugger.Sessions[0].Processes[6444].Threads[8884].Stack.Frames[18].SwitchTo()
@edi              class Module * pModule = 0x02df4000
0519f5e0          unsigned int typeDef = 0x2000175
...
0:002> !Token2EE System.Private.CoreLib.dll 0x2000175
Module:      02df4000
Assembly:    System.Private.CoreLib.dll
Token:       02000175
MethodTable: 08fa8714
EEClass:     08fb0710
Name:        System.Number

The instrumentation is still in PR, so unfortunately, other than another data point about which type is being loaded, we don't have any more information on the corrupted memory.

noahfalk added a commit that referenced this issue on Aug 2, 2021:
We are seeing some likely memory corruption in #54469 and these changes hopefully will help better diagnose it
@noahfalk (Member) commented Aug 2, 2021

I went ahead and merged #56648 so that we'll get more clues the next time this fails. Given that we got a repro recently, maybe we'll get another, so I'm leaving the issue open for a few days. If we don't see it soon, I'll close it and wait for another hit, because there is little we can do until then.

@janvorli (Member) commented Aug 2, 2021

I've looked at the core dump. It looks like corruption of the memory that m_pFirstBlock points to. It points to something that seems to be a valid heap-allocated block, but its contents don't make sense for an AllocMemTrackerBlock. Looking at the memory around that structure, there is an interesting pattern: it is filled with 32-bit numbers incrementing by one:

032694e0 000002d4 000002d5 000002d6 000002d7 000002d8 000002d9 000002da 000002db 000002dc
03269504 000002dd 000002de 000002df 000002e0 000002e1 000002e2 08f8e574 709f8706 88000c40

The m_pFirstBlock points to 0x32694e8

I wonder if those values might ring a bell..

@josalem (Contributor) commented Aug 2, 2021

Based on the beginning of that call stack:

16 0519f8a8 7010dc4c System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.GetDataFromController(Int32, EVENT_FILTER_DESCRIPTOR*, System.Diagnostics.Tracing.ControllerCommand ByRef, Byte[] ByRef, Int32 ByRef)+0x84
17 0519f96c 08ec659b System_Private_CoreLib!System.Diagnostics.Tracing.EventProvider.EtwEnableCallBack(System.Guid ByRef, Int32, Byte, Int64, Int64, EVENT_FILTER_DESCRIPTOR*, Void*)+0x19c
18 0519f9b8 70e9c09e System_Private_CoreLib!ILStubClass.IL_STUB_ReversePInvoke(IntPtr, Int32, Byte, Int64, Int64, Int32, Int32)+0x53
19 0519fa48 70e9ffb1 coreclr!ep_rt_provider_invoke_callback+0x97
1a 0519fab4 70e95dcc coreclr!provider_invoke_callback+0x10c
1b 0519fb4c 70e9875a coreclr!disable_helper+0xb5
1c 0519fb60 70ea5739 coreclr!ep_disable+0x77
1d 0519fb7c 70ea1d54 coreclr!eventpipe_protocol_helper_stop_tracing+0x3b
1e 0519fb8c 70ea6cdd coreclr!ds_eventpipe_protocol_helper_handle_ipc_message+0x40
1f 0519fbc8 745762c4 coreclr!server_thread+0x17d
20 0519fbdc 77a81b69 kernel32!BaseThreadInitThunk+0x24
21 0519fc24 77a81b34 ntdll!__RtlUserThreadStart+0x2f
22 0519fc34 00000000 ntdll!_RtlUserThreadStart+0x1b

The disable for this test shouldn't have any filter_data, but is it possible this is some heap corruption caused by the issue I fixed in #56104? It seems unlikely.

It's odd to find an array of sequential numbers like that in memory. They don't look familiar. There does appear to be a memory address at 0x03269504+0xc.

@hoyosjs (Member) commented Aug 2, 2021

Sigh. My query was wrong (odd parsing of the names of coreclr wrappers).
Query

TestResults 
| where Result == "Fail" and Type contains "pauseonstart"
| join (WorkItems | where Queued > ago(30d)) on JobName, $left.WorkItemName == $right.Name
| join Jobs on $left.JobName == $right.Name

@hoyosjs (Member) commented Aug 2, 2021

Looks like the no_tiered_compilation win-x86 and linux-x64 legs are the most likely to repro.

noahfalk removed their assignment on Aug 6, 2021
@noahfalk (Member) commented Aug 6, 2021

I'll be out on vacation so reassigning to @josalem

@josalem (Contributor) commented Aug 6, 2021

Looking at @hoyosjs's query, we haven't seen a failure of this test since 8/3, in a win-x64 R2R test leg. That seems to indicate that this isn't related to #56104, though, since that change went in ~10 days ago.

@hoyosjs (Member) commented Aug 10, 2021

The existing failures are R2R infra and Android installation issues. Closing until new occurrences of this resurface.

hoyosjs closed this as completed on Aug 10, 2021
ghost locked as resolved and limited conversation to collaborators on Sep 9, 2021