Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EventListenerThreadPool test failing in nativeaot outerloop runs #105556

Closed
MichalStrehovsky opened this issue Jul 26, 2024 · 6 comments · Fixed by #106641
Closed

EventListenerThreadPool test failing in nativeaot outerloop runs #105556

MichalStrehovsky opened this issue Jul 26, 2024 · 6 comments · Fixed by #106641
Assignees
Labels
area-System.Threading disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@MichalStrehovsky
Copy link
Member

BEGIN EXECUTION
/datadisks/disk1/work/A6FD09A7/p/nativeaottest.sh /datadisks/disk1/work/A6FD09A7/w/A12B08E5/e/tracing/eventlistener/EventListenerThreadPool/ EventListenerThreadPool.dll ''
Test Failed: Did not see all of the expected events.
ThreadPoolIOPack: 0
ThreadPoolIOEnqueue: 0
ThreadPoolIODequeue: 0
Xunit.Sdk.EqualException: Assert.Equal() Failure: Values differ
Expected: 100
Actual:   -1
   at Xunit.Assert.Equal[T](T, T, IEqualityComparer`1) + 0x1a1
   at __GeneratedMainWrapper.Main() + 0x38
Expected: 100
Actual: 101
END EXECUTION - FAILED

This is consistently failing in the Pri0 outerloop runs on all architectures.

Last good run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=745467&view=results
First bad run: https://dev.azure.com/dnceng-public/public/_build/results?buildId=746036&view=results

From this I suspected #103675 and a revert PR at #105543 confirmed the suspicion (notice Pri0 legs are green in the revert PR). Note that PR reverts two commits because there was a merge conflict I didn't bother resolving, but the other commit is from yesterday and this is failing for a week.

Cc @eduardo-vp @kouvel

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member Author

Disabling the test in #105557 to get clean(er) outerloops again.

@MichalStrehovsky MichalStrehovsky added disabled-test The test is disabled in source code against the issue area-System.Threading and removed area-NativeAOT-coreclr labels Jul 26, 2024
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@davmason
Copy link
Member

I helped @eduardo-vp looked in to this failure, it doesn't seem like it is specific to the event he added.

@LakshanF is this scenario (using an EventListener to listen to native runtime events) expected to work under NativeAOT?

When I debugged the failing test under NativeAOT I saw it fail in this callstack:

	EventListenerThreadPool.exe!S_P_CoreLib_System_Diagnostics_Tracing_EventPipePayloadDecoder__DecodePayload() Line 18	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Diagnostics_Tracing_NativeRuntimeEventSource__ProcessEvent() Line 60	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Diagnostics_Tracing_EventPipeEventDispatcher__DispatchEventsToEventListeners() Line 169	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Diagnostics_Tracing_EventPipeEventDispatcher___c__DisplayClass12_0___StartDispatchTask_b__0() Line 139	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Threading_ExecutionContext__RunInternal() Line 179	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Threading_Tasks_Task__ExecuteWithThreadLocal() Line 2342	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Threading_Thread__StartThread() Line 446	Unknown
 	EventListenerThreadPool.exe!S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint() Line 226	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!00007ffc2e58af28()	Unknown

The failure is because m_eventData is not populated for NativeRuntimeEventSource and we use m_eventData as a template for how to parse the appropriate events coming from the runtime.

I was able to debug through and see the code for creating the metadata seems to be running as expected, except we don't actually have the metadata in memory after. The code in question is EventSource.CreateManifestAndDescriptors. We call AddEventDescriptor for the events here, but the change doesn't seem to actually happen to the original array after returning, even though it is passed by ref.

@MichalStrehovsky are you able to help determine if this is a bad codegen issue or some other? Maybe the debugger is not showing the right memory?

@MichalStrehovsky
Copy link
Member Author

Since this reproes on all architectures and all OSes, I'd consider a codegen issue less likely.

We run many of the tracing tests in the src/tests/tracing suite so my assumption is that this works in general but then again I would not be able to tell a difference between an event source and a metric, so take it with a grain of salt. @LakshanF knows most of native AOT specific aspects of this and any known issues.

jkotas added a commit to jkotas/runtime that referenced this issue Aug 19, 2024
This is necessary to allow consumers to parse the payload.

Delete unnecessary runtimeflavor argument of genRuntimeEventSources.py script.

Fixes dotnet#105556
@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Aug 19, 2024
@jkotas
Copy link
Member

jkotas commented Aug 19, 2024

I was able to debug through and see the code for creating the metadata seems to be running as expected

The code is not creating the metadata as expected.

The problem is that we are missing the metadata for the native-only events in NativeAOT CoreLib. #106641 has the fix.

@jkotas jkotas closed this as completed in 094649e Aug 20, 2024
github-actions bot pushed a commit that referenced this issue Aug 20, 2024
This is necessary to allow consumers to parse the payload.

Delete unnecessary runtimeflavor argument of genRuntimeEventSources.py script.

Fixes #105556
jkotas pushed a commit that referenced this issue Aug 22, 2024
…Lib (#106713)

* Add NativeRuntimeEventSource metadata to NativeAot CoreLib

This is necessary to allow consumers to parse the payload.

Delete unnecessary runtimeflavor argument of genRuntimeEventSources.py script.

Fixes #105556

* Exclude unused keywords and events
@github-actions github-actions bot locked and limited conversation to collaborators Sep 20, 2024
mikelle-rogers pushed a commit to mikelle-rogers/runtime that referenced this issue Dec 10, 2024
…6641)

* Add NativeRuntimeEventSource metadata to NativeAot CoreLib

This is necessary to allow consumers to parse the payload.

Delete unnecessary runtimeflavor argument of genRuntimeEventSources.py script.

Fixes dotnet#105556

* Exclude unused keywords and events
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Threading disabled-test The test is disabled in source code against the issue in-pr There is an active PR which will close this issue when it is merged
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants