Fix DiagnosticSource to work with NativeAOT #76109

Merged
merged 7 commits into from
Sep 28, 2022

Conversation

eerhardt
Member

@eerhardt eerhardt commented Sep 23, 2022

There were 2 problems:

  1. The use of MakeGenericType doesn't work when a property is a ValueType. An app will crash when a listener is enabled and DiagnosticSourceEventSource tries writing values.
  2. The properties on KeyValuePair were not being preserved correctly, so the Arguments of the DiagnosticSourceEventSource methods were not being serialized correctly.

Add test (and infrastructure) to ensure DiagnosticSource works in a NativeAOT app

Fix #75945

Remaining Tasks:

@dotnet-issue-labeler

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost

ghost commented Sep 23, 2022

Tagging subscribers to this area: @tarekgh, @tommcdon, @pjanotti
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: eerhardt
Assignees: -
Labels: area-System.Diagnostics.Tracing, new-api-needs-documentation

Milestone: -

Member

@tarekgh tarekgh left a comment

Added a question and test suggestion. LGTM!

@MichalStrehovsky
Member

Cc @LakshanF

Member

@MichalStrehovsky MichalStrehovsky left a comment

Thank you!

@MichalStrehovsky
Member

> @MichalStrehovsky @joperezr - thoughts on which leg to hook them up to?

Maybe we could hook it up here:

https://github.com/dotnet/runtime/blob/main/eng/pipelines/coreclr/nativeaot-post-build-steps.yml

{
Type elemType = enumerableOfTType.GetGenericArguments()[0];
#if NETCOREAPP
if (!RuntimeFeature.IsDynamicCodeSupported && elemType.IsValueType)
Contributor

Would it be better to check only RuntimeFeature.IsDynamicCodeSupported and assert on elemType.IsValueType?

Member Author

I don't believe there is a valid assert we could put on elemType.IsValueType. It is valid for it to be either true or false. (Note that the test I added uses IEnumerable<int>.)

The reason I'm checking IsValueType is so that the behavior and implementation remain the same for IEnumerable<RefType>, since it is still possible to call MakeGenericType when the Type being passed is a ref type.
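To illustrate the check being discussed, here is a minimal sketch of the pattern, not the code from this PR. The types SequenceFormatter, TypedFormatter<T> and BoxingFormatter are hypothetical names invented for the example; only RuntimeFeature.IsDynamicCodeSupported, Type.IsValueType, Type.MakeGenericType and Activator.CreateInstance are real APIs. It assumes that a non-generic fallback which boxes each element is acceptable for value types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical types for illustration only; not the ones used by DiagnosticSourceEventSource.
abstract class SequenceFormatter
{
    public abstract string Format(object sequence);

    public static SequenceFormatter Create(Type elemType)
    {
        // Without dynamic code support (for example NativeAOT), closing a generic type
        // over a value type at run time may fail, so take a non-generic path that boxes.
        if (!RuntimeFeature.IsDynamicCodeSupported && elemType.IsValueType)
        {
            return new BoxingFormatter();
        }

        // Reference-type elements, and runtimes with a JIT, keep the generic path.
        Type closed = typeof(TypedFormatter<>).MakeGenericType(elemType);
        return (SequenceFormatter)Activator.CreateInstance(closed);
    }
}

sealed class TypedFormatter<T> : SequenceFormatter
{
    public override string Format(object sequence)
    {
        return string.Join(",", (IEnumerable<T>)sequence);
    }
}

sealed class BoxingFormatter : SequenceFormatter
{
    public override string Format(object sequence)
    {
        var parts = new List<string>();
        foreach (object item in (IEnumerable)sequence)
        {
            parts.Add(item == null ? "" : item.ToString());
        }
        return string.Join(",", parts);
    }
}

static class Program
{
    static void Main()
    {
        // Value-type element: generic path with a JIT, boxing path without one.
        Console.WriteLine(SequenceFormatter.Create(typeof(int)).Format(new[] { 1, 2, 3 }));   // prints "1,2,3"
        // Reference-type element: always the generic path.
        Console.WriteLine(SequenceFormatter.Create(typeof(string)).Format(new[] { "a", "b" })); // prints "a,b"
    }
}
```

Both paths produce the same output, which is the point of keeping the IsValueType condition: only the value-type case changes implementation. A real implementation would also have to deal with the trimming and AOT analysis warnings that MakeGenericType triggers; that is left out of this sketch.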

Comment on lines 16 to 17
<!-- EventSourceSupport is disabled by default with PublishAot=true, so need to enable it here. -->
<EventSourceSupport>true</EventSourceSupport>
Member Author

@MichalStrehovsky @jkotas - with this change we now have tests ensuring EventSource works (at least in proc) with PublishAot. Do you think in .NET 8 we should re-enable EventSourceSupport=true by default? Or do you think we should keep it disabled by default going forward (for size savings)?

Member

@jkotas jkotas Sep 26, 2022

I would wait for feedback on the current .NET 7 default before making more changes.

I think it makes sense to leave it off by default and turn it on only for server app-models in their SDKs.

Member

Agree with Jan. I kind of like the size savings. I would like to see feedback. Re-enabling it is an instant 30% size regression from 7.0 for hello world.

Member

Just curious, how big (in bytes) is 30%? I wouldn't have guessed the savings were such a high percentage but I don't know if that is because I overestimated the size of HelloWorld or underestimated the size of EventSource.

Member

Yes, we have a log and I'm hoping to do some offcycle work to visualize/diff them.

Member

I'd love to see the results of this. Agreed with @noahfalk that this is more than I would have expected. It feels important to describe the diagnostic expectations for native AOT in .NET 8 so that we can determine how this decision fits in.

Member

There's a similar difference for PublishTrimmed as well - the size of managed code goes from 2,718,848 bytes to 3,003,008 bytes. So the cost is about 300 kB in IL terms. (The repro steps for both are basically: run dotnet publish -r win-x64 -c Release with -p:PublishTrimmed=true for trimming or -p:PublishAot=true for NativeAOT. Set -p:EventSourceSupport=true or -p:EventSourceSupport=false to enable or disable EventSource.)

For the NativeAOT case, a quick breakdown follows.

Size of code per namespace with EventSource enabled

| Namespace | Size of code in bytes |
| --- | --- |
| System | 414,414 |
| System.Collections.Generic | 246,832 |
| System.Globalization | 128,834 |
| Internal.Runtime.TypeLoader | 128,583 |
| System.Diagnostics.Tracing | 116,975 |
| System.Text | 92,958 |
| Internal.TypeSystem | 85,341 |
| System.Reflection | 40,905 |
| System.Reflection.Runtime.TypeInfos | 36,802 |
| System.Collections.Concurrent | 34,363 |
| System.Reflection.Runtime.General | 31,092 |
| System.Numerics | 30,384 |
| System.Threading | 27,273 |
| Internal.Reflection.Execution | 27,180 |
| System.IO | 26,659 |
| System.Resources | 25,769 |
| System.Reflection.Runtime.MethodInfos | 22,513 |
| System.Runtime | 21,400 |
| System.Runtime.CompilerServices | 20,287 |
| System.Buffers | 20,055 |

Size of code per namespace with EventSource disabled

| Namespace | Size of code in bytes |
| --- | --- |
| System | 306,356 |
| System.Collections.Generic | 156,429 |
| Internal.Runtime.TypeLoader | 127,630 |
| System.Globalization | 107,047 |
| System.Text | 86,065 |
| Internal.TypeSystem | 85,341 |
| System.Reflection | 36,931 |
| System.Reflection.Runtime.TypeInfos | 35,576 |
| System.Collections.Concurrent | 34,363 |
| System.Numerics | 30,384 |
| System.Reflection.Runtime.General | 27,776 |
| System.IO | 26,299 |
| Internal.Reflection.Execution | 25,920 |
| System.Resources | 25,445 |
| System.Runtime | 21,336 |

(Only namespaces with more than 20 kB of code are listed.)

Most of the cost is in transitive dependencies. EventSource has a big transitive closure and depends on a lot of framework code.

I can file an issue if there's interest in reducing the transitive closure.
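For readers who want the per-namespace differences spelled out, the following sketch subtracts the "disabled" figure from the "enabled" figure for a few namespaces that appear in both breakdowns, and does the same for the PublishTrimmed totals. The numbers are copied from the figures above; the SizeReport class and its Delta helper are names made up for this example.

```csharp
using System;

static class SizeReport
{
    // Extra bytes of code attributable to enabling EventSource.
    public static int Delta(int enabledBytes, int disabledBytes)
    {
        return enabledBytes - disabledBytes;
    }

    static void Main()
    {
        // (namespace, bytes with EventSource enabled, bytes with EventSource disabled)
        var rows = new (string Ns, int Enabled, int Disabled)[]
        {
            ("System", 414_414, 306_356),
            ("System.Collections.Generic", 246_832, 156_429),
            ("System.Globalization", 128_834, 107_047),
            ("System.Text", 92_958, 86_065),
            ("System.Reflection", 40_905, 36_931),
        };

        foreach (var row in rows)
        {
            Console.WriteLine(row.Ns + ": " + Delta(row.Enabled, row.Disabled) + " bytes");
        }
        // System: 108058 bytes
        // System.Collections.Generic: 90403 bytes
        // System.Globalization: 21787 bytes
        // System.Text: 6893 bytes
        // System.Reflection: 3974 bytes

        // Managed code size with PublishTrimmed, EventSource enabled vs. disabled.
        Console.WriteLine("PublishTrimmed IL: " + Delta(3_003_008, 2_718_848) + " bytes");
        // PublishTrimmed IL: 284160 bytes
    }
}
```

System.Diagnostics.Tracing is left out because it does not appear in the "disabled" breakdown (it falls under the 20 kB cutoff there), so its exact delta cannot be computed from these figures.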

Member Author

> Most of the cost is in transitive dependencies. EventSource has a big transitive closure and depends on a lot of framework code.

This tells me that this is the upper bound of the size savings for having EventSource disabled. In a real world app that uses these transitive dependencies, they won't be able to be trimmed. So the size savings of disabling EventSource will be less. For example, 90 kB of size savings is coming from System.Collections.Generic which many apps will use. When the app uses those collections, they won't be able to be trimmed anymore, even with EventSource disabled.

Member

> So the size savings of disabling EventSource will be less. For example, 90 kB of size savings is coming from System.Collections.Generic which many apps will use. When the app uses those collections, they won't be able to be trimmed anymore, even with EventSource disabled

That depends - when thinking about native code, one needs to think about instantiated generic code. For example, the costs associated with System.Collections.Generic.Dictionary`2<System.Type,System.Diagnostics.Tracing.TraceLoggingTypeInfo> are exclusive to EventSource no matter what the app does.

It's easy to test out theories - just pick a bigger app and see the impact of EventSource. For apps the size of current ASP.NET, an extra 600 kB overhead is negligible either way.

- Only run them in Release configuration
- Suppress IL2026 warning
Member

@noahfalk noahfalk left a comment

LGTM, thanks @eerhardt!

@eerhardt
Member Author

@MichalStrehovsky @jkotas - are the test failures an existing issue? I tried searching and I can't find any open issues for them.

runtime (Build Linux arm64 Release NativeAOT) failures are all System.IO.FileSystem.Tests, which just got enabled in #76146 2 days ago.

System.PlatformNotSupportedException : Operation is not supported on this platform.
   at System.IO.Tests.FileSystemTest.mkfifo(String path, Int32 mode) + 0x28
   at System.IO.Tests.File_ExistsAsDirectory.FalseForNonRegularFile() + 0x40
   at System.IO.FileSystem!<BaseAddress>+0x11c8aa4
   at System.Reflection.DynamicInvokeInfo.Invoke(Object, IntPtr, Object[], BinderBundle, Boolean) + 0xd4

runtime (Build windows arm64 Release NativeAOT) failures seem transient: The process cannot access the file 'D:\a\_work\1\s\artifacts\helix\tests\windows.AnyCPU.Release\System.Runtime.Tests.zip' because it is being used by another process.

@jkotas
Member

jkotas commented Sep 27, 2022

I have not seen failures like this.

"The process cannot access the file 'D:\a\_work\1\s\artifacts\helix\tests\windows.AnyCPU.Release\System.Runtime.Tests.zip' because it is being used by another process." suggests that there is a build race condition.

Also, one of the test failures is:
System.DllNotFoundException : Unable to load shared library 'kernel32.dll' or one of its dependencies. In order to help diagnose loading problems, consider using a tool like strace.
It looks like a build race condition that mixes Windows and Linux bits.

Any chance that the changes in the test infrastructure you have made introduced this build race condition?

@eerhardt
Member Author

Ah, I see it now. The MSBuild property I added, "TestNativeAot", conflicts with an existing MSBuild property of the same name that means something different. I'll rename my new one.

> Also, one of the test failures is:
> System.DllNotFoundException : Unable to load shared library 'kernel32.dll' or one of its dependencies. In order to help diagnose loading problems, consider using a tool like strace.

Where did you see that error? I hadn't seen that one.

Member

@MichalStrehovsky MichalStrehovsky left a comment

Looks great otherwise, thanks!

Set EventSourceSupport only on the projects that need it.
@eerhardt eerhardt merged commit 70b33f8 into dotnet:main Sep 28, 2022
@eerhardt eerhardt deleted the FixDiagnosticSourceNativeAot-main branch September 28, 2022 21:30
@eerhardt
Member Author

@noahfalk @tommcdon @MichalStrehovsky @jkotas - any thoughts on backporting this to net7? We can leave EventSourceSupport off by default. But if someone turns it on, any app that sends an Http request would be broken when a trace was running.

@MichalStrehovsky
Member

The failure mode is pretty bad, but it's mitigated by the fact that it's off by default. Someone needs to opt in and presumably would run into it immediately after opting in. The current theory is that opting into it wouldn't be a mainstream scenario for the appmodels we currently support with Native AOT.

Since it's not a mainstream scenario, I don't know if that meets the bar. It will more likely meet the bar if we have a customer report and then we can service it.

@ghost ghost locked as resolved and limited conversation to collaborators Nov 4, 2022
@MichalStrehovsky
Member

/backport to release/7.0

@akoeplinger
Member

@MichalStrehovsky can you try the backport again?

@MichalStrehovsky
Member

/backport to release/7.0

@github-actions github-actions bot unlocked this conversation Nov 18, 2022
@github-actions
Contributor

Started backporting to release/7.0: https://github.com/dotnet/runtime/actions/runs/3493408951

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022
Development

Successfully merging this pull request may close these issues.

[NativeAOT] IndexOutOfRangeException when running under perfview
9 participants