-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add additional filter capabilities to dotnet-pgo tool. #89853
Add additional filter capabilities to dotnet-pgo tool. #89853
Conversation
Mono adopted PGO in .net7 as a replacement for a solution called profiled AOT in mono/mono used by Android SDK. In the replaced solution there was a concept of a stop trigger, meaning that the user could collect methods up to a stop trigger (a method name) meaning that the profiled AOT image would only include methods up to that point. When using EventPipe and nettrace there is limited ability to get the same fine grained control over what methods that ends up in the nettrace file. dotnet-monitor includes ways to stop tracing if it hits for example a specific method, but due to the nature of EventPipe, there could still be additional methods added to the trace when closing session. dotnet-trace currently don't offer any ability to do something similar, but if implemented, it would probably come with the same limitations as dotnet-monitor. Adding better filter capabilities to dotnet-pgo would add additional capabilities, giving users more control on what methods that gets included into the generated mibc file. That would give Mono's profiled AOT better control to include methods up to a stop trigger. dotnet-pgo alread had capabilities to include events based on timestamp interval. This commit extends that to select the lower/upper timestamp based on a regular expression matching methods. This commit also adds capabilities to add a method include/exclude filters using regular expression, giving users fine grained control on what methods to include/exclude in the generated mibc file. The additional filter capabilities could be used by Android SDK to for example create a mibc file including all methods up to Main, replacing the stop trigger features used in old profiled AOT solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, I would expect that users should emit their own events that they can use to correlate when their scenarios start and stop running. This is for instance the approach that Benchmark.NET uses (via its WorkloadActual/Start
and WorkloadActual/Stop
events).
Looking for when a method starts jitting is not going to be a reliable way to correlate it with the execution of that method (at least for coreclr), though it might work in some cases.
I'm likewise not sure that it makes sense to use GetR2REntryPoint
events to make this correlation. Does it? Is this event lazily fired the first time a R2R method starts executing?
Just to be clear: I'm not against taking the changes if it addresses a Mono scenario, but in that case we need to document some of the limitations of these command-line options. I would like to get @davidwrighton's opinion on it.
I don't see that there are any limitations in the command line options as is, they are just giving the user additional capabilities to filter the methods that ends up in the generated mibc file, extending the filter capabilities that we already have in For solutions like Mono's profiled AOT, they use dotnet-trace (or dotnet-monitor) to collect a nettrace file -> dotnet-pgo to create the mibc file/files -> Mono AOT compiler to AOT compile all methods included in the mibc files. Since all is using generic tooling, doing custom events as part of that solution would mean injecting custom tooling into the toolchain, but the profile AOT scenario is just about what methods that has been jitted/loaded by the runtime up to some selected point in time events already existing and handled by the pgo toolchain. But again, the command line arguments implemented in this PR is not related to any specific solutions or scenarios, just extending the filter capabilities of the dotnet-pgo tooling. The Mono solution we are replacing is https://github.com/mono/mono/blob/main/mono/profiler/aot.c and the Android SDK team is using their own variant of it here, https://github.com/jonathanpeppers/android-profiled-aot. We can replace all parts of that solution with generic net-trace, dotnet-pgo (with extended filtering in this PR) and Mono AOT compiler accepting mibc files as profiles (implemented in .net7) making it into a generic Mono profiled AOT solution build on generic .net runtime and diagnostics tooling without any need for custom tooling. |
What I mean by limitations is that
PerfView has a powerful mechanism to describe custom events to indicate starting/stopping points (search for I'm still a bit confused about |
No, but didn't see any reason to exclude them from a generic filter implementation. |
These are tools in the toolchain are automated, generating a complete different artifact as result, converting it into a mibc file passed to aot compiler. The toolchain is cross platform and can't rely on Windows only UI tooling. |
We could rename them to be more clear and only apply that filter mechanism to JitStart event if that makes things simpler/clearer. The --include-methods, --exclude-method on the other hand are straightforward and could apply to bith JIT and R2R methods. |
I'm not sure the way these events are fired correlates with any form of ordering -- coreclr may simply fire the event for all methods in a R2R image as soon as it is loaded. I need to double check.
I didn't mean to switch the tooling to use PerfView. I meant we could introduce a similar powerful mechanism in dotnet-pgo, allowing a flexible syntax to specify the start/stopping points that would support custom events fired by the customer. But I understand if that's a bit more work than you'd like on your plate. :-)
That sounds good to me. I would be fine with the changes if we do that. |
I see, we could potentially do that long term, we also have plans to include similar powerful features into dotnet-trace directly, that would be more inline with the perfview feature. |
Jupp, these events can fire in different order based on multithreading, what gets called, on Mono we will fire them when we JIT methods or load from AOT image, and that happens when we are about to execute a specific method. |
OK, great, lets do it like that then, I will rename the before/after to include jitting and only apply it to jitted events. The more generic include/exclude methods can be applied to both jitted and R2R methods. |
@jakobbotsch So to summarize what we discussed and actions to get this PR merged:
Sounds good? |
Sounds great to me. Thanks! |
@jakobbotsch, PR has been updated. |
|
||
if (includeMethods != null || excludeMethods != null) | ||
{ | ||
string methodAsString = method.ToString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This string is generally meant for debugging, it goes through DebugNameFormatter
which will produce method names that users may not be familiar with (e.g. I believe types will be module prefixed in the format "[module name]type name"). Would it be better to introduce a custom formatter, reuse the TypeString
formatter, or to reuse the methodNameFromEventDirectly
that we are already using to print the warnings above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there expectation for users to be more specific than to use a regular expression for specifying which method to exclude events after? I was trying to get a local test loop going on Android to see the structural differences between a MethodDesc.ToString()
and the methodNameFromEventDirectly
but its taking me some time to get it working locally.
I can switch it to methodNameFromEventDirectly
as that should still be able to match a regex provided by the user. Is it usually easier to match the methodNameFromEventDirectly
form than to match the MethodDesc.ToString()
form even though it is passed through the DebugNameFormatter
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is that the way DebugNameFormatter
formats names is an internal detail of the managed type system, so I do not think user-facing tools should be using this for any purpose but diagnostics. We do not want situations where we change this detail in the future and break something that took a dependency on the exact way that namespaces or class names are formatted by this formatter.
I looked again and realize methodNameFromEventDirectly
is not actually a field that comes directly from the event, but a concatenation of various fields. I think a simpler concatenation like e.MethodNamespace + "." + e.MethodName
would be better to do the matching against. I think MethodNamespace
includes the class name, but you may want to double check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this also suggests that e.MethodNamespace
will also contain the class name https://github.com/microsoft/perfview/blob/766a08ca8373d273d55da8b07147e7068e4d5f58/src/TraceEvent/Samples/21_ObserveJitEvents.cs#L126-L132. (haven't found the source code for such fields/classes yet)
Maybe methodNameFromEventDirectly
would allow for more flexibility as it will allow users to also specify between different overloaded forms of the same method? It would also more closely align with a GetMethodName
in Perfview https://github.com/microsoft/perfview/blob/766a08ca8373d273d55da8b07147e7068e4d5f58/src/TraceEvent/Computers/TraceManagedProcess.cs#L4170
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that idea, but then we should match the PerfView name exactly, methodNameFromEventDirectly
does not match that today. We should also update documentation of the CLI option to refer to the fact that matching works against names that are formatted the same as PerfView (I would also add a note that this includes parameters, so that users can realize they cannot use e.g. *MyClass*
to match all methods defined on that class). Do you know where exactly this name is displayed in the PerfView UI?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with the user scenarios, so I dont know if it is likely that one would want to only include methods with a certain set of parameters and not include the others. Maybe we can start out with having the most customizable option first (allowing users to include/exclude particular overloaded methods) and later if we know users will never want to do that we can pursue the simpler format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also noticing that the excludeEventsBeforeJittingMethod
and excludeEventsAfterJittingMethod
are based off of method.ToString()
. It would probably be better to maintain parity for those as well. And on more deliberation and offline discussion, maybe keeping the simpler format is better for now. Also, do you happen to know if MethodDesc.ToString()
will contain the parameter information or not @jakobbotsch ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, it looks like it does
sb.Append('('); | |
try | |
{ | |
sb.Append(Signature.ToString(includeReturnType: false)); | |
} | |
catch | |
{ | |
sb.Append("Unknown"); | |
} | |
sb.Append(')'); |
Maybe it would be better to stick with @lateralusX's original intent of the regex matching.
I'm not very familiar with these types, would e.MethodNamespace + "." + e.MethodName + paramsArgs
be the same values that are held by the resulting MethodDesc
discovered by idParser.ResolveMethoID(e.MethodID...)
? If not, then it might not be the right "substitution"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also noticing that the
excludeEventsBeforeJittingMethod
andexcludeEventsAfterJittingMethod
are based off ofmethod.ToString()
. It would probably be better to maintain parity for those as well.
Yes, the code above should be changed in the same way.
We should not use MethodDesc.ToString()
for anything except diagnostics when it is user facing like this. It should have a dedicated formatter then if we do not want to use the name from the event. I think you can base it on TypeString, except without including instantiations.
I'm not very familiar with these types, would
e.MethodNamespace + "." + e.MethodName + paramsArgs
be the same values that are held by the resultingMethodDesc
discovered byidParser.ResolveMethoID(e.MethodID...)
? If not, then it might not be the right "substitution"
Yes, they represent the same thing. dotnet-pgo has a managed implementation of the type system that replicates the native implementation of the type system that produced the strings in the event. Classes and methods are loaded into the type system based on events in the trace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use e.MethodNamespace + "." + e.MethodName + paramsArgs
as the method's name for regex matching, formatted after the method name in PerfView.
Also updated the documentation in the options' descriptions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally looks ok to me. Do you want this in .NET 8, or can it wait for .NET 9?
@jakobbotsch we want to get this into .NET 8. @lateralusX is out, so I can help push this through after addressing your feedback if it looks mostly complete. |
Sounds good to me. |
if (e.TimeStampRelativeMSec > excludeEventsBefore && excludeEventsBeforeJittingMethod != null && excludeEventsBeforeJittingMethod.IsMatch(perfviewMethodName)) | ||
{ | ||
excludeEventsBefore = e.TimeStampRelativeMSec; | ||
} | ||
|
||
if (e.TimeStampRelativeMSec < excludeEventsAfter && excludeEventsAfterJittingMethod != null && excludeEventsAfterJittingMethod.IsMatch(perfviewMethodName)) | ||
{ | ||
excludeEventsAfter = e.TimeStampRelativeMSec; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the behavior here on multiple matches the right one? In particular excluding events before the last match instead of before the first match seems odd to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RIght, it would be problematic especially if the method matching had a timestamp order of beforeMethodMatch1
< afterMethodMatch
< beforeMethodMatch2
. And worst case, if the two matches wer the same, it would be impossible for the user to have a specific enough regex to only match the first (temporally) method.
Are methods in the enumerator of p.EventsInProcess.ByEventType<MethodJittingStartedTraceData>()
guaranteed to be ordered temporally?
Maybe it would be better to keep a list of the matches and choose the first temporal match for excludeEventsBeforeJittingMethod
and the last temporal match for excludeEventsAfterJittingMethod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are methods in the enumerator of p.EventsInProcess.ByEventType() guaranteed to be ordered temporally?
Yes, so we can just have a bool or something to keep track of whether we have already assigned excludeEventsBefore
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for excludeEventsAfterJittingMethod
we would need to still keep a list of the matches for a scenario like afterMethodMatch1
< beforeMethodMatch
< afterMethodMatch2
in case the user wanted to use the bounds [beforeMethodMatch, afterMethodMatch2]
.
Just realizing, if these events are for jitting methods, is it the case that we would only JIT a particular method once? Meaning no two events will have the same name? Not sure exactly how these events are collected. If thats the case, then maybe we can have it such that the user needs to be more specific with their Regex matching if there happens to be multiple matches?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to trace an app and sent over the .nettrace to my windows machine. It looks like caching does happen and a method will only be found once in MethodJittingStartedTraceData
? So it seems like users can be more specific with their regular expression matching, but will opt to take the first "before" match and the last "after" match to avoid invalid bounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for
excludeEventsAfterJittingMethod
we would need to still keep a list of the matches for a scenario likeafterMethodMatch1
<beforeMethodMatch
<afterMethodMatch2
in case the user wanted to use the bounds[beforeMethodMatch, afterMethodMatch2]
.
I don't think this is a scenario we need or should to try to handle. As long as we give a warning or error (which I believe we already do) the user can fix the regex to be specific enough. I would just make sure we pick the largest range that corresponds to the matches.
Just realizing, if these events are for jitting methods, is it the case that we would only JIT a particular method once? Meaning no two events will have the same name? Not sure exactly how these events are collected. If thats the case, then maybe we can have it such that the user needs to be more specific with their Regex matching if there happens to be multiple matches?
coreclr will JIT the same method multiple times at different optimization levels when tiered compilation is enabled, so there can be multiple "JIT started" events for the same method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added two doubles to track the earliest excludeEventsBeforeJittingMethod match and latest excludeEventsAfterJittingMethod match
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for addressing my feedback.
Mono adopted PGO in .net7 as a replacement for a solution called profiled AOT in mono/mono, primarly used by Android SDK. In the replaced solution there was a concept of a "stop trigger", meaning that the user could collect methods up to a stop trigger (a method name) meaning that the profiled AOT image would only include methods up to that point and nothing more.
When using EventPipe and nettrace there is limited ability to get the same fine grained control over what methods that ends up in the nettrace file. dotnet-monitor includes ways to stop tracing if it hits for example a specific method, but due to the nature of EventPipe, there could still be additional methods added to the trace when closing session. dotnet-trace currently don't offer any ability to do something similar, but if implemented, it would probably come with similar limitations as dotnet-monitor.
Adding better filter capabilities to dotnet-pgo would add additional capabilities, giving users more control on what methods that gets included into the generated mibc file. That would give Mono's profiled AOT better control to include methods up to a stop trigger.
dotnet-pgo already had capabilities to include events based on timestamp interval. This commit extends that to select the lower/upper timestamp based on a regular expression matching methods. This commit also adds additional filter capabilities to include/exclude methods based on regular expressions, giving better control on what methods to include/exclude in the generated mibc file.
The additional filter capabilities could be used by Android SDK to replace the stop trigger features used in old profiled AOT solution. Android team could then profile the startup of a MAUI application and use generated nettrace file with dotnet-pgo passing in
--exclude-events-after-method .*Main.* --exclude-methods .*Main.*
, to include methods up to executing Main method in the generated mibc file.The new filter capabilities also add ways to create a mibc file only including/exclude methods from a specific assemblies, namespaces or a combination of all.