-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mac kext performance tracing #281
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the direction, but want to make sure I understand more about how to use and maintain this moving forward.
|
||
public: | ||
inline ProfileSample(PrjFS_PerfCounter defaultProbe); | ||
inline void setProbe(PrjFS_PerfCounter probe); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Syle nit: we like to start method names with capital letters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should now be mostly fixed, let me know if I've missed any.
#define PCNAME(SUFFIX) [Probe_ ## SUFFIX] = #SUFFIX | ||
static const char* const PerfCounterNames[Probe_Count] = | ||
{ | ||
PCNAME(VnodeOp), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macro is actually more characters than the full names :-). I would vote to drop the macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the macro doesn't just expand the name, it also generates the assignment, so the idea is you can't accidentally mismatch enum value and name string. Without the macro you'd need to write e.g.
[Probe_VnodeOp] = "VnodeOp",
in place of
PCNAME(VnodeOp),
I'm not convinced this is really an improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is just a style difference. I'm pretty allergic to macros that mangle symbols because they tend to make it easier to write code at the expense of making it harder to read code. In this case, the biggest problem I have with it is I can't put my cursor on the enum value and jump to its definition. It's not even obvious that it is an enum value, so I wouldn't even know where to go find it.
|
||
static dispatch_source_t start_kext_profiling_data_polling(io_connect_t connection) | ||
{ | ||
dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_main_queue()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: spaces, not tabs
} | ||
|
||
operationSample.takeSplitSample(Probe_Op_IdentifySplit); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When do we call setProbe
and when do we call takeSplitSample
? What is the difference between them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm documenting this properly in the header file in the next iteration of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
8741882
to
f0a6cef
Compare
I've updated this PR for latest master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, this looks good. @kewillford once this goes in, could you please update the PR build definition to build this new configuration as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor questions\comments
@@ -92,7 +92,7 @@ | |||
</AdditionalOptions> | |||
</LaunchAction> | |||
<ProfileAction | |||
buildConfiguration = "Release" | |||
buildConfiguration = "Profiling(Release)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the impact of this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The <ProfileAction>
refers to the "Profile" scheme in Xcode. You can't run/profile kexts directly from Xcode of course, but if you select "Product" -> "Build For" -> "Profiling" it will build using the configuration specified here. So with that Xcode command you can (a) quickly check that your code still builds with profiling enabled, and (b) easily build a profiling-enabled kext you can copy to your target machine without manually messing about with the macro.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok thanks!
|
||
void PerfTracing_Init() | ||
{ | ||
for (size_t i = 0; i < Probe_Count; ++i) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should PerfTracing_Init
be a no-op when PRJFS_PERFORMANCE_TRACING_ENABLE
is not defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to minimise the amount of conditionally compiled code as @sanoursa was uneasy about having too much of it due to the potential of it going stale if it's not well exercised. Given the Init functions are only called once, this seemed a relatively benign one to always allow to run.
@@ -58,12 +59,16 @@ kern_return_t VirtualizationRoots_Cleanup() | |||
|
|||
VirtualizationRoot* VirtualizationRoots_FindForVnode(vnode_t vnode) | |||
{ | |||
ProfileSample functionSample(Probe_VirtualizationRoot_Find); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the compiler smart enough to avoid the calls to the constructor and destructor when PRJFS_PERFORMANCE_TRACING_ENABLE
is not defined? Or would we need to use macros (to avoid creating ProfileSample
s in the first place)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ProfileSample
member functions are all defined as inline
, and as their bodies evaluate to nothing, and there isn't a vtable, they should disappear completely on optimised, non-profiling builds. With compiler optimisation, few things are ever guaranteed of course…
// // ... do something potentially expensive here ... | ||
// } | ||
// // ... more code ... | ||
// } // <-- Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase
To confirm, if specialCase
is true, the runtime will always be recorded under Probe_MyFunctionSpecialCase
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. And only Probe_MyFunctionSpecialCase
.
d1ca7e9
to
7487879
Compare
…e data This adds a scoped time measurement class with splits. Probes are defined in an enum. When a ProfileSample instance is created, the Mach absolute time is taken, and when it is destroyed, another time stamp is taken, and the interval recorded with the sample aggregation probe ID specified during construction. Additionally, split time samples can be taken. The number of samples, total runtime, sum of squares, minimum, and maximum values are recorded. From these, mean and standard deviation can be derived. An external method for extracting the profiling data from the kernel has been added to the logging user client class. The simple log tool has been extended to fetch and dump this data to stdout every 15 seconds.
In the review, it was pointed out we should be using nullptr instead of 0 for default function argument values which were copied from the declaration of the function being overridden in the log user client. The same is true for the provider user client, so this fixes it there too.
Using the previously created profiling mechanism, this adds various measurement points to the vnode scope listener handler, and the relevant strings to the user space extraction tool.
7487879
to
905258a
Compare
Adds a simple mechanism for tracing the runtime of code paths in the kext.
This is an updated and reworked version of the kext performance tracing code I wrote ages ago. The original PR (pre-github) was never merged due to the derived works licensing issue. This new PR no longer imports any external libraries.
Example output:
The main functionality is built around a scoped time measurement class, and an enum of probe indices. Each index corresponds to a struct of aggregate data from samples. (total runtime, sum-of-squares for standard deviation calculation, minimum, maximum, number of samples)
The logging user client is extended by a method for extracting the data, and the sample user space logging tool pulls and outputs that data periodically.
The second commit places a bunch of probes throughout the kext, and the final 2 commits ensure that the instrumentation normally compiles down to nothing, and add a profiling build configuration where it's enabled.