Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mac kext performance tracing #281

Merged
merged 5 commits into from
Oct 7, 2018
Merged

Conversation

pmj
Copy link
Contributor

@pmj pmj commented Sep 19, 2018

Adds a simple mechanism for tracing the runtime of code paths in the kext.

This is an updated and reworked version of the kext performance tracing code I wrote ages ago. The original PR (pre-github) was never merged due to the derived works licensing issue. This new PR no longer imports any external libraries.

Example output:

 0                                  VnodeOp    324663 [  324663] samples, total time:      2183224415 ns, mean:    6724.59 ns +/-   165772.00, min:     552 ns, max:   63021858 ns
 1                         VnodeOp_EarlyOut     14065 [   14065] samples, total time:          919921 ns, mean:      65.40 ns +/-      148.00, min:      10 ns, max:      10823 ns
 2         VnodeOp_NoVirtualizationRootFlag   1053931 [ 1053931] samples, total time:       324526160 ns, mean:     307.92 ns +/-      851.00, min:     131 ns, max:     506253 ns
 3                        VnodeOp_EmptyFlag         0 [       0] samples, total time:               0 ns, mean:       0.00 ns +/-        0.00
 4                      VnodeOp_DenyCrawler      6211 [    6211] samples, total time:         2723004 ns, mean:     438.42 ns +/-      199.00, min:     208 ns, max:       5000 ns
 5                          VnodeOp_Offline     13787 [   13787] samples, total time:       128897126 ns, mean:    9349.18 ns +/-    22885.00, min:    1476 ns, max:    1960524 ns
 6                         VnodeOp_Provider     33311 [   33311] samples, total time:       165406431 ns, mean:    4965.52 ns +/-     3066.00, min:     712 ns, max:     192446 ns
 7     VnodeOp_PopulatePlaceholderDirectory       405 [     405] samples, total time:      1679875615 ns, mean: 4147841.02 ns +/- 11614989.00, min:  229834 ns, max:  169771434 ns
 8           VnodeOp_HydratePlaceholderFile      3711 [    3711] samples, total time:      2797867855 ns, mean:  753939.06 ns +/-  9880228.00, min:  101931 ns, max:  599370787 ns
 9                    VnodeOp_IdentifySplit    375877 [  375877] samples, total time:       346696712 ns, mean:     922.37 ns +/-   153833.00, min:     193 ns, max:   63019327 ns
10      VnodeOp_VirtualizationRootFindSplit    375877 [  375877] samples, total time:      2087201093 ns, mean:    5552.88 ns +/-     9195.00, min:     299 ns, max:    1948646 ns
11                            ReadFileFlags   1436019 [ 1436019] samples, total time:       214778997 ns, mean:     149.57 ns +/-      599.00, min:      58 ns, max:     505953 ns
12                  VirtualizationRoot_Find    375877 [  375877] samples, total time:      2024552898 ns, mean:    5386.21 ns +/-     9150.00, min:     201 ns, max:    1948488 ns
13         VirtualizationRoot_FindIteration   1369038 [ 1369038] samples, total time:      1894929183 ns, mean:    1384.13 ns +/-     4698.00, min:      95 ns, max:    1945764 ns

The main functionality is built around a scoped time measurement class, and an enum of probe indices. Each index corresponds to a struct of aggregate data from samples. (total runtime, sum-of-squares for standard deviation calculation, minimum, maximum, number of samples)

The logging user client is extended by a method for extracting the data, and the sample user space logging tool pulls and outputs that data periodically.

The second commit places a bunch of probes throughout the kext, and the final 2 commits ensure that the instrumentation normally compiles down to nothing, and add a profiling build configuration where it's enabled.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction, but want to make sure I understand more about how to use and maintain this moving forward.


public:
inline ProfileSample(PrjFS_PerfCounter defaultProbe);
inline void setProbe(PrjFS_PerfCounter probe);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Syle nit: we like to start method names with capital letters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should now be mostly fixed, let me know if I've missed any.

#define PCNAME(SUFFIX) [Probe_ ## SUFFIX] = #SUFFIX
static const char* const PerfCounterNames[Probe_Count] =
{
PCNAME(VnodeOp),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The macro is actually more characters than the full names :-). I would vote to drop the macro.

Copy link
Contributor Author

@pmj pmj Sep 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the macro doesn't just expand the name, it also generates the assignment, so the idea is you can't accidentally mismatch enum value and name string. Without the macro you'd need to write e.g.

[Probe_VnodeOp] = "VnodeOp",

in place of

PCNAME(VnodeOp),

I'm not convinced this is really an improvement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is just a style difference. I'm pretty allergic to macros that mangle symbols because they tend to make it easier to write code at the expense of making it harder to read code. In this case, the biggest problem I have with it is I can't put my cursor on the enum value and jump to its definition. It's not even obvious that it is an enum value, so I wouldn't even know where to go find it.


static dispatch_source_t start_kext_profiling_data_polling(io_connect_t connection)
{
dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_main_queue());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: spaces, not tabs

}

operationSample.takeSplitSample(Probe_Op_IdentifySplit);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When do we call setProbe and when do we call takeSplitSample? What is the difference between them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm documenting this properly in the header file in the next iteration of this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.cpp Outdated Show resolved Hide resolved
@pmj pmj force-pushed the performance-tracing branch 3 times, most recently from 8741882 to f0a6cef Compare October 1, 2018 19:04
@pmj
Copy link
Contributor Author

pmj commented Oct 1, 2018

I've updated this PR for latest master.

Copy link
Contributor

@sanoursa sanoursa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, this looks good. @kewillford once this goes in, could you please update the PR build definition to build this new configuration as well?

Copy link
Member

@wilbaker wilbaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor questions\comments

@@ -92,7 +92,7 @@
</AdditionalOptions>
</LaunchAction>
<ProfileAction
buildConfiguration = "Release"
buildConfiguration = "Profiling(Release)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the impact of this change?

Copy link
Contributor Author

@pmj pmj Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The <ProfileAction> refers to the "Profile" scheme in Xcode. You can't run/profile kexts directly from Xcode of course, but if you select "Product" -> "Build For" -> "Profiling" it will build using the configuration specified here. So with that Xcode command you can (a) quickly check that your code still builds with profiling enabled, and (b) easily build a profiling-enabled kext you can copy to your target machine without manually messing about with the macro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok thanks!


void PerfTracing_Init()
{
for (size_t i = 0; i < Probe_Count; ++i)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should PerfTracing_Init be a no-op when PRJFS_PERFORMANCE_TRACING_ENABLE is not defined?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to minimise the amount of conditionally compiled code as @sanoursa was uneasy about having too much of it due to the potential of it going stale if it's not well exercised. Given the Init functions are only called once, this seemed a relatively benign one to always allow to run.

ProjFS.Mac/PrjFSKext/PrjFSKext/PrjFSLogUserClient.hpp Outdated Show resolved Hide resolved
@@ -58,12 +59,16 @@ kern_return_t VirtualizationRoots_Cleanup()

VirtualizationRoot* VirtualizationRoots_FindForVnode(vnode_t vnode)
{
ProfileSample functionSample(Probe_VirtualizationRoot_Find);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the compiler smart enough to avoid the calls to the constructor and destructor when PRJFS_PERFORMANCE_TRACING_ENABLE is not defined? Or would we need to use macros (to avoid creating ProfileSample s in the first place)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ProfileSample member functions are all defined as inline, and as their bodies evaluate to nothing, and there isn't a vtable, they should disappear completely on optimised, non-profiling builds. With compiler optimisation, few things are ever guaranteed of course…

// // ... do something potentially expensive here ...
// }
// // ... more code ...
// } // <-- Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase

To confirm, if specialCase is true, the runtime will always be recorded under Probe_MyFunctionSpecialCase?

Copy link
Contributor Author

@pmj pmj Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. And only Probe_MyFunctionSpecialCase.

@pmj pmj force-pushed the performance-tracing branch 2 times, most recently from d1ca7e9 to 7487879 Compare October 5, 2018 10:12
Phil Dennis-Jordan and others added 5 commits October 6, 2018 09:35
…e data

This adds a scoped time measurement class with splits. Probes are defined in an enum. When a ProfileSample instance is created, the Mach absolute time is taken, and when it is destroyed, another time stamp is taken, and the interval recorded with the sample aggregation probe ID specified during construction. Additionally, split time samples can be taken. The number of samples, total runtime, sum of squares, minimum, and maximum values are recorded. From these, mean and standard deviation can be derived.

An external method for extracting the profiling data from the kernel has been added to the logging user client class. The simple log tool has been extended to fetch and dump this data to stdout every 15 seconds.
In the review, it was pointed out we should be using nullptr instead of 0 for default function argument values which were copied from the declaration of the function being overridden in the log user client. The same is true for the provider user client, so this fixes it there too.
Using the previously created profiling mechanism, this adds various measurement points to the vnode scope listener handler, and the relevant strings to the user space extraction tool.
@pmj pmj merged commit e1db074 into microsoft:master Oct 7, 2018
@pmj pmj deleted the performance-tracing branch October 7, 2018 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants