Mac kext performance tracing #281

pmj · 2018-09-19T14:57:41Z

Adds a simple mechanism for tracing the runtime of code paths in the kext.

This is an updated and reworked version of the kext performance tracing code I wrote ages ago. The original PR (pre-github) was never merged due to the derived works licensing issue. This new PR no longer imports any external libraries.

Example output:

 0                                  VnodeOp    324663 [  324663] samples, total time:      2183224415 ns, mean:    6724.59 ns +/-   165772.00, min:     552 ns, max:   63021858 ns
 1                         VnodeOp_EarlyOut     14065 [   14065] samples, total time:          919921 ns, mean:      65.40 ns +/-      148.00, min:      10 ns, max:      10823 ns
 2         VnodeOp_NoVirtualizationRootFlag   1053931 [ 1053931] samples, total time:       324526160 ns, mean:     307.92 ns +/-      851.00, min:     131 ns, max:     506253 ns
 3                        VnodeOp_EmptyFlag         0 [       0] samples, total time:               0 ns, mean:       0.00 ns +/-        0.00
 4                      VnodeOp_DenyCrawler      6211 [    6211] samples, total time:         2723004 ns, mean:     438.42 ns +/-      199.00, min:     208 ns, max:       5000 ns
 5                          VnodeOp_Offline     13787 [   13787] samples, total time:       128897126 ns, mean:    9349.18 ns +/-    22885.00, min:    1476 ns, max:    1960524 ns
 6                         VnodeOp_Provider     33311 [   33311] samples, total time:       165406431 ns, mean:    4965.52 ns +/-     3066.00, min:     712 ns, max:     192446 ns
 7     VnodeOp_PopulatePlaceholderDirectory       405 [     405] samples, total time:      1679875615 ns, mean: 4147841.02 ns +/- 11614989.00, min:  229834 ns, max:  169771434 ns
 8           VnodeOp_HydratePlaceholderFile      3711 [    3711] samples, total time:      2797867855 ns, mean:  753939.06 ns +/-  9880228.00, min:  101931 ns, max:  599370787 ns
 9                    VnodeOp_IdentifySplit    375877 [  375877] samples, total time:       346696712 ns, mean:     922.37 ns +/-   153833.00, min:     193 ns, max:   63019327 ns
10      VnodeOp_VirtualizationRootFindSplit    375877 [  375877] samples, total time:      2087201093 ns, mean:    5552.88 ns +/-     9195.00, min:     299 ns, max:    1948646 ns
11                            ReadFileFlags   1436019 [ 1436019] samples, total time:       214778997 ns, mean:     149.57 ns +/-      599.00, min:      58 ns, max:     505953 ns
12                  VirtualizationRoot_Find    375877 [  375877] samples, total time:      2024552898 ns, mean:    5386.21 ns +/-     9150.00, min:     201 ns, max:    1948488 ns
13         VirtualizationRoot_FindIteration   1369038 [ 1369038] samples, total time:      1894929183 ns, mean:    1384.13 ns +/-     4698.00, min:      95 ns, max:    1945764 ns

The main functionality is built around a scoped time measurement class, and an enum of probe indices. Each index corresponds to a struct of aggregate data from samples. (total runtime, sum-of-squares for standard deviation calculation, minimum, maximum, number of samples)

The logging user client is extended by a method for extracting the data, and the sample user space logging tool pulls and outputs that data periodically.

The second commit places a bunch of probes throughout the kext, and the final 2 commits ensure that the instrumentation normally compiles down to nothing, and add a profiling build configuration where it's enabled.

sanoursa

I like the direction, but want to make sure I understand more about how to use and maintain this moving forward.

sanoursa · 2018-09-20T17:50:58Z

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.hpp

+
+public:
+    inline ProfileSample(PrjFS_PerfCounter defaultProbe);
+    inline void setProbe(PrjFS_PerfCounter probe);


Syle nit: we like to start method names with capital letters

These should now be mostly fixed, let me know if I've missed any.

sanoursa · 2018-09-20T18:03:13Z

ProjFS.Mac/PrjFSLib/prjfs-log/prjfs-log.cpp

+#define PCNAME(SUFFIX) [Probe_ ## SUFFIX] = #SUFFIX
+static const char* const PerfCounterNames[Probe_Count] =
+{
+    PCNAME(VnodeOp),


The macro is actually more characters than the full names :-). I would vote to drop the macro.

Note that the macro doesn't just expand the name, it also generates the assignment, so the idea is you can't accidentally mismatch enum value and name string. Without the macro you'd need to write e.g.

[Probe_VnodeOp] = "VnodeOp",

in place of

PCNAME(VnodeOp),

I'm not convinced this is really an improvement?

I guess this is just a style difference. I'm pretty allergic to macros that mangle symbols because they tend to make it easier to write code at the expense of making it harder to read code. In this case, the biggest problem I have with it is I can't put my cursor on the enum value and jump to its definition. It's not even obvious that it is an enum value, so I wouldn't even know where to go find it.

sanoursa · 2018-09-20T18:04:12Z

ProjFS.Mac/PrjFSLib/prjfs-log/prjfs-log.cpp

+
+static dispatch_source_t start_kext_profiling_data_polling(io_connect_t connection)
+{
+	dispatch_source_t timer = dispatch_source_create(DISPATCH_SOURCE_TYPE_TIMER, 0, 0, dispatch_get_main_queue());


Nit: spaces, not tabs

sanoursa · 2018-09-20T18:11:14Z

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

    }

+    operationSample.takeSplitSample(Probe_Op_IdentifySplit);


When do we call setProbe and when do we call takeSplitSample? What is the difference between them?

I'm documenting this properly in the header file in the next iteration of this.

I've now added a comment section explaining how it all works.

ProjFS.Mac/PrjFSKext/PrjFSKext/KauthHandler.cpp

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.cpp

pmj · 2018-10-01T19:12:06Z

I've updated this PR for latest master.

sanoursa

Thanks for the changes, this looks good. @kewillford once this goes in, could you please update the PR build definition to build this new configuration as well?

wilbaker

Some minor questions\comments

wilbaker · 2018-10-04T23:17:01Z

ProjFS.Mac/PrjFS.xcworkspace/xcshareddata/xcschemes/PrjFS.xcscheme

@@ -92,7 +92,7 @@
      </AdditionalOptions>
   </LaunchAction>
   <ProfileAction
-      buildConfiguration = "Release"
+      buildConfiguration = "Profiling(Release)"


What's the impact of this change?

The <ProfileAction> refers to the "Profile" scheme in Xcode. You can't run/profile kexts directly from Xcode of course, but if you select "Product" -> "Build For" -> "Profiling" it will build using the configuration specified here. So with that Xcode command you can (a) quickly check that your code still builds with profiling enabled, and (b) easily build a profiling-enabled kext you can copy to your target machine without manually messing about with the macro.

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.cpp

wilbaker · 2018-10-04T23:19:33Z

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.cpp

+
+void PerfTracing_Init()
+{
+    for (size_t i = 0; i < Probe_Count; ++i)


Should PerfTracing_Init be a no-op when PRJFS_PERFORMANCE_TRACING_ENABLE is not defined?

I've been trying to minimise the amount of conditionally compiled code as @sanoursa was uneasy about having too much of it due to the potential of it going stale if it's not well exercised. Given the Init functions are only called once, this seemed a relatively benign one to always allow to run.

ProjFS.Mac/PrjFSKext/PrjFSKext/PrjFSLogUserClient.hpp

wilbaker · 2018-10-04T23:24:58Z

ProjFS.Mac/PrjFSKext/PrjFSKext/VirtualizationRoots.cpp

@@ -58,12 +59,16 @@ kern_return_t VirtualizationRoots_Cleanup()

 VirtualizationRoot* VirtualizationRoots_FindForVnode(vnode_t vnode)
 {
+    ProfileSample functionSample(Probe_VirtualizationRoot_Find);


Is the compiler smart enough to avoid the calls to the constructor and destructor when PRJFS_PERFORMANCE_TRACING_ENABLE is not defined? Or would we need to use macros (to avoid creating ProfileSample s in the first place)

The ProfileSample member functions are all defined as inline, and as their bodies evaluate to nothing, and there isn't a vtable, they should disappear completely on optimised, non-profiling builds. With compiler optimisation, few things are ever guaranteed of course…

wilbaker · 2018-10-04T23:28:48Z

ProjFS.Mac/PrjFSKext/PrjFSKext/PerformanceTracing.hpp

+//         // ... do something potentially expensive here ...
+//     }
+//     // ... more code ...
+// } // <-- Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase


Runtime to here will be recorded either under Probe_MyFunction or Probe_MyFunctionSpecialCase

To confirm, if specialCase is true, the runtime will always be recorded under Probe_MyFunctionSpecialCase?

Yes. And only Probe_MyFunctionSpecialCase.

…e data This adds a scoped time measurement class with splits. Probes are defined in an enum. When a ProfileSample instance is created, the Mach absolute time is taken, and when it is destroyed, another time stamp is taken, and the interval recorded with the sample aggregation probe ID specified during construction. Additionally, split time samples can be taken. The number of samples, total runtime, sum of squares, minimum, and maximum values are recorded. From these, mean and standard deviation can be derived. An external method for extracting the profiling data from the kernel has been added to the logging user client class. The simple log tool has been extended to fetch and dump this data to stdout every 15 seconds.

In the review, it was pointed out we should be using nullptr instead of 0 for default function argument values which were copied from the declaration of the function being overridden in the log user client. The same is true for the provider user client, so this fixes it there too.

Using the previously created profiling mechanism, this adds various measurement points to the vnode scope listener handler, and the relevant strings to the user space extraction tool.

sanoursa reviewed Sep 20, 2018

View reviewed changes

pmj force-pushed the performance-tracing branch 3 times, most recently from 8741882 to f0a6cef Compare October 1, 2018 19:04

sanoursa approved these changes Oct 4, 2018

View reviewed changes

wilbaker reviewed Oct 4, 2018

View reviewed changes

pmj force-pushed the performance-tracing branch 2 times, most recently from d1ca7e9 to 7487879 Compare October 5, 2018 10:12

wilbaker approved these changes Oct 5, 2018

View reviewed changes

Phil Dennis-Jordan and others added 5 commits October 6, 2018 09:35

Mac kext: Placement of profiling probes throughout the vnode listener

5f7cc97

Using the previously created profiling mechanism, this adds various measurement points to the vnode scope listener handler, and the relevant strings to the user space extraction tool.

Mac kext: Conditionally enable performance tracing.

b268599

Mac kext: New Profiling(Release) build configuration based on Release

905258a

pmj force-pushed the performance-tracing branch from 7487879 to 905258a Compare October 6, 2018 07:41

pmj merged commit e1db074 into microsoft:master Oct 7, 2018

pmj deleted the performance-tracing branch October 7, 2018 08:56

jrbriggs mentioned this pull request Oct 12, 2018

Merge milestones/M142 to releases/shipped #381

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mac kext performance tracing #281

Mac kext performance tracing #281

pmj commented Sep 19, 2018 •

edited

Loading

sanoursa left a comment

sanoursa Sep 20, 2018

pmj Sep 29, 2018

sanoursa Sep 20, 2018

pmj Sep 21, 2018 •

edited

Loading

sanoursa Sep 21, 2018

sanoursa Sep 20, 2018

sanoursa Sep 20, 2018

pmj Sep 28, 2018

pmj Sep 29, 2018

pmj commented Oct 1, 2018

sanoursa left a comment

wilbaker left a comment

wilbaker Oct 4, 2018

pmj Oct 5, 2018 •

edited

Loading

wilbaker Oct 5, 2018

wilbaker Oct 4, 2018

pmj Oct 5, 2018

wilbaker Oct 4, 2018

pmj Oct 5, 2018

wilbaker Oct 4, 2018

pmj Oct 5, 2018 •

edited

Loading

Mac kext performance tracing #281

Mac kext performance tracing #281

Conversation

pmj commented Sep 19, 2018 • edited Loading

sanoursa left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Sep 21, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj commented Oct 1, 2018

sanoursa left a comment

Choose a reason for hiding this comment

wilbaker left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pmj Oct 5, 2018 • edited Loading

Choose a reason for hiding this comment

pmj commented Sep 19, 2018 •

edited

Loading

pmj Sep 21, 2018 •

edited

Loading

pmj Oct 5, 2018 •

edited

Loading

pmj Oct 5, 2018 •

edited

Loading