Using perf_timer as a library

This is a simple perf timer that provides output in text and HTML format. It's very simple to work with.

The frame profiler supports a number of output modes, but the most useful is the HTML output mode, which offers multiple ways of viewing your data:

At the top of each page, you'll see a tree map view:

Below that, you'll see an expandable and collapsible table that can be configured in either a top-down tree view to see what call stacks are the most expensive:

Or in a flat view to see which functions are taking the longest in all contexts combined:

In a multithreaded environment, these outputs are presented per-thread, so you can see each thread's individual hotpoints, and they're also collected at the bottom of the page to show cumulative hotspots across all threads.

In addition to the HTML output view, there's also two text output modes (TREE and FLAT), which display the same tables you see in the HTML output as ASCII tables.

Using perf_timer as a library

Basic Usage

When collection is turned on, the perf timer uses collections.deque for atomic insertion of events with no locking. When disabled, it does nothing at all.

To enable:

perf_timer.EnablePerfTracking()

To disable:

import perf_timer
perf_timer.EnablePerfTracking(False)

Once enabled, you can create perf timers as context managers:

import perf_timer
with perf_timer.PerfTimer("BlockName"):
    # Your code here

It will collect nested metrics on your application, recording the total time and callstacks for each block you instrument this way.

When you're finished, to print the report:

import perf_timer
perf_timer.PerfTimer.PrintPerfReport(perf_timer.ReportMode.HTML)

In addition to HTML (which will create an interactive html file with all the results), you can also use ReportMode.TREE and ReportMode.FLAT to print out an ascii table with the results.

When the report mode is HTML, the second parameter is a string indicating the output file name. It will default to <main_file_name>_PERF.html.

When the report mode is TREE or FLAT, the second parameter is a function that accepts strings to print. It defaults to 'print()'.

Frame-based Profiling

PerfTimer also supports frame-based profiling. The term "frame-based" here is derived from the use case of profiling video games, where hitches and long frames can be very difficult to profile. However, the implementation is generic enough that it can be used for capturing profile reports in any context where you want to do any kind of grouping of profile results.

To use frame-based profiling, simply add a numerical current frame count to your PerfTimer constructor:

import perf_timer
global frameCount
with perf_timer.PerfTimer("BlockName", frameCount):
    # Your profiled code here

In the HTML output mode, this will add a frame-by-frame performance graph to the top of the HTML page. Initially, the profile output will show the collected data from all frames together, but clicking on any of these bars will load the profiling data for that specific frame below it, allowing for inspection of hotspots within the longest frames and hitches.

If you're profiling a large number of frames and having difficulty clicking on the frame you want, you can also click and drag within the header to zoom in on a segment of the output so that you can click on the one you want more easily.

!! A word of warning !!

Running the frame-based profiler with output modes of FLAT or TREE will generate a separate ascii table PER FRAME and PER THREAD. This could result in an extremely large amount of output. It's strongly recommended to use frame-based profiling only with the HTML output mode.

Using perf_timer as an external viewer

In addition to being usable as a python library, perf_timer can also be used to generate visualization output for metrics collected from other applications, potentially written in different languages. If you can generate metrics in the format required by perf_timer, it can consume those and convert them into output. There are two formats supported by perf_timer: a JSON format or a binary format.

Both standard and frame-based profiling modes are supported in the external viewer mode. See the section on frame-based profiling above for more information on this.

JSON format

The JSON format is relatively simple. It contains an array of arrays, with each nested array representing a single recorded metric. The format of the array is as follows:

[
    // Either 0 (ENTER CONTEXT) or 1 (EXIT CONTEXT)
    operation: number,

    // Unique number per thread. Doesn't matter what it is as long as it's unique.
    threadId: number, 

    // Current frame count. If you don't wish to use frame-based profiling, set this to -1.
    frameId: number, 

    // Timestamp of the event, in nanoseconds - this is an absolute time, not a duration
    timestamp: number,

    // The string name for the current block. Parent block names should not be included. 
    blockName: string 
]

Example JSON data:

[
  [0, 1, 0, 1609044830711675392, "DemoFunc_0"],
  [0, 1, 0, 1609044830713163520, "DemoFunc_0_0"],
  [0, 1, 0, 1609044830714651392, "DemoFunc_1_0"],
  [1, 1, 0, 1609044830716141056, "DemoFunc_1_0"],
  [1, 1, 0, 1609044830716141056, "DemoFunc_0_0"],
  [0, 1, 0, 1609044830716141056, "DemoFunc_0_1"],
  [0, 1, 0, 1609044830717628672, "DemoFunc_1_0"],
  [1, 1, 0, 1609044830719115520, "DemoFunc_1_0"],
  [1, 1, 0, 1609044830719115520, "DemoFunc_0_1"],
  [1, 1, 0, 1609044830719115520, "DemoFunc_0"],
  [0, 1, 1, 1609044830719115520, "DemoFunc_0"],
  [1, 1, 1, 1609044830720603392, "DemoFunc_0"]
]

Note that it is the responsibility of the application to ensure all ENTER events are balanced with a corresponding EXIT event and that the events are ordered correctly within their thread. (Cross-thread ordering does not matter - thread events can be interleaved, or each thread can have its own section of the array.) perf_timer will use the ordering of the ENTER and EXIT events to build call stacks for the tree views.

Binary format

The binary format is similar to the JSON format, but, perhaps expectedly, is more strict. It's provided to allow for native applications to generate the output more efficiently.

Note that the binary format does not include ANY alignment padding to ensure it will work across all languages and architectures without making any assumptions about the alignment of data. Structs used to generate the metrics file should thus be packed. All values should also be provided in little endian byte order, meaning if you generate the file on a big endian system, you will need to convert your values.

The overall file format is as follows:

Field	Type	Size	Values
Magic	unsigned int32	4 bytes	MUST contain the value 0xFA57
Count	unsigned int32	4 bytes	The total number of events recorded
Events	struct Event	(see struct definition)

The definition of struct Event is as follows:

Field	Type	Size	Values
Operation	Byte	1 byte	0 (START_CONTEXT) or 1 (END_CONTEXT)
Thread ID	unsigned int64	8 bytes	Any integer value unique to the current thread
Frame ID	signed int32	4 bytes	The current frame count for this event, or -1 if not using frame-based profiling
Timestamp	unsigned int64	8 bytes	The timestamp of the event in nanoseconds
String Length	unsigned int16	2 bytes	The length of the Block Name string in bytes
Block Name	char*	Variable	The name of the current context as a string without the null terminator

Example binary data:

Samples

The sample_output directory in the tree includes some randomly-generated examples of the HTML output, as well as example binary and json files containing example data.

If you use the 010 editor by SweetScape, the example data can be viewed using the following template definition (as shown in the screenshot above):

int magic <format=hex, comment="Must be 0xFA57">;
Assert(magic == 0xFA57);
int count<comment="Number of events">;

enum<byte> Operation
{
    Start = 0,
    End = 1,
};

local int i;
for( i = 0; i < count; i++ )
{
    struct Foo
    {
        Operation op<comment="Event operation">;
        int64 threadId<comment="Numeric thread ID">;
        int32 frameId<comment="Frame counter for this event">;
        uint64 time<comment="Nanosecond timestamp">;
        uint16 strlen<comment="String length">;
        char name[strlen]<comment="Block name">;
    };
    Foo f;
}

You can also use that template to test the validity of any binary output you generate.

Reference Implementations

A reference implementation for generating perf timer data in another language is provided in the lib_references directory. Currently, only a header for C++ is provided.

The C++ presentation is a 100% usable implementation of the perf timer; however, it is somewhat naiive about multithreading, using std::mutex to guard events rather than using a more tailored solution such as a concurrent queue. This decision was made to keep the reference implementation as simple as possible, but it should be fairly trivial to change it to use a concurrent queue.

Note that by default, multithreading is not enabled to avoid taking the cost in single-threaded apps. You can control this by defining PERFTIMER_MULTITHREADED to true before including the header, or by modifying the header. (It's recomended to set this in the makefile or other build settings to ensure it's applied consistently to all headers or you may have crashes.)

Additionally, the perf timer itself is also disabled by default so that instrumenting your code has zero cost unless you make a build with the perf timer turned on. To turn it on, simply #define PERFTIMER_ENABLED before including the header. (Again, recommended to do this in the makefile.)

Basic usage:

void SomeFunction()
{
    // This creates a performance timer for this function scope. It's automatically named with
    // the function's name and results in an object that will go out of scope when the function
    // returns.
    PERF_TIMER(g_frameCount);

    // This provides a note, which will appear on the output as an annotation on the frame it's
    // associated with. This is helpful for tracking the impact on frame time of certain rare events.
    PERF_NOTE("A note!", g_frameCount);

    {
        // This creates a block-scope perf timer that will track only the time within
        // this specific block of code, and will go out of scope at the closing brace.
        ::PerfTimer::PerfTimer blockTimer("Some block", g_frameCount);
    }
}

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
doc_images		doc_images
lib_reference/c++		lib_reference/c++
sample_output		sample_output
LICENSE		LICENSE
README.md		README.md
perf_timer.py		perf_timer.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Using perf_timer as a library

Basic Usage

Frame-based Profiling

!! A word of warning !!

Using perf_timer as an external viewer

JSON format

Binary format

Samples

Reference Implementations

About

Packages

Languages

License

SleepingCatGames/perf_timer

Folders and files

Latest commit

History

Repository files navigation

Using perf_timer as a library

Basic Usage

Frame-based Profiling

!! A word of warning !!

Using perf_timer as an external viewer

JSON format

Binary format

Samples

Reference Implementations

About

Topics

Resources

License

Stars

Watchers

Forks

Packages 0

Languages

Packages