Performance Profiling with Tracy
https://github.com/wolfpld/tracy
A real time, nanosecond resolution, remote telemetry, hybrid frame and sampling profiler for games and other applications.
We can't know how good/bad our performance is until we measure it.
Tracy is made up of two parts:
- The client, which you build into your program and which broadcasts your performance information to the server.
- The server, an external program (available in the Tracy release, now called tracy-profiler.exe) which receives the information and allows you to analyze it.
If building on the command line:
- Add -DTRACY_ENABLE=ON to your configuration arguments and build as normal.
If building from Visual Studio, select one of the -Tracy build configurations and build as normal. To get a fair measure of runtime performance, you probably want to profile the Release variant of the program.
1> Working directory: C:\ffxi\server\build\x64-Release-Tracy
1> [CMake] -- C:/ProgramData/chocolatey/bin/ccache.exe found and enabled
1> [CMake] -- CMAKE_SOURCE_DIR: C:/ffxi/server
1> [CMake] -- CMAKE_SIZEOF_VOID_P == 8: 64-bit build
1> [CMake] -- ENABLE_FAST_MATH: ON
1> [CMake] -- TRACY_ENABLE: ON
1> [CMake] -- Downloading Tracy development library
1> [CMake] x tracy-0.8.2/
1> [CMake] x tracy-0.8.2/.github/
1> [CMake] x tracy-0.8.2/.github/FUNDING.yml
1> [CMake] x tracy-0.8.2/.github/sponsor.png
1> [CMake] x tracy-0.8.2/.github/workflows/
...
1> [CMake] -- Downloading Tracy client
1> [CMake] x tracy-profiler.exe
...
1> [CMake] -- Modifying C:/ffxi/server/ext/tracy/tracy-0.8.2/client/TracyProfiler.hpp
...
1> [CMake] -- Configuring done
1> [CMake] -- Generating done
1> [CMake] -- Build files have been written to: C:/ffxi/server/build/x64-Release-Tracy
During the Tracy-enabled build from the previous steps, the client code is downloaded and built into the xi_map build target. The server executables are also downloaded and placed in the repo root (tracy-profiler.exe, etc.).
The build outputs xi_map_tracy.exe instead of xi_map.exe, so you can keep running multi-process setups by swapping the single xi_map.exe process you want to profile for the Tracy-enabled xi_map_tracy.exe.
WARNING: Tracy is designed to only bind to and profile a single executable at a time. If you launch multiple xi_map_tracy.exe instances at the same time, tracy-profiler.exe will bind to the first one it finds, not necessarily the one you want to profile.
WARNING: Tracy can only properly gather all the information it needs if you run xi_map_tracy.exe as Administrator/root. There are loud warnings if you don't do this, so you're unlikely to miss them.
Run your xi_map_tracy.exe as Administrator/root and then launch tracy-profiler.exe.
You can connect to your local machine (127.0.0.1) or enter the IP address of another machine on your network to connect to it. You'll need to make sure port 8086 is open.
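If the profiler cannot connect, it can help to confirm that the port is reachable at all. The sketch below is a minimal, hypothetical helper (the name can_reach_tracy_client is not part of Tracy). It assumes the Tracy-enabled executable accepts TCP connections on port 8086, and it only tests whether a connection can be opened; it does not speak the Tracy protocol.

```python
import socket


def can_reach_tracy_client(host: str = "127.0.0.1",
                           port: int = 8086,
                           timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Assumption: the Tracy client listens on TCP port 8086 by default.
    This only checks reachability (firewall, wrong IP, process not running).
    """
    try:
        # create_connection resolves the host and applies the timeout for us.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling can_reach_tracy_client("127.0.0.1") while xi_map_tracy.exe is running should return True under the assumption above. A False result points at a firewall rule, a wrong address, or a process that is not running.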
Press Connect.
You will see it connect and start profiling.
You can launch tracy-profiler.exe before or after xi_map_tracy.exe; the order isn't important.
It is usually better to wait until startup has completed before you attach Tracy, as the startup routine isn't a good indicator of the server's runtime performance. The startup is also incredibly intensive in terms of how much data is collected and transmitted. If you don't need to profile the startup specifically, you should only attach tracy-profiler.exe once the server is up and running.
Once connected, you should see something like this:
If you want to record a trace for later use, you can click on the Wifi symbol and you'll be given the option to save the current trace.
WARNING: Traces can be very large! Plan accordingly!
If you need to capture a trace without launching the GUI (on a remote VM, a resource constrained system, etc.), Tracy comes with tracy-capture.exe.
You can capture a trace using a command line utility contained in the capture directory. To use it you may provide the following parameters:
• -o output.tracy – the file name of the resulting trace (required).
• -a address – specifies the IP address (or a domain name) of the client application (uses localhost if not provided).
• -p port – network port which should be used (optional).
• -f – force overwrite, if output file already exists.
• -s seconds – number of seconds to capture before automatically disconnecting (optional).
If no client is running at the given address, the server will wait until it can make a connection. During the capture, the utility displays progress information.
You can launch it from the command line:
PS C:\ffxi\server> .\capture.exe -o trace.tracy -f -s 60
Connecting to 127.0.0.1:8086...
Queue delay: 0 ns
Timer resolution: 100 ns
1.32 Kbps /138.5% = 0.00 Mbps | Tx: 41.34 MB | 330.28 MB | 1:32.9
Frames: 26
Time span: 1:32.9
Zones: 941,349
Elapsed time: 1:00.1
Saving trace... done!
Trace size 40.59 MB (24.26% ratio)
PS C:\ffxi\server>
You can open the resulting trace in the tracy-profiler.exe GUI at a later time.
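If you capture traces regularly, the parameters described above can be assembled by a small script. The sketch below is a hypothetical helper (build_capture_command is not part of Tracy) that uses only the flags listed in this section. The executable name is an assumption: depending on the Tracy release it may be capture.exe or tracy-capture.exe, so it is passed in as a parameter.

```python
from typing import List, Optional


def build_capture_command(output: str,
                          address: Optional[str] = None,
                          port: Optional[int] = None,
                          force: bool = False,
                          seconds: Optional[int] = None,
                          executable: str = "tracy-capture.exe") -> List[str]:
    """Build the argument list for the Tracy capture utility.

    Only the flags documented above are used: -o, -a, -p, -f, -s.
    The default executable name is an assumption; adjust it to your release.
    """
    cmd = [executable, "-o", output]      # -o is required
    if address is not None:
        cmd += ["-a", address]            # defaults to localhost if omitted
    if port is not None:
        cmd += ["-p", str(port)]
    if force:
        cmd.append("-f")                  # overwrite an existing output file
    if seconds is not None:
        cmd += ["-s", str(seconds)]       # stop automatically after N seconds
    return cmd
```

The resulting list can be passed to subprocess.run, for example subprocess.run(build_capture_command("trace.tracy", force=True, seconds=60), check=True). Building a list rather than a single string avoids shell quoting issues with paths that contain spaces.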
Searchable statistics are in the Statistics header, log messages are in Messages. You can click and drag and zoom around the main timeline window for information about what's going on. You can "re-attach" to the most active frames by clicking on the Pause/Resume header and using the options there.
If you click on the entries in the Statistics menu, you can drill down into that function and look at it in more detail.
Remember that there are a lot of things that can affect performance.
- Platform (Windows, Linux, OSX)
- Architecture (x86, x86_64)
- Type of build (Debug, RelWithDebInfo, Release, MinSizeRel)
- Compiler (MSVC, Clang, GCC)
- Your system specs (CPU Speed, Available Memory, Memory Latency, HDD R/W speed etc.)
- Other programs using your system's resources
- Virtualization/Containerization (VMWare, WSL, Docker)
If you're performing before/after testing, try as hard as you can to keep the conditions the same for both runs, and change as little as possible between them. It is also helpful to take multiple readings and many samples per reading to get an accurate view of performance.
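One simple way to compare several readings is to summarize each set with its median, which is less sensitive to a single noisy run than the mean. The sketch below is a minimal illustration under that assumption; compare_runs and the sample timings are hypothetical, and the timings would come from whatever you read out of Tracy's Statistics view.

```python
from statistics import median
from typing import Dict, Sequence


def compare_runs(before_ms: Sequence[float],
                 after_ms: Sequence[float]) -> Dict[str, float]:
    """Compare two sets of timings (in milliseconds) by their medians.

    change_pct is negative when the 'after' runs are faster.
    """
    before_median = median(before_ms)
    after_median = median(after_ms)
    change_pct = (after_median - before_median) / before_median * 100.0
    return {
        "before_median": before_median,
        "after_median": after_median,
        "change_pct": change_pct,
    }


if __name__ == "__main__":
    # Hypothetical readings, three runs each.
    result = compare_runs([10.0, 12.0, 11.0], [8.0, 9.0, 10.0])
    print(result)
```

If the spread within each set of readings is as large as the difference between the medians, the change is probably within the noise and more samples are needed.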
It's also possible to convince yourself that something is very expensive by looking at the distribution of time spent in child calls for a given function. If a function is only taking nanoseconds overall, but it's spending 80% of its time in a particular child call, this probably isn't a good candidate for investigation.
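To make this concrete, it helps to convert a child call's percentage back into absolute time and compare it with the time budget you care about. The sketch below uses hypothetical numbers (a 500 ns function and a 1 ms budget) and hypothetical helper names; none of these values come from a real trace.

```python
def child_cost_ns(parent_total_ns: float, child_share: float) -> float:
    """Absolute time spent in a child call, given its share of the parent."""
    return parent_total_ns * child_share


def fraction_of_budget(cost_ns: float, budget_ns: float) -> float:
    """How much of a time budget (e.g. one frame or tick) a cost represents."""
    return cost_ns / budget_ns


if __name__ == "__main__":
    # Hypothetical: the parent takes 500 ns in total, 80% of it in one child.
    cost = child_cost_ns(500.0, 0.80)
    # Hypothetical budget of 1 ms (1,000,000 ns).
    share = fraction_of_budget(cost, 1_000_000.0)
    print(f"child call: {cost:.0f} ns, {share:.4%} of the budget")
```

With these numbers the child call accounts for roughly 400 ns, a tiny fraction of the budget, even though it dominates its parent. Sorting by total time in the Statistics view rather than by percentage within a single function avoids this trap.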
- Expensive pathing and navmesh access... all the time... every tick... every mob... everywhere...
- parse routine is slow.
- Next to nothing is threaded; almost everything is blocking.