Note: This is pre-release software. Please test it, but don't rely on it for alerting (yet). The metric names and the perflib library interface will probably change.
perflib_exporter is a Prometheus exporter for Windows system performance. It queries performance data using the low-level HKEY_PERFORMANCE_DATA registry API instead of the high-level WMI or PDH interfaces.
The registry API will return metrics for all perflib providers in a single binary blob that we have to parse ourselves. This makes it very efficient - querying all metrics (~20-30k) takes about ~40ms or ~300ms with cold cache. The providers enabled by default take ~10ms to query.
Its raison d'être is a bug in WMI which causes collection times to spike every ~16 minutes - see prometheus-community/windows_exporter#89 (comment) for details.
This repository contains the "perflib" package, which exposes an API for querying the HKEY_PERFORMANCE_DATA API. See the godoc for more details:
https://godoc.org/github.com/leoluk/perflib_exporter/perflib
Install using "go get":
go get github.com/leoluk/perflib_exporter
perflib groups metrics into different groups/providers. Each provider has a numerical ID. Query performance depends on the number of providers enabled, with some being more expensive than others.
If you run the binary without arguments, it uses a default list of perflib exporters. You can manually specify a list of providers to enable. This only enabled the "System" provider:
perflib_exporter.exe --perflib.objects=2
You can view a HTML dump of all available metrics on /dump
:
Filter by provider:
http://127.0.0.1:9432/dump?query=2
usage: perflib_exporter.exe [<flags>]
Flags:
-h, --help Show context-sensitive help (also try
--help-long and --help-man).
--telemetry.addr=":9432" host:port for perflib exporter.
--telemetry.path="/metrics"
URL path for surfacing collected metrics.
--perflib.objects=PERFLIB.OBJECTS ...
List of perflib object indices to queryBuf
(defaults to a built-in list)
--perflib.objects.add=PERFLIB.OBJECTS.ADD ...
List of perflib object indices to add to list
--perflib.objects.remove=PERFLIB.OBJECTS.REMOVE ...
List of perflib object indices to remove from
list
--perflib.objects.names=PERFLIB.OBJECTS.NAMES ...
List of perflib object names to queryBuf
--perflib.objects.names.add=PERFLIB.OBJECTS.NAMES.ADD ...
List of perflib object names to add to list
--perflib.objects.names.remove=PERFLIB.OBJECTS.NAMES.REMOVE ...
List of perflib object names to remove from list
--telemetry.auth=TELEMETRY.AUTH ...
List of valid bearer tokens. Defaults to none
(no auth)
--log.level="info" Only log messages with the given severity or
above. Valid levels: [debug, info, warn, error,
fatal]
--log.format="logger:stderr"
Set the log target and format. Example:
"logger:syslog?appname=bob&local=7" or
"logger:stdout?json=true"
--version Show application version.
-
Document which providers have been tested and are known to work well.
-
Metric names and labels are auto-generated. We might want to statically generate them for better performance and reliability (discussion needed - the name mangling could be done statically, with no name tables needed at runtime)
-
Object names are very verbose, shorten them? (Distributed Transaction Coordinator -> dtc, Security System-Wide Statistics -> security, ...)
-
perflib flags are sometimes wrong, incorrectly identifying a counter as a gauge. Do we want to leave them like that for consistency, or make a list of exceptions?
- [350] Errors Access Permissions
- [1828] SMB BranchCache Hash V2 Generation Requests
- [1262] Total Durable Handles
- [1260] Logon Total
- [4412] Total Failed Persistent Handle Reopen Count
- ...
-
Group multiple related metrics into a single one with labels ("promotion")
-
Byte conversion
- PercentFreeSpace_Base
-
-
Deal with metrics like these (both end up as "logon_total"):
- [692] Logon/sec (counter)
- [1260] Logon Total (gauge)
-
Deal with multi-value counters (base values)
- Include 100ns base timer (and document how to calculate percentages)
-
Complete the godoc
-
Remove _Total instances
-
List of objects that are officially supported and auto-generated docs
-
Document lack of filtering (we retrieve all instances anyway - use relabelling if cardinality is a concern)
-
No tests done on Windows Server 2008 and 2016 beyond basic functionality (they have a slightly different set of perflib providers)
-
The "Global" query is non-deterministic. This is either a bug in our parsing code, or in perflib (...)
-
Bearer token auth (it's hard to run an extra reverse proxy on Windows)
-
MSI installer and sensible firewall config (PowerShell examples)
-
perflib: has_instances flag
-
Comprehensive tests
-
CI/automated builds