Statistics
- Use UCX with statistics support compiled in (./configure --enable-stats ... or ./contrib/configure-prof)
- Pass env vars: UCX_STATS_DEST=stdout or something like UCX_STATS_DEST=file:/tmp/ucx_%h_%p.stat
- Run the application and statistics will be generated on exit
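The steps above can be scripted. The following is a minimal sketch, assuming the application is started from Python; the helper names ucx_stats_env and run_with_stats and the application path ./my_ucx_app are hypothetical and not part of UCX. Only the UCX_STATS_DEST and UCX_STATS_TRIGGER variables come from this page.

```python
import os
import subprocess

def ucx_stats_env(dest="stdout", trigger="exit", base_env=None):
    """Return a copy of the environment with the UCX statistics variables set.

    dest and trigger are passed through unchanged as UCX_STATS_DEST and
    UCX_STATS_TRIGGER; this helper does not validate them.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["UCX_STATS_DEST"] = dest
    env["UCX_STATS_TRIGGER"] = trigger
    return env

def run_with_stats(cmd, **kwargs):
    """Run cmd (a list of arguments) with statistics enabled; return its exit code."""
    return subprocess.run(cmd, env=ucx_stats_env(**kwargs)).returncode

# Example with a hypothetical application binary:
# run_with_stats(["./my_ucx_app"], dest="file:/tmp/ucx_%h_%p.stat")
```

Statistics are only produced if the UCX library the application loads was built with statistics support, as described in the first step.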
#
# Destination to send statistics to. If the value is empty, statistics are
# not reported. Possible values are:
# udp:<host>[:<port>] - send over UDP to the given host:port.
# stdout - print to standard output.
# stderr - print to standard error.
# file:<filename>[:bin] - save to a file (%h: host, %p: pid, %c: cpu, %t: time, %u: user, %e: exe)
#
# Syntax: string
#
UCX_STATS_DEST=
#
# Trigger to dump statistics:
# exit - dump just before program exits.
# signal:<signo> - dump when process is signaled.
# timer:<interval> - dump in specified intervals (in seconds).
#
# Syntax: string
#
UCX_STATS_TRIGGER=exit
#
# Used for filter counters summary.
# Comma-separated list of glob patterns specifying counters.
# Statistics summary will contain only the matching counters.
# The order is not meaningful.
# Each expression in the list may contain any of the following wildcard:
# * - matches any number of any characters including none.
# ? - matches any single character.
# [abc] - matches one character given in the bracket.
# [a-z] - matches one character from the range given in the bracket.
#
# Syntax: comma-separated list of: string
#
UCX_STATS_FILTER=*
#
# Statistics format parameter:
# full - each counter will be displayed in a separate line
# agg - like full but there will also be an aggregation between similar counters
# summary - all counters will be printed in the same line.
#
# Syntax: [full|agg|summary]
#
UCX_STATS_FORMAT=full
Counting points are placed throughout the code. The counters are divided into classes, and the classes are arranged in a hierarchy. An example of classes and their relationship:
ucp_worker->uct_iface->uct_ep->rc_fc
For example, the class uct_ep contains the counters: am, put, get, atomic, bytes_short, bytes_bcopy, bytes_zcopy, no_res, flush, flush_wait.
The counters may be printed in two ways: full report and summary. In full report mode, all classes and their counters are printed. In summary mode, the user specifies the subset of counters to print, either as a list of counter names or as a list of glob patterns, and the result is a single line. For example, if the user specified the following list:
*copy*,*eager*
then the result will look like:
[elrond1:13966] ucp_worker{rx_eager_msg:10000 rx_eager_chunk_exp:1670000 rx_eager_chunk_unexp:0} ucp_ep{tx_eager:10000 tx_eager_sync:0} uct_ep{bytes_bcopy:10253440130 uct_ep.bytes_zcopy:0}
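To illustrate how such a pattern list selects counters, here is a minimal sketch using Python's fnmatch module, which supports the same *, ? and [...] wildcards. It is an approximation and not UCX's own code: it assumes each pattern is matched against the bare counter name, and the helper name filter_counters and the counter values are made up for the example.

```python
from fnmatch import fnmatchcase

def filter_counters(counters, patterns):
    """Keep the counters whose name matches at least one glob pattern.

    patterns is a comma-separated string, as in UCX_STATS_FILTER.
    """
    globs = [p.strip() for p in patterns.split(",") if p.strip()]
    return {name: value for name, value in counters.items()
            if any(fnmatchcase(name, g) for g in globs)}

# Counter names from the uct_ep class listed above; the values are made up.
uct_ep = {"am": 5, "put": 0, "get": 0, "bytes_short": 40,
          "bytes_bcopy": 130, "bytes_zcopy": 0, "no_res": 0}

selected = filter_counters(uct_ep, "*copy*,*eager*")
print(selected)  # {'bytes_bcopy': 130, 'bytes_zcopy': 0}
```

No uct_ep counter name contains "eager", so only the two copy counters remain, which matches the uct_ep part of the summary line above.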
Each counter in the summary is the sum over all instances of its class. For example, uct_ep.bytes_bcopy has 2 instances in the full report:
ucp_worker-0x6aeb90:
  uct_iface-mlx5_0:1-0x6b4760:
    uct_ep-0x7289d0:
      bytes_bcopy: 10253440000
  uct_iface-mlx5_0:1-0x716020:
    uct_ep-0x732a30:
      bytes_bcopy: 130
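The accumulation can be checked by hand: 10253440000 + 130 = 10253440130, which is the uct_ep bytes_bcopy value in the summary line. The sketch below does the same over the tree, assuming the full report has been loaded into nested dictionaries keyed by "<class>-<instance>"; that in-memory layout and the accumulate helper are illustrative assumptions, not a UCX format or API.

```python
from collections import defaultdict

# The two instances from the full report above, as nested dicts.
# Keys are "<class>-<instance id>"; leaves are counter values.
report = {
    "ucp_worker-0x6aeb90": {
        "uct_iface-mlx5_0:1-0x6b4760": {
            "uct_ep-0x7289d0": {"bytes_bcopy": 10253440000},
        },
        "uct_iface-mlx5_0:1-0x716020": {
            "uct_ep-0x732a30": {"bytes_bcopy": 130},
        },
    },
}

def accumulate(node_name, node, totals):
    """Add every counter under node to totals, keyed by "<class>.<counter>"."""
    cls = node_name.split("-", 1)[0]  # "uct_ep-0x7289d0" -> "uct_ep"
    for key, value in node.items():
        if isinstance(value, dict):
            accumulate(key, value, totals)      # child class instance
        else:
            totals[f"{cls}.{key}"] += value     # counter of this instance
    return totals

totals = defaultdict(int)
for name, node in report.items():
    accumulate(name, node, totals)

print(dict(totals))  # {'uct_ep.bytes_bcopy': 10253440130}
```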
The list of counter names or glob patterns is defined in the UCX_STATS_FILTER environment variable. If UCX_STATS_FILTER=*, a full report is produced; otherwise, a summary is produced.