Kiam exports Prometheus metrics that can be used to determine the health of the
system, check the timing of each RPC call, and monitor the size of the
credentials cache. By default, Prometheus metrics are exported on `localhost:9620`.
An example Grafana dashboard with Prometheus as the data source is provided here. It displays the basic metrics and, if available, includes daemonset status from kube-state-metrics and container metrics from cAdvisor.
- The `prometheus-listen-addr` flag controls which address Kiam should create a Prometheus endpoint on. The default is `localhost:9620`. The metrics themselves can be accessed at `<prometheus-listen-addr>/metrics`.
- The `prometheus-sync-interval` flag controls how frequently Prometheus metrics should be updated. The default is `5s`.
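As a quick check that the endpoint is reachable, the following Python sketch builds the metrics URL from a listen address and fetches the raw text. It assumes the endpoint is served over plain HTTP and that the script runs somewhere that can reach the address (for the default `localhost:9620`, that means on the same host or inside the same pod). The helper names `metrics_url` and `fetch_metrics` are illustrative and not part of Kiam.

```python
from urllib.request import urlopen


def metrics_url(listen_addr, scheme="http"):
    """Build the metrics URL for a given prometheus-listen-addr value.

    The "http" scheme is an assumption of this sketch.
    """
    return f"{scheme}://{listen_addr}/metrics"


def fetch_metrics(listen_addr="localhost:9620", timeout=5.0):
    """Return the raw text exposition served by the metrics endpoint."""
    with urlopen(metrics_url(listen_addr), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    try:
        text = fetch_metrics()
    except OSError as err:  # URLError and timeouts are subclasses of OSError
        print(f"could not reach the metrics endpoint: {err}")
    else:
        # Show only the beginning of the output to keep the check readable.
        print(text[:500])
```

If the listen address is changed with `prometheus-listen-addr`, pass the same value to `fetch_metrics`.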
- `kiam_metadata_handler_latency_seconds` - Bucketed histogram of handler timings. Tagged by handler
- `kiam_metadata_credential_fetch_errors_total` - Number of errors fetching the credentials for a pod
- `kiam_metadata_credential_encode_errors_total` - Number of errors encoding credentials for a pod
- `kiam_metadata_find_role_errors_total` - Number of errors finding the role for a pod
- `kiam_metadata_empty_role_total` - Number of empty roles returned
- `kiam_metadata_success_total` - Number of successful responses from a handler
- `kiam_metadata_responses_total` - Responses from mocked out metadata handlers
- `kiam_metadata_proxy_requests_blocked_total` - Number of access requests to the proxy handler that were blocked by the regexp
- `kiam_sts_cache_hit_total` - Number of cache hits to the metadata cache
- `kiam_sts_cache_miss_total` - Number of cache misses to the metadata cache
- `kiam_sts_issuing_errors_total` - Number of errors issuing credentials
- `kiam_sts_assumerole_timing_seconds` - Bucketed histogram of assumeRole timings
- `kiam_sts_assumerole_current` - Number of assume role calls currently executing
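The hit and miss counters can be combined into a cache hit ratio. The Python sketch below parses Prometheus text-format output with a deliberately simple parser (it sums all samples per metric name and ignores labels) and computes hits / (hits + misses). The sample values are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def sum_samples(exposition_text):
    """Sum sample values per metric name from Prometheus text-format output.

    Simplified parser: labels are discarded and timestamps are ignored.
    """
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1:]
        else:
            name, _, rest = line.partition(" ")
        totals[name] += float(rest.split()[0])
    return dict(totals)


def cache_hit_ratio(totals):
    """Return hits / (hits + misses), or None if there were no lookups."""
    hits = totals.get("kiam_sts_cache_hit_total", 0.0)
    misses = totals.get("kiam_sts_cache_miss_total", 0.0)
    lookups = hits + misses
    return hits / lookups if lookups else None


# Made-up sample values, for illustration only.
SAMPLE = """\
# TYPE kiam_sts_cache_hit_total counter
kiam_sts_cache_hit_total 90
# TYPE kiam_sts_cache_miss_total counter
kiam_sts_cache_miss_total 10
"""

print(cache_hit_ratio(sum_samples(SAMPLE)))
```

Because both metrics are cumulative counters, this ratio covers the whole lifetime of the process. To see recent behavior, compute it from the difference between two scrapes instead.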
- `kiam_k8s_dropped_pods_total` - Number of dropped pods because of full buffer
- `grpc_server_handled_total` - Total number of RPCs completed on the server, regardless of success or failure.
- `grpc_server_msg_received_total` - Total number of RPC stream messages received on the server.
- `grpc_server_msg_sent_total` - Total number of gRPC stream messages sent by the server.
- `grpc_server_started_total` - Total number of RPCs started on the server.
- `grpc_client_handled_total` - Total number of RPCs completed by the client, regardless of success or failure.
- `grpc_client_msg_received_total` - Total number of RPC stream messages received by the client.
- `grpc_client_msg_sent_total` - Total number of gRPC stream messages sent by the client.
- `grpc_client_started_total` - Total number of RPCs started on the client.
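Since the started and handled counters count the same RPCs at different points in their lifetime, their difference gives a rough estimate of the RPCs currently in flight. The Python sketch below shows that arithmetic. It assumes the totals have already been summed across all label combinations, the numbers are made up, and `in_flight_rpcs` is a hypothetical helper.

```python
def in_flight_rpcs(totals, side):
    """Estimate RPCs started but not yet completed for 'server' or 'client'."""
    if side not in ("server", "client"):
        raise ValueError("side must be 'server' or 'client'")
    started = totals.get(f"grpc_{side}_started_total", 0)
    handled = totals.get(f"grpc_{side}_handled_total", 0)
    # Counters are not read atomically, so clamp small negative differences.
    return max(started - handled, 0)


# Made-up totals, already summed across all label combinations.
totals = {
    "grpc_server_started_total": 1200,
    "grpc_server_handled_total": 1197,
    "grpc_client_started_total": 450,
    "grpc_client_handled_total": 450,
}

print(in_flight_rpcs(totals, "server"))
print(in_flight_rpcs(totals, "client"))
```

The result is only approximate, because the two counters may be updated between reads. A value that keeps growing across scrapes suggests RPCs that never complete.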