Add documentation about controller metrics

kubernetes · Jun 26, 2017 · 99ff364 · 99ff364
1 parent 628e88e
commit 99ff364
Showing 2 changed files with 49 additions and 0 deletions.
diff --git a/_data/concepts.yml b/_data/concepts.yml
@@ -82,6 +82,7 @@ toc:
   - docs/concepts/cluster-administration/authenticate-across-clusters-kubeconfig.md
   - docs/concepts/cluster-administration/master-node-communication.md
   - docs/concepts/cluster-administration/proxies.md
+  - docs/concepts/cluster-administration/controller-metrics.md
   - title: Policies
     section:
     - docs/concepts/policy/container-capabilities.md

diff --git a/docs/concepts/cluster-administration/controller-metrics.md b/docs/concepts/cluster-administration/controller-metrics.md
@@ -0,0 +1,48 @@
+---
+title: Controller manager metrics
+---
+
+{% capture overview %}
+Controller manager metrics provide important insight into the performance and health of
+the controller manager.
+
+{% endcapture %}
+
+{% capture body %}
+## What are controller manager metrics
+
+Controller manager metrics provide important insight into the performance and health of the controller manager.
+These metrics include common Go language runtime metrics, such as goroutine count, and controller-specific metrics, such as
+etcd request latencies or cloud provider (AWS, GCE, OpenStack) API latencies, that can be used
+to gauge the health of a cluster.
+
+Starting from Kubernetes 1.7, detailed cloud provider metrics are available for storage operations for GCE, AWS, vSphere, and OpenStack.
+These metrics can be used to monitor the health of persistent volume operations.
+
+For example, for GCE these metrics are called:
+
+```
+cloudprovider_gce_api_request_duration_seconds{request="instance_list"}
+cloudprovider_gce_api_request_duration_seconds{request="disk_insert"}
+cloudprovider_gce_api_request_duration_seconds{request="disk_delete"}
+cloudprovider_gce_api_request_duration_seconds{request="attach_disk"}
+cloudprovider_gce_api_request_duration_seconds{request="detach_disk"}
+cloudprovider_gce_api_request_duration_seconds{request="list_disk"}
+```
+
+
+
+## Configuration
+
+
+In a cluster, controller-manager metrics are served at `http://localhost:10252/metrics`
+on the host where the controller-manager is running.
+
+The metrics are emitted in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/) and are human-readable.
+
+In a production environment you may want to configure Prometheus or another metrics scraper
+to periodically gather these metrics and store them in a time series database.
+
+{% endcapture %}
+
+{% include templates/concept.md %}
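
The Configuration section of the added page says the metrics are served in Prometheus text format at `http://localhost:10252/metrics`. As a rough illustration of what a scraper does with that page, here is a minimal Python sketch that fetches it and keeps only the GCE cloud provider samples. The helper names `filter_metrics` and `fetch_metrics` are hypothetical and not part of Kubernetes or Prometheus, and the simple prefix match is an assumption made for brevity rather than a full parser for the exposition format.

```python
import sys
import urllib.request

# Endpoint quoted in the documentation; it is only reachable from the host
# where the controller-manager is running.
METRICS_URL = "http://localhost:10252/metrics"


def filter_metrics(text, prefix):
    """Return the sample lines whose metric name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if line.startswith(prefix):
            samples.append(line)
    return samples


def fetch_metrics(url=METRICS_URL, timeout=5):
    """Fetch the raw metrics page and return it as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    try:
        page = fetch_metrics()
    except OSError as err:
        # Connection errors (URLError) are OSError subclasses.
        print(f"could not fetch {METRICS_URL}: {err}", file=sys.stderr)
    else:
        prefix = "cloudprovider_gce_api_request_duration_seconds"
        for sample in filter_metrics(page, prefix):
            print(sample)
```

Run on the controller-manager host, this prints the matching sample lines, if any. For continuous collection, a Prometheus server or a similar scraper is still the better fit, as the page itself recommends.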