Configure the prometheus addon to scrape Typhoon etcd targets on controller nodes. Then, metrics from etcd will be available in Prometheus. Alert rules for etcd will fire during incidents. The etcd dashboard provided with the grafana addon will be populated.
Invariants:
Users still only need to kubectl apply the addon manifests. Nothing more.
Users never need to fiddle with listing etcd nodes on any platform.
Background
The prometheus addon manifests set up Prometheus 2.1 (#113) to scrape apiservers, kubelets, services, endpoints, cAdvisor, and exporters (kube-state-metrics and node_exporter). Alerting rules and Grafana graphs in addons correspond to these metrics. However, etcd rules and graphs currently aren't active/populated.
Situation
Prometheus can be configured (via the ConfigMap) to scrape the secured :2379/metrics endpoints of etcd nodes just like any other target. The etcd cluster runs on-host, across controllers, with systemd. It is a lower-level component on which Kubernetes relies (not atop k8s), and it already handles its own client authentication.
Typhoon runs etcd on-host, across controllers, on all platforms
Typhoon requires etcd be set up with TLS on all platforms
Typhoon creates etcd client certs, but only places them on controller nodes
To perform the scrapes, Prometheus needs the etcd client certificates so that the tls_config section can be written in a new scrape job.
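A rough sketch of what such a scrape job might look like in the Prometheus ConfigMap is below. The `/etc/etcd-client` mount path and the certificate file names are assumptions for illustration, not paths confirmed by Typhoon. The job discovers all nodes and rewrites the target port to 2379; it does not yet filter out workers, and whether certificate verification passes depends on the names in the etcd server certificates.

```yaml
# Sketch only: an etcd scrape job using node discovery.
# Mount path and file names below are assumed, not Typhoon's actual layout.
- job_name: etcd
  scheme: https
  kubernetes_sd_configs:
    - role: node
  tls_config:
    ca_file: /etc/etcd-client/ca.crt        # assumed path
    cert_file: /etc/etcd-client/client.crt  # assumed path
    key_file: /etc/etcd-client/client.key   # assumed path
  relabel_configs:
    # Node discovery targets the kubelet port by default;
    # rewrite the address to etcd's client port instead.
    - source_labels: [__address__]
      action: replace
      regex: '(.+):\d+'
      replacement: '${1}:2379'
      target_label: __address__
```

Using node discovery rather than a static target list keeps the invariant that users never list etcd nodes themselves.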
Options
Add etcd client materials in a kube-system secret. We did this back when self-hosted etcd was explored.
Pro: Allows prometheus pod to be scheduled on any node
Con: Opens up the possibility of escalation attacks (i.e. read kube-system secrets == read everything)
Mount the etcd client materials from a controller host. (current most viable)
Pro: Avoid keeping etcd client materials in a Kubernetes secret
Con: Restricts prometheus pod itself to run on controller nodes
Explore whether it's possible to create (or invent) "metrics-only" etcd certificates
Likely not on the roadmap
Metrics whitelist proxy
Con: I use some whitelist proxies for some internal things. They're gross though.
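To make the second option (mounting from a controller host) more concrete, here is a hedged sketch of the relevant fragment of the Prometheus Deployment. The host path `/etc/ssl/etcd`, the volume name, the mount path, and the toleration are assumptions. The node selector uses the `node-role.kubernetes.io/master=""` label that controllers carry.

```yaml
# Sketch only: pin Prometheus to controllers and mount etcd client materials.
spec:
  template:
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
        # Assumes controllers are tainted NoSchedule with this key.
        - key: node-role.kubernetes.io/master
          operator: Exists
          effect: NoSchedule
      containers:
        - name: prometheus
          volumeMounts:
            - name: etcd-client          # hypothetical volume name
              mountPath: /etc/etcd-client  # assumed mount path
              readOnly: true
      volumes:
        - name: etcd-client
          hostPath:
            path: /etc/ssl/etcd          # assumed location on the controller host
```

This keeps the client materials out of any Kubernetes secret, at the cost of restricting where the prometheus pod can be scheduled.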
I have an example Prometheus config for scraping etcd that works alright, but needs some modifications:
etcd TLS certs are owned by etcd:etcd on the host, so they aren't readable within the prometheus pod (which runs as nobody) without modification. fsGroup doesn't apply to hostPath volumes.
Filtering out workers (etcd always runs on Typhoon controllers) in Prometheus relabel_configs is troublesome because Kubernetes controllers and workers are labeled:
node-role.kubernetes.io/master=""
node-role.kubernetes.io/node=""
Relabel matching doesn't seem able to distinguish between a label being present vs not, as "" means the same as the label not being present.
The first issue can be addressed by adapting the etcd TLS ownership. The second can be addressed by adding an additional controller label that has a key and value or finding a better Prometheus relabel trick.
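As a sketch of the second fix, suppose controllers carried an extra label with a non-empty value. The label `node-role.kubernetes.io/controller="true"` used here is hypothetical and would need to be added to controller nodes. A `keep` rule can then match on the value.

```yaml
# Sketch only: keep targets whose node has the (hypothetical) label
# node-role.kubernetes.io/controller="true"; all other nodes are dropped.
relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_node_role_kubernetes_io_controller]
    action: keep
    regex: "true"
```

Because the value is non-empty, the rule can tell controllers apart from workers, which a label with value "" cannot do.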