[VC-34401] Add metrics settings to the Helm chart #544

wallrj · 2024-06-28T15:55:07Z

In #341 @tfadeyi added a metrics server to the agent.
In this PR I've made the minimum viable changes to allow that metrics server to be queried by Prometheus,
when the agent is installed by Helm in a Kubernetes cluster.

I have only updated the venafi-kubernetes-agent chart, because I believe the jetstack-secure agent is deprecated / retired.
I decided not to make the metrics server port configurable. In csi-driver, approver-policy, and similar projects it is configurable, so that users can change it if it clashes with a sidecar container injected into the pod. If it becomes necessary, we can make the port configurable in a follow-up PR.
I decided not to add any E2E tests, because there weren't any existing tests to use as examples.

🔗 FYI I recently made similar changes to cert-manager/csi-driver

[VC-34401] Add Prometheus metrics endpoint cert-manager/csi-driver#271

Testing

Create cluster

kind create cluster

Install agent

helm upgrade venafi-kubernetes-agent ./deploy/charts/venafi-kubernetes-agent \
    --install \
    --create-namespace \
    --namespace venafi

Fetch metrics directly

POD_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/instance=venafi-kubernetes-agent -o jsonpath='{ .items[0].metadata.name }')
kubectl get --raw "/api/v1/namespaces/venafi/pods/${POD_NAME}:8081/proxy/metrics" | grep HELP

...
# HELP go_info Information about the Go environment.
...
# HELP process_open_fds Number of open file descriptors.
...
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
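
To list only the metric names rather than the full help text, the HELP lines can be filtered with awk. This is a minimal sketch and not part of the PR; `list_metric_names` is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the agent or the chart): read Prometheus
# text-format metrics on stdin and print each metric name found on a
# "# HELP <name> <description>" line, sorted and de-duplicated.
list_metric_names() {
    awk '$1 == "#" && $2 == "HELP" { print $3 }' | sort -u
}
```

It could be used by piping the raw metrics into it, for example `kubectl get --raw "/api/v1/namespaces/venafi/pods/${POD_NAME}:8081/proxy/metrics" | list_metric_names`.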

Install kube-prometheus-stack

# values.kube-prometheus-stack.yaml
alertmanager:
  enabled: false

grafana:
  enabled: true

nodeExporter:
  enabled: false

# Enable discovery of all ServiceMonitor and PodMonitor resources
# https://github.com/prometheus-community/helm-charts/issues/1911#issuecomment-1106559031
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false

helm upgrade -i default kube-prometheus-stack \
    --repo https://prometheus-community.github.io/helm-charts \
    --namespace prometheus \
    --create-namespace \
    --values values.kube-prometheus-stack.yaml \
    --wait

Enable the venafi-kubernetes-agent PodMonitor

helm upgrade venafi-kubernetes-agent ./deploy/charts/venafi-kubernetes-agent \
    --install \
    --create-namespace \
    --namespace venafi \
    --set metrics.podmonitor.enabled=true
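
One way to confirm that Prometheus has discovered the agent pod is to query the `up` series through the Prometheus HTTP API. This is a sketch rather than part of the PR: `query_up` is a hypothetical helper, and it assumes Prometheus has been made reachable locally, for example with `kubectl port-forward -n prometheus svc/prometheus-operated 9090` (the `prometheus-operated` Service name is an assumption about what the operator creates in this setup).

```shell
# Hypothetical helper: ask a Prometheus server for the "up" series of all
# scrape targets in one namespace.
#   $1: base URL of Prometheus (assumed reachable, e.g. via port-forward)
#   $2: namespace of the scraped pods
query_up() {
    prometheus_url="$1"
    namespace="$2"
    # --get sends the url-encoded data as a query string on a GET request.
    curl --silent --get "${prometheus_url}/api/v1/query" \
        --data-urlencode "query=up{namespace=\"${namespace}\"}"
}
```

For example, `query_up http://localhost:9090 venafi` returns a JSON document; a sample value of `1` for the agent pod would indicate that the scrape is succeeding.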

Connect to Grafana and import dashboards

kubectl port-forward -n prometheus deployments/default-grafana 3000

http://localhost:3000/d/ypFZFgvmz/go-processes (username admin, password prom-operator)

Example Dashboards

To import the dashboard, go to http://localhost:3000/dashboards, select "New" → "Import", paste the following dashboard URL, and click "Load":

https://grafana.com/grafana/dashboards/6671-go-processes/

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj · 2024-06-28T16:48:32Z

deploy/charts/venafi-kubernetes-agent/README.md

Generated by make update-helm-docs.

wallrj · 2024-06-28T16:49:48Z

deploy/charts/venafi-kubernetes-agent/templates/deployment.yaml

Naming the port is not strictly necessary, but adding it allows the PodMonitor (if enabled) to use the named port "http-metrics" rather than the port number.

misleading comments in container.Ports kubernetes/kubernetes#108255

https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/pod-v1/#ports
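
To check which port number a port name resolves to on a running pod, the pod spec can be queried with a JSONPath filter. A minimal sketch, assuming kubectl access to the cluster; `named_container_port` is a hypothetical helper and not something this PR adds.

```shell
# Hypothetical helper: print the containerPort number(s) declared under a
# given port name in a pod's spec.
#   $1: namespace, $2: pod name, $3: port name (for example http-metrics)
named_container_port() {
    namespace="$1"
    pod="$2"
    port_name="$3"
    kubectl get pod "$pod" --namespace "$namespace" \
        --output "jsonpath={.spec.containers[*].ports[?(@.name==\"${port_name}\")].containerPort}"
}
```

For example: `named_container_port venafi "${POD_NAME}" http-metrics`.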

wallrj · 2024-06-28T16:51:33Z

deploy/charts/venafi-kubernetes-agent/templates/podmonitor.yaml

The latest thinking is that we only need to provide a PodMonitor, not a ServiceMonitor.

Other cert-manager projects also provide a ServiceMonitor, but we now consider that legacy.
The disadvantage of a ServiceMonitor is that it requires a Service, which adds unnecessary complication to the chart.
And as we understand it, with a ServiceMonitor, Prometheus Operator uses the Endpoints object created by the Service to discover the targets.

The template is copied and adapted from cert-manager:

https://github.com/cert-manager/cert-manager/blob/master/deploy/charts/cert-manager/templates/podmonitor.yaml
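
For illustration, a PodMonitor selects pods directly by label and scrapes a named container port, so no Service is involved. The sketch below prints a minimal manifest of that shape. It is not the chart's actual template: `render_podmonitor` is a hypothetical helper, the label and port name are taken from this PR, and the remaining fields are assumptions.

```shell
# Hypothetical helper: print a minimal PodMonitor manifest to stdout.
# The selector label and the "http-metrics" port name come from this PR;
# the resource name and the metrics path are assumptions and may differ
# from what the chart's podmonitor.yaml template renders.
#   $1: namespace (default: venafi)
#   $2: Helm release / instance name (default: venafi-kubernetes-agent)
render_podmonitor() {
    namespace="${1:-venafi}"
    instance="${2:-venafi-kubernetes-agent}"
    cat <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: ${instance}
  namespace: ${namespace}
spec:
  selector:
    matchLabels:
      app.kubernetes.io/instance: ${instance}
  podMetricsEndpoints:
    - port: http-metrics
      path: /metrics
EOF
}
```

Running `render_podmonitor` with no arguments prints the manifest for the `venafi` namespace used in the testing steps.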


hawksight

I haven't fully tested this install; however, I have validated that all the resources look OK, both manually and with kubeconform:

helm template deploy/charts/venafi-kubernetes-agent \
    --values deploy/charts/venafi-kubernetes-agent/tests/values/custom-volumes.yaml \
    --set config.organisation="test" \
    --set config.cluster="test" \
    --set metrics.podmonitor.enabled=true \
  | kubeconform -verbose \
    -schema-location default \
    -schema-location 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json'

Good documentation 👍

maelvls

I've read the changes and manually reproduced the tests with a kind cluster, and I got the same results. I've taken the liberty of fixing some of the commands in the description: adding -i to helm upgrade, adding the Grafana username and password, and explaining how to import the dashboard. Thank you for the self-review comments too; they make this such a pleasure to read.

maelvls · 2024-07-01T12:42:18Z

README.md

+ * Go collector: via the [default registry](https://github.com/prometheus/client_golang/blob/34e02e282dc4a3cb55ca6441b489ec182e654d59/prometheus/registry.go#L60-L63) in Prometheus client_golang.
+ * Process collector: via the [default registry](https://github.com/prometheus/client_golang/blob/34e02e282dc4a3cb55ca6441b489ec182e654d59/prometheus/registry.go#L60-L63) in Prometheus client_golang.
+ * Agent metrics:
+  * `data_readings_upload_size`: Data readings upload size (in bytes) sent by the jscp in-cluster agent.


nit: what's "jscp"?

I suppose it is "Jetstack Secure Control Plane". I copied that line from the metric description:

https://github.com/search?q=repo%3Ajetstack%2Fjetstack-secure%20jscp%20&type=code

maelvls · 2024-07-01T12:45:10Z

deploy/charts/venafi-kubernetes-agent/templates/podmonitor.yaml

Thanks. I was about to ask myself "what's the recommended choice: service monitor or pod monitor", and I read your comment. Nice proactive self-reviewing!

wallrj marked this pull request as draft June 28, 2024 15:55

wallrj changed the title ~~Add metrics settings to the Helm chart~~ WIP: [VC-34401] Add metrics settings to the Helm chart Jun 28, 2024

wallrj force-pushed the VC-34401-prometheus-metrics branch from a2db6c7 to ce73758 Compare June 28, 2024 15:57

Add metrics settings to the Helm chart

de31f01

Signed-off-by: Richard Wall <richard.wall@venafi.com>

wallrj force-pushed the VC-34401-prometheus-metrics branch from ce73758 to de31f01 Compare June 28, 2024 16:46

wallrj changed the title ~~WIP: [VC-34401] Add metrics settings to the Helm chart~~ [VC-34401] Add metrics settings to the Helm chart Jun 28, 2024

wallrj marked this pull request as ready for review June 28, 2024 17:00

wallrj requested a review from maelvls June 28, 2024 17:01

hawksight approved these changes Jul 1, 2024

maelvls approved these changes Jul 1, 2024

wallrj merged commit a385696 into jetstack:master Jul 2, 2024
4 checks passed

wallrj deleted the VC-34401-prometheus-metrics branch July 2, 2024 08:06
