Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VC-34401] Add metrics settings to the Helm chart #544

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

wallrj
Copy link
Member

@wallrj wallrj commented Jun 28, 2024

In #341 @tfadeyi added a metrics server to the agent.
In this PR I've made the minimum viable changes to allow that metrics server to be queried by Prometheus,
when the agent is installed by Helm in a Kubernetes cluster.

  • I have chosen to only update the venafi-kubernetes-agent chart, because I believe the jetstack-secure agent is deprecated / retired.
  • I decided not to make the metrics server port configurable. In csi-driver and approver-policy etc it is configurable, to allow users to change it in case it clashes with some other sidecar container that might be injected in the pod. If it becomes necessary, we can make the port configurable in a followup PR.
  • I decided not to add any E2E tests...because there weren't any existing tests to use as examples.

🔗 FYI I recently made similar changes to cert-manager/csi-driver

Testing

  • Create cluster
kind create cluster
  • Install agent
helm upgrade venafi-kubernetes-agent ./deploy/charts/venafi-kubernetes-agent \
    --install \
    --create-namespace \
    --namespace venafi
  • Fetch metrics directly
POD_NAME=$(kubectl get pod -n venafi -l app.kubernetes.io/instance=venafi-kubernetes-agent -o jsonpath='{ .items[0].metadata.name }')
kubectl get --raw "/api/v1/namespaces/venafi/pods/${POD_NAME}:8081/proxy/metrics" | grep HELP
...
# HELP go_info Information about the Go environment.
...
# HELP process_open_fds Number of open file descriptors.
...
# HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
# HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
  • Install kube-prometheus-stack
# values.kube-prometheus-stack.yaml
alertmanager:
  enabled: false

grafana:
  enabled: true

nodeExporter:
  enabled: false

# Enable discovery of all ServiceMonitor and PodMonitor resources
# https://github.com/prometheus-community/helm-charts/issues/1911#issuecomment-1106559031
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false
helm upgrade -i default kube-prometheus-stack \
      --repo https://prometheus-community.github.io/helm-charts \
      --install \
      --namespace prometheus \
      --create-namespace \
      --values values.kube-prometheus-stack.yaml \
      --wait
  • Enable the venafi-kubernetes-agent PodMonitor
helm upgrade venafi-kubernetes-agent ./deploy/charts/venafi-kubernetes-agent \
    --install \
    --create-namespace \
    --namespace venafi \
    --set metrics.podmonitor.enabled=true
  • Connect to Grafana and import dashboards
kubectl port-forward -n prometheus deployments/default-grafana 3000

http://localhost:3000/d/ypFZFgvmz/go-processes (username admin, password prom-operator)

Example Dashboards

To import the dashboard, go to http://localhost:3000/dashboards and "New" → "Import", and paste the following dashboard URL and click "Load":

image

@wallrj wallrj marked this pull request as draft June 28, 2024 15:55
@wallrj wallrj changed the title Add metrics settings to the Helm chart WIP: [VC-34401] Add metrics settings to the Helm chart Jun 28, 2024
@wallrj wallrj force-pushed the VC-34401-prometheus-metrics branch from a2db6c7 to ce73758 Compare June 28, 2024 15:57
Signed-off-by: Richard Wall <richard.wall@venafi.com>
@wallrj wallrj force-pushed the VC-34401-prometheus-metrics branch from ce73758 to de31f01 Compare June 28, 2024 16:46
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generated by make update-helm-docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming the port is not strictly necessary, but adding it allows the PodMonitor (if enabled) to use the named port "http-metrics" rather than the port number.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest thinking is that we only need to provide a PodMonitor, not a ServiceMonitor.

Other cert-manager projects also provide a ServiceMonitor, but we now consider that a legacy.
Disadvantage of ServiceMonitor is that it requires a Service, which adds unnecessary complication to the chart.
And as we understand it, with a ServiceMonitor, PrometheusOperator uses the Endpoints object created by the Service to discover the targets.

The template is copied and adapted from cert-manager:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was about to ask myself "what's the recommended choice: service monitor or pod monitor", and I read your comment. Nice proactive self-reviewing!

@wallrj wallrj changed the title WIP: [VC-34401] Add metrics settings to the Helm chart [VC-34401] Add metrics settings to the Helm chart Jun 28, 2024
@wallrj wallrj marked this pull request as ready for review June 28, 2024 17:00
@wallrj wallrj requested a review from maelvls June 28, 2024 17:01
Copy link
Contributor

@hawksight hawksight left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't fully tested this install however I have validated all the resources look ok manually and with kubeconform:

helm template deploy/charts/venafi-kubernetes-agent --values deploy/charts/venafi-kubernetes-agent/tests/values/custom-volumes.yaml --set config.organisation="test" --set config.cluster="test" --set metrics.podmonitor.enabled=true | kubeconform -verbose -schema-location default -schema-location 'https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/{{.Group}}/{{.ResourceKind}}_{{.ResourceAPIVersion}}.json'

Good documentation 👍

Copy link
Member

@maelvls maelvls left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read the changes and manually reproduced the tests with a kind cluster. I was able to get to the same results; i've taken the liberty of fixing some of the commands (adding -i to helm upgrade, and what's the username and password for grafana, and how to import the dashboard). Thank you for the self-review comments too, it makes it such a pleasure to read.

* Go collector: via the [default registry](https://github.com/prometheus/client_golang/blob/34e02e282dc4a3cb55ca6441b489ec182e654d59/prometheus/registry.go#L60-L63) in Prometheus client_golang.
* Process collector: via the [default registry](https://github.com/prometheus/client_golang/blob/34e02e282dc4a3cb55ca6441b489ec182e654d59/prometheus/registry.go#L60-L63) in Prometheus client_golang.
* Agent metrics:
* `data_readings_upload_size`: Data readings upload size (in bytes) sent by the jscp in-cluster agent.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: what's "jscp"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose it is "Jetstack Secure Control Plane". I copied that line from the metric description:

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I was about to ask myself "what's the recommended choice: service monitor or pod monitor", and I read your comment. Nice proactive self-reviewing!

@wallrj wallrj merged commit a385696 into jetstack:master Jul 2, 2024
4 checks passed
@wallrj wallrj deleted the VC-34401-prometheus-metrics branch July 2, 2024 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants