Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addon: Use Hostprocess for Windows Exporter #2048

Conversation

dineshsharmads
Copy link
Contributor

@dineshsharmads dineshsharmads commented Mar 19, 2023

Description

This PR adds an addon windows-hostprocess.libsonnet which deploys and configures Windows Exporter as a hostprocess pod on all Windows nodes dynamically.

The existing addon windows.libsonnet requires a static list of node IPs to be provided.

This deploys a daemonset and podmonitor . It basically implements the setup mentioned in windows_exporter on Kubernetes with some minor fixes. (enable memory collector, relabel node name, use windowsExporterSelector instead of wmiExporterSelector).

I have a few AKS clusters with Windows nodes where I have tested the addon and dashboards.

Addresses #1627

Type of change

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Changelog entry

Add Windows support using Hostprocess instead of static_configs

@philipgough
Copy link
Contributor

@dineshsharmads - sorry for the lag on this, do you mind rebasing and I can trigger CI?

@github-actions github-actions bot removed the stale label May 26, 2023
Dinesh Sharma added 2 commits May 31, 2023 14:50
This allows for Windows Exporter to be deployed and configured
dynamically without requiring to specify targets manually for each
Windows node

Documentation and example added
Fix some performance issues.
- Specify resource limits for windows exporter.
- Allow for scrape timeout and interval to be configured. Depending
on how many pods are running on a node it can take from 500ms to 15s
to scrape metrics from node. Default timeout is 10s.
- Allow for enabled collectors to be configured.
- Only enable collectors that are being used in rules and dashboards.
@dineshsharmads
Copy link
Contributor Author

@philipgough no worries, I have rebased.

docs/windows.md Outdated
@@ -21,3 +21,20 @@ local kp = (import 'kube-prometheus/main.libsonnet') +
```

[Containerd](https://github.com/prometheus-community/windows_exporter/blob/master/kubernetes/kubernetes.md) version can run as pod.

```
local kp = (import 'kube-prometheus/main.libsonnet') +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we update the docs above that say Currently, Docker based Windows does not support running with [windows_exporter](https://github.com/prometheus-community/windows_exporter) in a pod so this add on uses [additional scrape configuration](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/additional-scrape-config.md) to set up a static config to scrape the node ports where windows_exporter is configured.?

I would also lean towards moving this up in the docs and making it the default over the static pod configuration. We should drop the static pod configuration in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I have updated the doc to reflect both configs and moved hostprocess config up.

image:: error 'must provide version',
resources:: {
requests: { cpu: '300m', memory: '200Mi' },
limits: { memory: '200Mi' },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how did you choose these vaules? They might be a bit low for Windows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm running it from my fork in few windows clusters.
I see the usage as below. The scrape duration that I'm getting for windows exporters is between 400-700ms.

NAME CPU(cores) MEMORY(bytes)
windows-exporter-2lw7j 24m 31Mi
windows-exporter-8vxc9 22m 31Mi
windows-exporter-bbpc8 17m 31Mi
windows-exporter-c7qgw 22m 31Mi
windows-exporter-gmq4p 15m 29Mi
windows-exporter-r4zbk 21m 30Mi
windows-exporter-r7kms 17m 29Mi
windows-exporter-rhrfn 19m 30Mi
windows-exporter-t2r9l 21m 29Mi
windows-exporter-tb699 19m 32Mi
windows-exporter-tmbz2 19m 30Mi
windows-exporter-xk8n6 21m 32Mi

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that idle or with pods running on the system? Just want to make sure we aren't too low out of the box, I've seen other windows "system" pods hit OOM limits when under load. Based on what you shared, this looks reasonable to start with.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a non-prod cluster but it has 60 pods per e8s node. I just deployed it in our prod cluster which is running 35 pods per node but has higher CPU utilization. And I see similar numbers. I'm guessing since it is a host process pod it does not spin up other system services per pod (csrss,svchost etc). So the utilization is just for windows_exporter process.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing since it is a host process pod it does not spin up other system services per pod (csrss,svchost etc). So the utilization is just for windows_exporter process.

This makes sense. Thanks for verifying!

Copy link
Contributor

@jsturtevant jsturtevant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

docs/windows.md Outdated Show resolved Hide resolved
@jsturtevant
Copy link
Contributor

/assign @philipgough

@philipgough
Copy link
Contributor

philipgough commented Jun 12, 2023

Thanks for the review @jsturtevant

@dineshsharmads Im no windows expert, but the change looks fine. I'll need to fix the broken CI and we can merge.

@philipgough philipgough merged commit dc0ad5e into prometheus-operator:main Jun 12, 2023
7 of 10 checks passed
@philipgough
Copy link
Contributor

Actually, will merge this first to avoid you having to circle back on it. Thanks

@dineshsharmads
Copy link
Contributor Author

Thanks @philipgough and @jsturtevant.

@dineshsharmads dineshsharmads deleted the hostprocess-windows-exporter branch June 13, 2023 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants