Produce metrics for consumers #1162

Closed
jcpunk opened this issue Apr 19, 2023 · 8 comments · Fixed by #1242

Comments

@jcpunk
Contributor

jcpunk commented Apr 19, 2023

What would you like to be added:
Some telemetry information about the daemons running. I'm not sure what metrics would make sense for this daemonset - perhaps the flag stats on the node? But having something I can monitor is handy.

Why is this needed:
I generally don't like to deploy things I can't monitor. Having native telemetry available may also help troubleshoot oddities in the wild.

@jcpunk jcpunk added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 19, 2023
@marquiz
Contributor

marquiz commented Apr 24, 2023

Thanks @jcpunk for the suggestion. I think this would definitely be a worthy enhancement.

We would just need to understand what kind of metrics to advertise 🧐 On the nfd-master side we could have some counters and/or gauges identifying e.g. the number of nodes updated, CRDs processed, etc. We could also expose the time spent updating nodes on the nfd-master side and the time spent doing feature discovery on the nfd-worker side.

ping @ArangoGutierrez @PiotrProkop @zvonkok @Tal-or any thoughts?

@ArangoGutierrez
Contributor

On the worker side, maybe advertise when the worker can't reach the master, even though this is already logged.

@ArangoGutierrez
Contributor

/assign

@marquiz
Contributor

marquiz commented May 11, 2023

/milestone v0.14

@k8s-ci-robot k8s-ci-robot added this to the v0.14 milestone May 11, 2023
@marquiz
Contributor

marquiz commented Jun 5, 2023

@ArangoGutierrez is working on an implementation.

@jcpunk @PiotrProkop @zvonkok @fmuyassarov @Tal-or any wishes/thoughts on which kinds of metrics to expose? I listed some above.

@ArangoGutierrez
Contributor

Getting there
[image attachment]

@jcpunk
Contributor Author

jcpunk commented Jun 5, 2023

I'm not really sure what metrics would be useful...

@Tal-or
Contributor

Tal-or commented Jun 6, 2023

@marquiz on the topology-updater side we can expose the following:

  1. Number of updates done to NRT objects; each update increases the counter by one.
  2. Number of PodresourceAPI calls; each call increases the counter by one.
  3. The time the update took for each node.

Adding @swatisehgal and @ffromani as they might have more ideas on the subject
