Add Default Grafana Dashboards #207

Closed
mjnagel opened this issue Feb 28, 2024 · 11 comments
Labels
enhancement New feature or request monitoring Issues related to monitoring components / resources

Comments

@mjnagel
Contributor

mjnagel commented Feb 28, 2024

Is your feature request related to a problem? Please describe.

Currently UDS Core deploys with few, if any, default dashboards in Grafana. It would be beneficial to see some dashboards out of the box for basic information.

Describe the solution you'd like

Based on user feedback, add minimal, clean dashboards that are valuable to end users.

Describe alternatives you've considered

End users can create these dashboards themselves, but in an air-gapped environment it could be difficult to pull them from remote sources.

@mjnagel mjnagel added the enhancement New feature or request label Feb 28, 2024
@mjnagel
Contributor Author

mjnagel commented Feb 28, 2024

@blancharda @ntwkninja @docandrew would be great to get some insight on what is valuable to you all as end users. I have heard this list before:

  • cluster health (size, capacity, node status etc)
  • resource usage (cpu/mem/storage/network at both the cluster level, and for individual workloads)
  • telemetry / latency data (istio/tempo?)
  • UDS package status (overlap with UDS Engine likely)
  • NeuVector (potentially, depending on enforcement mode and what this looks like)

Not sure if there are other valuable pieces.

@docandrew

Storage/PV use is definitely critical for us. Even though the Elastic stack isn't part of UDS Core, having dashboards available for when Elastic is deployed alongside UDS Core would be a valuable thing for us as well: https://grafana.com/grafana/dashboards/878-elasticsearch-dashboard/

@mjnagel
Contributor Author

mjnagel commented Feb 28, 2024

@docandrew storage/PV is a good callout. Are there dashboards you're currently using for that (published on Grafana's site or otherwise)?

I don't think we'd want to include Elastic dashboards in uds-core, but we do already enable auto-adding dashboards from a ConfigMap (see this example from Loki that gets pulled in here). Your separate Zarf package for Elastic could similarly include a ConfigMap to load those into Grafana. We'd likely take the same approach with other UDS Packages like GitLab, etc.: if dashboards are needed for those, they would live in those specific Zarf packages rather than in core. That helps keep the core baseline slimmer and keeps us from adding lots of conditional pieces based on what you deploy on top.
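As a rough sketch of what such a package-provided ConfigMap could look like, the snippet below builds a manifest in Python and prints it as JSON (which `kubectl apply -f` accepts). The label key `grafana_dashboard`, the namespace `grafana`, and the names `elastic-dashboards` and `build_dashboard_configmap` are assumptions for illustration, not values confirmed in this thread; check what the Grafana deployment in your cluster actually watches for.

```python
import json


def build_dashboard_configmap(name, namespace, dashboards,
                              label_key="grafana_dashboard", label_value="1"):
    """Return a ConfigMap manifest (as a dict) holding Grafana dashboard JSON.

    dashboards maps a file name (e.g. "elastic.json") to a dashboard dict.
    label_key/label_value are assumed values; match them to your Grafana setup.
    """
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "labels": {label_key: label_value},
        },
        # ConfigMap data values must be strings, so each dashboard is serialized.
        "data": {fname: json.dumps(body) for fname, body in dashboards.items()},
    }


# Placeholder dashboard; a real one would be exported from Grafana.
example_dashboard = {"title": "Elasticsearch Overview", "panels": []}

manifest = build_dashboard_configmap(
    name="elastic-dashboards",
    namespace="grafana",
    dashboards={"elastic.json": example_dashboard},
)
print(json.dumps(manifest, indent=2))
```

Shipping the manifest inside the Elastic package itself keeps the dashboard next to the workload it describes, which matches the "keep core slim" reasoning above.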

@docandrew

We've run into issues with whether Grafana is using the "sidecar provider" for dashboards vs auto-adding others from configmaps. We'll just need to make sure that the configmaps for user-added dashboards have the correct annotation so the sidecar provider can pick those up (if that's how it's being used).
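One way to catch this before deploying is a small pre-flight check on the manifest. The sketch below is only an illustration: the function name `missing_dashboard_markers` is hypothetical, and the default label key `grafana_dashboard` is an assumption. Depending on how the sidecar is configured it may select on a label, an annotation, or both, so both are made checkable here.

```python
def missing_dashboard_markers(manifest, label_key="grafana_dashboard",
                              annotation_key=None):
    """Return reasons a manifest might be skipped by a dashboard sidecar.

    label_key and annotation_key are assumed selectors; set them to whatever
    your Grafana sidecar is actually configured to look for.
    """
    problems = []
    if manifest.get("kind") != "ConfigMap":
        problems.append("not a ConfigMap")
    metadata = manifest.get("metadata") or {}
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}
    if label_key and label_key not in labels:
        problems.append(f"missing label {label_key!r}")
    if annotation_key and annotation_key not in annotations:
        problems.append(f"missing annotation {annotation_key!r}")
    return problems


labeled = {
    "kind": "ConfigMap",
    "metadata": {"name": "storage-dashboards",
                 "labels": {"grafana_dashboard": "1"}},
}
unlabeled = {"kind": "ConfigMap", "metadata": {"name": "storage-dashboards"}}

print(missing_dashboard_markers(labeled))    # []
print(missing_dashboard_markers(unlabeled))  # ["missing label 'grafana_dashboard'"]
```

An empty list means the manifest carries the markers being checked for; it does not prove the sidecar is enabled or watching that namespace.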

@docandrew

I can't speak to specific dashboards just yet that we're using to monitor storage, but will try and dig a bit to see what's useful.

@mjnagel
Contributor Author

mjnagel commented Apr 29, 2024

Updating on current status - #256 introduced some of the default dashboards from the upstream chart. That should address some of the key asks for:

  • cluster health (size, capacity, node status etc)
  • resource usage (cpu/mem/storage/network at both the cluster level, and for individual workloads)

I think I'm going to let that one roll out and see if we can solicit feedback on other things people may be looking for before introducing others.

@mjnagel mjnagel added the monitoring Issues related to monitoring components / resources label Jul 2, 2024
@mjnagel
Contributor Author

mjnagel commented Jul 2, 2024

Haven't heard any clear feedback yet. @blancharda and @docandrew, have you had a chance to deploy and see if you find any dashboards lacking? I know there are some changes coming with UDS Engine to provide policy + package dashboarding, so I don't believe we have plans to add those two pieces to Grafana.

@docandrew

I haven't had the opportunity to look again - will try redeploying it and poking around as soon as I get some spare cycles, thanks for all the work on this!

@blancharda
Contributor

I'm fairly satisfied in terms of dashboard content for the moment -- if anything there are more than we probably need.
The ones I use most frequently are definitely the compute resource dashboards for cluster and namespace (pod).

The networking info is nice to have when troubleshooting, and I'm sure the Loki dashboards will be useful as we attempt to tune/size our installation -- but we could probably narrow down the list in all categories.

I would note that we run into resource issues pretty frequently though. Some amount of it is obviously environment-specific, but it still may be worth bumping the default resources for Prometheus and Grafana.

@mjnagel
Contributor Author

mjnagel commented Jul 8, 2024

Going to tentatively close this ticket out. If anyone comes across new needs or asks to remove dashboards, feel free to open follow-ons and link this original issue. I'd also welcome a specific issue on that resource problem, @blancharda. I think we've encountered some issues with Prometheus in our staging environment, so that one definitely seems like a good first one to bump up.

@mjnagel mjnagel closed this as completed Jul 8, 2024
@blancharda
Contributor

Tossed up #551 to start the discussion 👍
