Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows modular observ lib #1050

Merged
merged 76 commits into from
Oct 18, 2023
Merged

Windows modular observ lib #1050

merged 76 commits into from
Oct 18, 2023

Conversation

v-zhuravlev
Copy link
Contributor

@v-zhuravlev v-zhuravlev commented Aug 30, 2023

cd windows-mixin
grr apply mixin.libsonnet

Checklist:

  • Fleet dashboard
    • Add drill down and data links to other dashes
  • Overview dashboard
  • Networking dashboard (WONT DO, don't have many network metrics)
  • Disks dashboard
  • Memory dash (WONT DO, requires additional collector)
  • cpu and system dash
    • NTP and time metrics (enable time collector)
  • Logs dashboard
    • Hide under feature flag (EnableLokiLogs)
  • Alerts:
    • Linux parity: Time alerts (no ntp, offset too big)
    • Physical disk status (enable disk module) (enable disk module)
    • Linux parity: add disk predict alert
  • Extras:
    • Annotations: Reboot,failed service, critical event
    • Logs scrape snippet
    • Add descriptions to panels

Changelog:

  • Create three-tier dashboards (Fleet->Overview->Advanced drill down, including logs), similar to node exporter dashboards
  • Introduced modular observ lib structure, that should make this lib highly customizable and reusable
  • Increase pending period for alerts to 15m to match linux
  • Add disk status alert
  • Add common annotations: reboot,failedService,criticalSystemEvent

Examples:
Fleet dash:
image
Overview dash:
image
Logs dash:
image
Disks dash:
image

@v-zhuravlev v-zhuravlev requested a review from rgeyer September 6, 2023 18:31
@v-zhuravlev v-zhuravlev force-pushed the vzhuravlev/jl-windows-new branch from 62d9e89 to ddacc8a Compare September 22, 2023 17:26
@v-zhuravlev v-zhuravlev marked this pull request as ready for review September 22, 2023 17:26
@v-zhuravlev v-zhuravlev changed the title Windows modular mixin Windows modular observ lib Sep 22, 2023
@v-zhuravlev v-zhuravlev force-pushed the vzhuravlev/jl-windows-new branch 5 times, most recently from 9446224 to 26b0127 Compare September 27, 2023 18:29
@v-zhuravlev v-zhuravlev force-pushed the vzhuravlev/jl-windows-new branch from 235438d to b2e76d2 Compare October 3, 2023 14:52
@gaantunes gaantunes force-pushed the vzhuravlev/jl-windows-new branch from 5e6a8be to c961582 Compare October 4, 2023 20:48
Copy link
Contributor

@rgeyer rgeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. A few nits here and there.

Mostly, I think we should get the dependencies reviewed/merged before this is merged. What do you think?

}
},
"version": "master"
"version": "vzhuravlev/jl-windows-new"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to have a follow-up PR to change this to master, or perhaps you can reference the specific commit SHA. I believe the branch usually gets deleted on merge.

Comment on lines 3 to 4
// TODO: add scheduled tasks failed alerts,
// time alerts, ntp delay alerts, disk running out of space predict alerts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like these are all done, this can be removed?

Suggested change
// TODO: add scheduled tasks failed alerts,
// time alerts, ntp delay alerts, disk running out of space predict alerts

Comment on lines +81 to +80
// panels.systemThreads,
// panels.systemExceptions,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem with these? Should they also be removed from the windows-observ-lib?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this get discussed? :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check and make follow-up PR.

"subdir": "common-lib"
}
},
"version": "vzhuravlev/common-lib"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this get merged first? Given it's a requirement, we should probably have that nailed down with some certainty that the interface will not change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commonlib is ready for review. once we merge it, we can continue with windows lib.

Comment on lines 179 to 193
systemExceptions:
commonlib.panels.all.timeSeries.base.new(
'System calls and exceptions',
targets=[
t.windowsSystemExceptions,
t.windowsSystemCalls,
],
),
systemThreads:
commonlib.panels.all.timeSeries.base.new(
'System threads',
targets=[
t.windowsSystemThreads,
],
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as in dashboards.libsonnet.. If these are unused, should they persist?

CPU usage by different modes.
|||
),
// TODO add why it is important, consider alert
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like the description adequately explains why this metric is important?

Suggested change
// TODO add why it is important, consider alert

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thnks, removed

@@ -0,0 +1,41 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most of these are pretty common utils. I don't recall. Did we intend to merge these with grafonnet or with the common-lib?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thnks, Moved to commonlib

Copy link
Contributor

@rgeyer rgeyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks ready to go!

I'd wait until common-lib is merged, and update that reference in jsonnet bundler, but otherwise. 🚀

@v-zhuravlev v-zhuravlev force-pushed the vzhuravlev/jl-windows-new branch from ecb0fc2 to 1d0eb76 Compare October 18, 2023 11:29
@v-zhuravlev v-zhuravlev enabled auto-merge (squash) October 18, 2023 11:29
@v-zhuravlev v-zhuravlev merged commit d2347c7 into master Oct 18, 2023
@v-zhuravlev v-zhuravlev deleted the vzhuravlev/jl-windows-new branch October 18, 2023 11:30
Comment on lines +92 to +98
![image](https://github.com/grafana/jsonnet-libs/assets/14870891/b36b6245-643a-426f-9745-5437d93815ad)
Overview dashboard:
![image](https://github.com/grafana/jsonnet-libs/assets/14870891/723df88c-a789-4e73-a85e-724d9ea06cd2)
Logs dashboard:
![image](https://github.com/grafana/jsonnet-libs/assets/14870891/ec136706-96c1-4bc4-b608-f7184327d845)
Drill down disks dashboard:
![image](https://github.com/grafana/jsonnet-libs/assets/14870891/dfcda70d-4c2e-494f-b092-7d37a13d65d1)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should probably be rehosted in our GCP cluster rather than github's CDN?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for integration, yes.

Comment on lines +81 to +80
// panels.systemThreads,
// panels.systemExceptions,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this get discussed? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants