☂️ Monitoring #48

Closed
2 of 6 tasks
schrodit opened this issue Nov 17, 2020 · 8 comments · Fixed by #110
Labels
area/monitoring Monitoring (including availability monitoring and alerting) related kind/enhancement Enhancement, improvement, extension kind/epic Large multi-story topic lifecycle/rotten Nobody worked on this for 12 months (final aging stage) priority/3 Priority (lower number equals higher priority) status/closed Issue is closed (either delivered or triaged)
Comments

@schrodit
schrodit commented Nov 17, 2020

How to categorize this issue?

/area monitoring
/kind epic
/priority 3

Umbrella issue to expose landscaper and deployer metrics and tracing for the various reconcile loops.

General:

Landscaper:

  • blueprint/component-descriptor artifact cache
  • installations controller
    • add tracing through the reconcile flow
  • executions controller
    • add tracing through the reconcile flow

This issue is also open for other aspects we could monitor.

@schrodit schrodit added the kind/enhancement Enhancement, improvement, extension label Nov 17, 2020
@gardener-robot gardener-robot added area/monitoring Monitoring (including availability monitoring and alerting) related kind/epic Large multi-story topic priority/normal labels Nov 17, 2020
@schrodit schrodit added this to the 2021-Q2 milestone Nov 30, 2020
@hendrikKahl
Contributor

Additionally, I feel we should expose basic controller metrics for all deployers as well.

In terms of further metrics: maybe it would be interesting if deployers exposed connection time metrics?
For example, how long does it take to fetch deploy items from a landscaper cluster, as a histogram?

@schrodit
Author

schrodit commented Feb 1, 2021

Additionally, I feel we should expose basic controller metrics for all deployers as well.

In terms of further metrics: maybe it would be interesting if deployers exposed connection time metrics?
For example, how long does it take to fetch deploy items from a landscaper cluster, as a histogram?

You mean how long does it take for a deploy item to be processed?
Maybe that's something we could also include in the deployer library, as a set of default metrics.

@hendrikKahl
Contributor

hendrikKahl commented Feb 1, 2021

You mean how long does it take for a deploy item to be processed?

That would be another valid option. What I had in mind was focusing more on traffic when there is a distributed setup with deployers and landscaper running in separate clusters.

@hendrikKahl hendrikKahl self-assigned this Feb 1, 2021
@hendrikKahl
Contributor

Metrics added to the OCI cache: gardener-attic/component-cli#3

@hendrikKahl
Contributor

A first implementation has been merged with #110. This enables the landscaper-controller to collect metrics of the controller-runtime and of its OCI cache(s).

Meaningful next steps would include:

  • enable the deployer controllers to expose the same set of metrics
  • write a meaningful Prometheus scrape config and drop metrics that are not needed
  • build a Grafana dashboard to visualize the status of each controller

Additionally, it could be interesting to extend the cache metrics with a histogram that gives insight into the sizes of the cached files.

@hendrikKahl hendrikKahl reopened this Feb 12, 2021
@gardener-robot gardener-robot added priority/3 Priority (lower number equals higher priority) and removed priority/normal labels Mar 8, 2021
@gardener-robot

@schrodit Label priority/normal does not exist.

@schrodit
Author

We should also switch to OpenTelemetry as the open standard for metrics.
In addition, OpenTelemetry supports traces, which could be a good way to monitor our reconcile loops.

Also the controller-runtime is migrating towards that approach: kubernetes-sigs/controller-runtime#305

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Nov 28, 2021
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels May 28, 2022
@achimweigel
Contributor

@In-Ko: closed because it is outdated and/or has been copied into an internal project.

@gardener-robot gardener-robot added the status/closed Issue is closed (either delivered or triaged) label Aug 18, 2023
4 participants