add monitoring metrics for jobset #613

Closed
googs1025 opened this issue Jul 4, 2024 · 2 comments · Fixed by #614 or #633
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

googs1025 (Member) commented Jul 4, 2024

What would you like to be added:
JobSet currently exposes only two monitoring metrics.
We should add metrics that track the health of the system, for example:

  • Number of JobSet failures
  • Number of JobSet completions
  • Number of JobSet restarts?

Many more metrics could be added or discussed.

Why is this needed:
Improve system observability

googs1025 (Member, Author) commented

/kind feature

k8s-ci-robot added the kind/feature label Jul 4, 2024

googs1025 (Member, Author) commented

/assign

2 participants