Add metrics for blocked eval resources #10454

Merged 3 commits on Apr 29, 2021

@lgfa29 (Contributor) commented Apr 27, 2021

When an eval is blocked, it's often because there are insufficient resources available. Nomad currently tracks how many evals are blocked, but it has no indication of how many resources are required to unblock them, or where they are assigned to run.

This PR adds a new set of metrics that report the amount of resources requested by blocked evals. The metrics are broken down by job, datacenter, and node class.

If a job spans multiple datacenters or node classes, the metrics for all of them are updated, since adding more resources to any one of them would be enough to unblock the eval.
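
To illustrate the aggregation described above, here is a minimal sketch in Go. It is not the code from this PR: `Resources`, `BlockedEval`, and `BlockedStats` are hypothetical, simplified types invented for this example. It only shows the idea that a blocked eval's request is counted once per job and once for every datacenter and node class the job can run in.

```go
package main

import "fmt"

// Resources is a hypothetical summary of what a blocked eval requests.
type Resources struct {
	CPU      int // MHz
	MemoryMB int
}

// Add returns the field-wise sum of two resource summaries.
func (r Resources) Add(o Resources) Resources {
	return Resources{CPU: r.CPU + o.CPU, MemoryMB: r.MemoryMB + o.MemoryMB}
}

// BlockedEval is a simplified stand-in for a blocked evaluation.
type BlockedEval struct {
	JobID       string
	Datacenters []string
	NodeClasses []string
	Requested   Resources
}

// BlockedStats aggregates requested resources along three dimensions.
type BlockedStats struct {
	ByJob        map[string]Resources
	ByDatacenter map[string]Resources
	ByNodeClass  map[string]Resources
}

// NewBlockedStats returns an empty aggregate with initialized maps.
func NewBlockedStats() *BlockedStats {
	return &BlockedStats{
		ByJob:        map[string]Resources{},
		ByDatacenter: map[string]Resources{},
		ByNodeClass:  map[string]Resources{},
	}
}

// Block adds the eval's request to every bucket it belongs to.
func (s *BlockedStats) Block(e BlockedEval) { s.apply(e, 1) }

// Unblock removes the eval's request from every bucket it belongs to.
func (s *BlockedStats) Unblock(e BlockedEval) { s.apply(e, -1) }

// apply adds (sign = 1) or subtracts (sign = -1) the request.
func (s *BlockedStats) apply(e BlockedEval, sign int) {
	delta := Resources{
		CPU:      sign * e.Requested.CPU,
		MemoryMB: sign * e.Requested.MemoryMB,
	}
	s.ByJob[e.JobID] = s.ByJob[e.JobID].Add(delta)
	// Every datacenter and node class is updated, since extra capacity
	// in any one of them could unblock the eval.
	for _, dc := range e.Datacenters {
		s.ByDatacenter[dc] = s.ByDatacenter[dc].Add(delta)
	}
	for _, nc := range e.NodeClasses {
		s.ByNodeClass[nc] = s.ByNodeClass[nc].Add(delta)
	}
}

func main() {
	stats := NewBlockedStats()
	stats.Block(BlockedEval{
		JobID:       "web",
		Datacenters: []string{"dc1", "dc2"},
		NodeClasses: []string{"large"},
		Requested:   Resources{CPU: 500, MemoryMB: 256},
	})
	fmt.Printf("%+v\n", stats.ByDatacenter)
}
```

One consequence of this design is that the per-datacenter and per-node-class values cannot be summed to get a cluster-wide total, because the same request is counted in every bucket it could be satisfied from.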

This metric is emitted by the leader, since it is the only node in the cluster that has the proper scheduling information.
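
As a rough sketch of what leader-only emission could look like, the Go example below gates gauge publication on a leadership flag. Everything here is an assumption made for illustration: the `Emitter` type stands in for a real metrics sink, and the metric name `example.blocked_evals.cpu` is made up, not the name used by this PR.

```go
package main

import "fmt"

// Gauge is a hypothetical emitted sample with a single label.
type Gauge struct {
	Name       string
	Datacenter string
	Value      float32
}

// Emitter is a stand-in for a real metrics sink; it records samples.
type Emitter struct {
	Samples []Gauge
}

// SetGauge records one gauge sample.
func (e *Emitter) SetGauge(name, datacenter string, v float32) {
	e.Samples = append(e.Samples, Gauge{Name: name, Datacenter: datacenter, Value: v})
}

// EmitBlockedCPU publishes one gauge per datacenter, but only when the
// calling server is the leader. Followers emit nothing, because they
// do not hold the scheduling state needed to compute the values.
func EmitBlockedCPU(isLeader bool, cpuByDC map[string]float32, sink *Emitter) {
	if !isLeader {
		return
	}
	for dc, cpu := range cpuByDC {
		// The metric name is hypothetical.
		sink.SetGauge("example.blocked_evals.cpu", dc, cpu)
	}
}

func main() {
	sink := &Emitter{}
	blocked := map[string]float32{"dc1": 500, "dc2": 500}

	EmitBlockedCPU(false, blocked, sink) // follower: no samples
	EmitBlockedCPU(true, blocked, sink)  // leader: one sample per datacenter
	fmt.Println(len(sink.Samples))
}
```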

Missing work:

  • More tests
  • Update documentation

@tgross (Member) left a comment

The overall approach LGTM, and it'll be nice to have a place to hook in more metrics around scheduler decisions so 👍 on that.

Resolved review threads:
nomad/blocked_evals.go
nomad/blocked_evals_stats.go
nomad/blocked_evals_test.go

@tgross (Member) left a comment

LGTM!

@lgfa29 merged commit c711492 into main Apr 29, 2021
@lgfa29 deleted the f-blocked-eval-resource-metrics branch April 29, 2021 19:03

@github-actions commented

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022