
v0.9.0 - Nomad Metrics seems to be broken for AllocClientStatusPending #5540

Closed
billykwooten opened this issue Apr 10, 2019 · 5 comments · Fixed by #5541


billykwooten commented Apr 10, 2019

Nomad version

Nomad v0.9.0 (18dd590)

Operating system and Environment details

Ubuntu v18.04

Issue

We're monitoring Nomad via Prometheus metrics, and nomad_client_allocations_pending is reporting an incorrect number of pending allocations. We have no pending allocations on our cluster, yet the metric reports 10 pending allocations on one node; see below:

nomad_client_allocations_pending{datacenter="AAA",host="ldaaaaa",node_class="none",node_id="7ee2d161-c142-44e5-5a7c-d4f1600f8b26"} 10

However, if I go to that host "ldaaaaa":

root@ldaaaaa:~# nomad node status -self
ID            = 7ee2d161
Name          = ldaaaaa
Class         = <none>
DC            = AAA
Drain         = false
Eligibility   = eligible
Status        = ready
Uptime        = 8m49s
Driver Status = docker,exec

Node Events
Time                       Subsystem  Message
2019-04-10T13:04:07-04:00  Cluster    Node re-registered
2019-04-10T13:03:49-04:00  Cluster    Node heartbeat missed
2019-04-10T12:32:40-04:00  Drain      Node drain complete
2019-04-10T12:31:42-04:00  Drain      Node drain strategy set
2019-02-28T14:22:34-05:00  Cluster    Node registered

Allocated Resources
CPU            Memory           Disk
1420/8380 MHz  1.8 GiB/7.8 GiB  3.6 GiB/76 GiB

Allocation Resource Utilization
CPU          Memory
24/8380 MHz  204 MiB/7.8 GiB

Host Resource Utilization
CPU           Memory            Disk
273/8380 MHz  1015 MiB/7.8 GiB  7.3 GiB/87 GiB

Allocations
ID        Node ID   Task Group       Version  Desired  Status    Created    Modified
335d0fcb  7ee2d161  dns              0        run      complete  32s ago    30s ago
41885ae0  7ee2d161  backups          0        run      complete  2m32s ago  2m26s ago
b405ee3c  7ee2d161  vaultui          1        run      running   7m21s ago  7m7s ago
b22b41f5  7ee2d161  reverse_proxy    0        run      running   7m29s ago  7m5s ago
dc2c41dd  7ee2d161  database         9        run      running   7m29s ago  7m11s ago
17a5e8cb  7ee2d161  database         9        run      running   7m29s ago  7m10s ago
ae28e448  7ee2d161  loadbalance      6        run      running   7m29s ago  7m6s ago
8345f2d9  7ee2d161  reverse_proxy    0        run      running   7m29s ago  7m4s ago
59352cb2  7ee2d161  consul-template  0        run      running   7m29s ago  7m14s ago
3d830840  7ee2d161  metrics          0        run      running   7m29s ago  7m17s ago
1c8c60b6  7ee2d161  database         9        run      running   8m6s ago   7m48s ago

Reproduction steps

Run Nomad 0.9.0 with Prometheus metrics and allocation metrics enabled; it will report an incorrect number of pending allocations via nomad_client_allocations_pending.

This started happening immediately after we upgraded from v0.8.7 to v0.9.0.
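
For reference, a quick way to compare the raw gauge against nomad node status is to scrape the agent's Prometheus endpoint directly. The small Go sketch below is illustrative and not part of Nomad; it assumes a local agent on the default HTTP port 4646 with prometheus_metrics and publish_allocation_metrics enabled in the telemetry stanza.

```go
// Fetch the agent's Prometheus metrics and print the pending-allocation
// gauge lines so they can be compared against `nomad node status`.
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// /v1/metrics?format=prometheus is only served when the telemetry
	// stanza enables prometheus_metrics (publish_allocation_metrics is
	// needed for the per-node allocation gauges).
	resp, err := http.Get("http://127.0.0.1:4646/v1/metrics?format=prometheus")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "nomad_client_allocations_pending") {
			fmt.Println(line)
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```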


cgbaker commented Apr 10, 2019

thanks for the report, @billykwooten. i've verified this and filed it in our internal tracker.

@latchmihay

Perhaps it has something to do with this? 9e2205f#diff-ccbd515c67aa55098b48f1106de134aa
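
If it helps, the gauge in question is essentially a per-node tally of allocations bucketed by client status. The sketch below is purely illustrative (the type, constant, and function names are invented for the example and are not Nomad's actual client code); it only shows the kind of counting involved, and how folding completed allocations into the pending bucket would inflate the exported value, which matches the symptom above.

```go
package main

import "fmt"

// Alloc is an illustrative stand-in for a client-side allocation record;
// the field and constant names are hypothetical, not Nomad's.
type Alloc struct {
	ID           string
	ClientStatus string
}

const (
	statusPending  = "pending"
	statusRunning  = "running"
	statusComplete = "complete"
)

// countByClientStatus tallies allocations by their client status.
// A gauge like nomad_client_allocations_pending should reflect only the
// "pending" bucket; counting anything else (e.g. completed allocations)
// toward that bucket would inflate the exported value.
func countByClientStatus(allocs []Alloc) map[string]int {
	counts := make(map[string]int)
	for _, a := range allocs {
		counts[a.ClientStatus]++
	}
	return counts
}

func main() {
	allocs := []Alloc{
		{ID: "335d0fcb", ClientStatus: statusComplete},
		{ID: "41885ae0", ClientStatus: statusComplete},
		{ID: "b405ee3c", ClientStatus: statusRunning},
	}
	counts := countByClientStatus(allocs)
	fmt.Printf("pending=%d running=%d complete=%d\n",
		counts[statusPending], counts[statusRunning], counts[statusComplete])
}
```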

cgbaker pushed a commit that referenced this issue Apr 22, 2019
cgbaker pushed a commit that referenced this issue Apr 23, 2019
@preetapan

@billykwooten We just released 0.9.1-rc1, which resolves this issue; it's available at https://releases.hashicorp.com/nomad/0.9.1-rc1/. It would be great if you could try it out and let us know whether it fixes what you saw.

@latchmihay

I have tested it and, from what I can tell, the issue was fixed in 0.9.1-rc1.


@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Nov 24, 2022