Update nomad metrics to include information about Job status #4036

Closed
sevagh opened this issue Mar 23, 2018 · 13 comments
Labels
stage/waiting-reply theme/docs Documentation issues and enhancements theme/metrics

Comments

@sevagh
Contributor

sevagh commented Mar 23, 2018

Hello,

We noticed that the metrics endpoint (scraped by Prometheus) exposes Task Group Summaries per Job's TaskGroup: https://github.com/hashicorp/nomad/blob/master/api/jobs.go#L691

However there are no metrics exposed for the job status (i.e. https://github.com/hashicorp/nomad/blob/master/api/jobs.go#L691)

Is that something I should compose by some aggregation on Task Group Summaries, or could this be a useful metric to add in a PR?
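For anyone weighing the aggregation route, here is a minimal sketch of what composing a job-level status from Task Group Summaries could look like. The `TaskGroupSummary` struct is a local stand-in carrying the same count names that appear in the thread, and `deriveJobStatus` is a hypothetical heuristic, not how Nomad itself computes `Status`, so the two can disagree in edge cases.

```go
package main

import "fmt"

// TaskGroupSummary is a local stand-in holding per-task-group counts.
// It is not imported from the Nomad API package.
type TaskGroupSummary struct {
	Queued, Complete, Failed, Running, Starting, Lost int
}

// deriveJobStatus is a hypothetical heuristic that guesses a job-level
// status from task group counts. Nomad computes the real status
// server-side; treat this as an approximation only.
func deriveJobStatus(groups map[string]TaskGroupSummary) string {
	active, queued := 0, 0
	for _, g := range groups {
		active += g.Running + g.Starting
		queued += g.Queued
	}
	switch {
	case active > 0:
		return "running"
	case queued > 0:
		return "pending"
	default:
		return "dead"
	}
}

func main() {
	groups := map[string]TaskGroupSummary{
		"web":   {Running: 2},
		"cache": {Complete: 1},
	}
	fmt.Println(deriveJobStatus(groups))
}
```

The guesswork in this heuristic is one argument for exposing the server's own status value directly rather than reconstructing it from counts.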

@chelseakomlo
Contributor

Hi, thanks for the question. These metrics should already be enabled; look for the metric name nomad.nomad.job_summary. We will be sure to update our telemetry documentation to include this metric as well.

@chelseakomlo chelseakomlo added the theme/docs Documentation issues and enhancements label Mar 24, 2018
@chelseakomlo chelseakomlo changed the title [question] Why no exposed Prometheus metrics for Job status Update nomad metrics to include information about Job status Mar 24, 2018
@sevagh
Contributor Author

sevagh commented Mar 24, 2018

Thanks. I'll ensure I have the correct settings (https://www.nomadproject.io/docs/agent/configuration/telemetry.html) by Monday and update this ticket.

@sevagh
Contributor Author

sevagh commented Mar 26, 2018

I can't find it:

shanssian:nomad $ grep -Iirnw ./ -e '.*SetGaugeWithLabels.*job_summary.*'
./nomad/leader.go:623:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "queued"},
./nomad/leader.go:625:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "complete"},
./nomad/leader.go:627:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "failed"},
./nomad/leader.go:629:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "running"},
./nomad/leader.go:631:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "starting"},
./nomad/leader.go:633:                                          metrics.SetGaugeWithLabels([]string{"nomad", "job_summary", "lost"},

These are the TaskGroup summaries. I'm looking for Job Status, the higher level one (e.g. Running, Dead):

shanssian:nomad $ nomad status foo
ID            = foo
Name          = foo
Submit Date   = 01/20/18 19:50:34 PST
Type          = service
Priority      = 50
Datacenters   = bar
Status        = running

For the moment I'm exporting this myself with a custom Prometheus exporter:

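// Collect is the scrape-time half of the prometheus.Collector interface.
// It emits one gauge per job with a constant value of 1; the job's
// status is carried as a label.
func (nc *NomadCollector) Collect(ch chan<- prometheus.Metric) {
        // List every job known to the Nomad server.
        jobs, _, err := nc.client.Jobs().List(&api.QueryOptions{})
        if err != nil {
                logError(err)
                return
        }
        for _, job := range jobs {
                ch <- prometheus.MustNewConstMetric(
                        nc.nomadJobStatus,
                        prometheus.GaugeValue,
                        1.0,
                        // Label values, in the order declared on nc.nomadJobStatus.
                        job.ID,
                        job.Name,
                        job.Type,
                        job.Status,
                )
        }
}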
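To make the shape of the exported series concrete, here is a standard-library-only sketch that renders the same four labels in the Prometheus text exposition format. The metric name `nomad_job_status`, the label names, the `Job` struct, and `renderJobStatus` are all assumptions made for illustration; the exporter above does not show its descriptor, so the real names may differ.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Job holds only the fields the exporter above uses as label values.
// It is a local stand-in, not the Nomad API type.
type Job struct {
	ID, Name, Type, Status string
}

// renderJobStatus writes one sample per job in the Prometheus text
// exposition format. The metric and label names are assumptions.
// %q quoting is adequate for plain ASCII label values; unusual
// characters would need the format's own escaping rules.
func renderJobStatus(jobs []Job) string {
	// Sort a copy by ID so the output is stable between scrapes.
	sorted := append([]Job(nil), jobs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })

	var b strings.Builder
	b.WriteString("# TYPE nomad_job_status gauge\n")
	for _, j := range sorted {
		fmt.Fprintf(&b, "nomad_job_status{id=%q,name=%q,type=%q,status=%q} 1\n",
			j.ID, j.Name, j.Type, j.Status)
	}
	return b.String()
}

func main() {
	fmt.Print(renderJobStatus([]Job{
		{ID: "foo", Name: "foo", Type: "service", Status: "running"},
	}))
}
```

Because the status is a label and the value is always 1, a job that changes status shows up as a different series on the next scrape, so queries typically filter on the `status` label rather than on the sample value.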

@maxramqvist

We'd like the above metrics exposed as well. We made a separate exporter for batch and parameterized jobs that scrapes the HTTP API. 🐙

@sevagh
Contributor Author

sevagh commented Mar 29, 2018

master...sevagh:feat/job-metrics
This replicates what I've exposed with our in-house Nomad exporter. I'd appreciate any comments or feedback on whether this is a good addition from others who run similar setups.
#4075

@sevagh
Contributor Author

sevagh commented Nov 1, 2018

There was some accidental churn because I deleted my old fork of Nomad, but my pull request implementing this feature request is still unmerged. Judging by other users' comments here, I think several people are running custom Nomad exporters to expose this info, which this PR would help eliminate:

#4831

@stale

stale bot commented May 10, 2019

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

@stale

stale bot commented Jun 9, 2019

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem 👍

@pete-woods
Contributor

See #6003 for another PR to try and fix this.

@Fuco1
Contributor

Fuco1 commented Oct 12, 2019

The documentation at https://www.nomadproject.io/docs/telemetry/metrics.html#tagged-metrics mentions these are under nomad.job_summary but this is (no longer?) true. You suggest here that they are under nomad.nomad.job_summary.
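One way to settle which prefix a given agent actually emits is to scan a saved metrics dump for both candidates. The sketch below assumes the Prometheus-format output replaces the dots in metric names with underscores (so `nomad.nomad.job_summary` would appear as `nomad_nomad_job_summary...`); the function name `jobSummaryPrefixes` is hypothetical, and the sample lines and their labels are invented for the example rather than copied from a real agent.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// candidatePrefixes are the two spellings discussed in this thread,
// written with underscores on the assumption that dots are replaced
// in Prometheus-format output.
var candidatePrefixes = []string{"nomad_nomad_job_summary", "nomad_job_summary"}

// jobSummaryPrefixes reports which candidate prefixes occur at the
// start of any sample line in a Prometheus-format metrics dump.
func jobSummaryPrefixes(metricsText string) map[string]bool {
	found := map[string]bool{}
	sc := bufio.NewScanner(strings.NewReader(metricsText))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "#") {
			continue // skip HELP and TYPE comment lines
		}
		for _, p := range candidatePrefixes {
			if strings.HasPrefix(line, p) {
				found[p] = true
			}
		}
	}
	return found
}

func main() {
	// Invented sample input; a real dump would come from the agent.
	sample := "# TYPE nomad_nomad_job_summary_running gauge\n" +
		"nomad_nomad_job_summary_running{job=\"foo\",task_group=\"web\"} 2\n"
	for _, p := range candidatePrefixes {
		fmt.Printf("%s present: %v\n", p, jobSummaryPrefixes(sample)[p])
	}
}
```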

@pete-woods
Contributor

Those docs seem to be for the unreleased 0.10 version where my PR is merged.

@Fuco1
Contributor

Fuco1 commented Oct 12, 2019

Is there a way to display docs for released versions by default? I find this very confusing :(

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 18, 2022