Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parametrized/periodic jobs per child tagged metric emmision #4392

Merged
merged 1 commit into from
Jun 21, 2018

Conversation

burdandrei
Copy link
Contributor

continuing better metrics emission started in #3882
fixes #4061

@burdandrei burdandrei force-pushed the telemetry-parametrized-jobs branch 2 times, most recently from 3159c4e to a0a9f1f Compare June 7, 2018 18:36
@jippi
Copy link
Contributor

jippi commented Jun 8, 2018

Given how high cardinality this telemetry will be, it would be nice to opt-in for it - influxdb will not like a lot of batch jobs emitting these extra labels often (like when jobs run every 5min 24/7)

Influx limit by default to just 1m series - which will run out very very quick already

@burdandrei
Copy link
Contributor Author

@jippi, InfluxDB-wize there is no difference.
Current status is that you receive dispatch/periodic I'd as a part of job name.
My PR makes parent job name to be emitted as job name and dispatch/periodic I'd as a tag

@burdandrei
Copy link
Contributor Author

ping @dadgar

@preetapan
Copy link
Member

@burdandrei could you resolve the merge conflict on your branch with a rebase?

One concern I have is that if the change will break existing users that somehow relied on parsing the jobid to get the parent and the dispatched child id out. If we decide to merge this we need to mention that in the changelog and in the metrics documentation as well.

@@ -288,6 +284,31 @@ func NewTaskRunner(logger *log.Logger, config *config.Config,
},
}

if tc.alloc.Job.ParentID != "" {
tc.baseLabels = append(tc.baseLabels, metrics.Label{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed this with the team internally, and we don't want to break any existing users that rely on the full jobID for dispatched jobs. Could you add a new "parentJob" label here and below for alloc.job.ParentID and leave the existing "job" label as it is? That way it doesn't break backwards compatibility.

@burdandrei
Copy link
Contributor Author

@preetapan
rebase - done
task_runner - reverted job tag change, added parent_id + (dispatch/periodic_id)

What about summaries? Does it look ok?

nomad/leader.go Outdated
}
labels = append(labels, metrics.Label{
Name: "job",
Value: metricJobId,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still not quite right. The "job" label should have summary.JobId in all three cases, so I would suggest adding back the code in lines 619-622, and removing 646-648. In each of your case statements above for dispatch and periodic, jobInfo[0] should be added as new label called "parentJobId".

For example, if summary.JobID is 'myjob/dispatch-23423423', this should result in the following labels:

label value
"job" myjob/dispatch-23423423 --> (this maintains backwards compatibility)
"parentJob" myjob
"dispatch_id" 23423423

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got you,
generally, I tried to avoid this, cause if I got a dashboard with dropdown by JOB ID and lot's of periodic/dispatch it's really hard to find normal jobs. But nomad-wise you are right job id is the long one. Maybe it's better to add an option to config emit/skip for parametrized/periodic jobs. But again it can add redundant operational complexity.
Let me know if you want me to provide the config option.

@burdandrei
Copy link
Contributor Author

thanks, @preetapan!
can we add #3882 with this as well?

Value: jobInfo[0],
}, metrics.Label{
Name: "periodic_id",
Value: jobInfo[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ensure that the length of jobInfo is what is expected for safety.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this specific case, the contains check in line 642 guards against that. If the string ends with /periodic- that slice should still have 2 elements with one being the empty string.

if strings.Contains(tc.alloc.Job.Name, "/periodic-") {
tc.baseLabels = append(tc.baseLabels, metrics.Label{
Name: "periodic_id",
Value: strings.Split(tc.alloc.Job.Name, "/periodic-")[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First check the length after splitting the string to ensure the length is as expected before assigning the second element.

if strings.Contains(tc.alloc.Job.Name, "/dispatch-") {
tc.baseLabels = append(tc.baseLabels, metrics.Label{
Name: "dispatch_id",
Value: strings.Split(tc.alloc.Job.Name, "/dispatch-")[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below comment. Consider pulling this into a helper function to reduce code duplication.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the input, I wrote helper function hour ago and now I'm trying to figure where to place it =)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, its probably fine to not introduce a helper method. This and the other place we do this, are in two different packages and we would have to put one helper method in a util package that both can import. Doesn't seem worth it this time..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That what's stopped me - 2 different packages without imports intersection.
Thanks @preetapan, and thanks @chelseakomlo for really usefull code review tips!

Value: jobInfo[0],
}, metrics.Label{
Name: "dispatch_id",
Value: jobInfo[1],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't assume that this array will have two elements, first check for safety.

}, metrics.Label{
Name: "periodic_id",
Value: jobInfo[1],
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two points overall:

  • Consider adding a helper function where there is code duplication
  • Don't assume the length of the array is what is as expected, first check before indexing.

@preetapan preetapan merged commit 7325cf6 into hashicorp:master Jun 21, 2018
schmichael added a commit that referenced this pull request Oct 10, 2018
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
schmichael added a commit that referenced this pull request Oct 12, 2018
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
schmichael added a commit that referenced this pull request Oct 15, 2018
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
schmichael added a commit that referenced this pull request Oct 16, 2018
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
schmichael added a commit that referenced this pull request Oct 16, 2018
PR #4392 was merged to master *after* allocrunnerv2 was branched, so the
client-specific portions must be ported from master to arv2.
@github-actions
Copy link

github-actions bot commented Mar 2, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 2, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metrics for periodic jobs are creating too many timeseries due to timestamps in the labels
4 participants