Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

job region defaults to node region #6064

Merged
merged 5 commits into from
Aug 5, 2019
Merged

job region defaults to node region #6064

merged 5 commits into from
Aug 5, 2019

Conversation

jazzyfresh
Copy link
Contributor

@jazzyfresh jazzyfresh commented Aug 2, 2019

Overview

PR #5664 changed behavior to default to a region called "global", when it should default to the region of the node it's submitting the job to.

Behavior

Hcl Region := Region specified in the Job hcl config
Http Region := Region specified as a region query parameter in the http request to the command agent (so either via a script or usually a command line argument)

Http Region takes precedence over Hcl Region

http hcl result
A B A
A nil A
nil B B

"global" Region value defaults to region of the node you're submitting to

http hcl result
"some other region value" "nil" "some other region value"
nil "some region other value" "some region other value"
"some other region value" "global" "some other region value"
"global" "some region other value" "global" --> node region
"global" "global" "global" --> node region
"global" nil "global" --> node region
nil "global" "global" --> node region
nil nil default --> node region

Implementation

While working on PR #5664 we discovered that the Hcl region gets saved but ignored, as downstream RPC calls only reference the region provided in the WriteRequest in the JobRegisterRequest, which used to come from the Http Region. This means that the Job state and the JobRegisterRequest regions became decoupled, which I fixed in previous PR.

However, that actually broke the correct behavior because there is further canonicalization that happens to the WriteRequest, which sets Region to the node region if not present:
https://github.com/hashicorp/nomad/blob/master/command/agent/job_endpoint.go#L405
https://github.com/hashicorp/nomad/blob/master/command/agent/http.go#L472
https://github.com/hashicorp/nomad/blob/master/command/agent/http.go#L431-L438
(thus, if Http Region is nil, but a user set "global" in the Hcl, the Hcl "global" value would get saved but ignored and the actual Region value for the job would get configured to the node region)

Discussion

  • what is a "global" region?
    • if you create a cluster with no region specified, it creates a literal region called "global"
  • if a user explicitly asks for "global" region, where should the job go?
    • to the region of the node you're submitting to
  • where is the appropriate place to have the Job region value default to the node region?
    • in jobUpdate before ApiToJobStruct canonicalization or parseWriteRequest happens

api/jobs.go Outdated
@@ -704,7 +704,7 @@ func (j *Job) Canonicalize() {
j.Stop = boolToPtr(false)
}
if j.Region == nil {
j.Region = stringToPtr("global")
j.Region = stringToPtr("")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This breaks all the canonicalize tests and seems like the wrong thing to do. Canonicalize should continue to use "global" as the region if its not set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be set to a sentinel value that indicates region was not provided and canonicalized. Changing this means I will have to change all the tests tho

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we ultimately decided to leave canonicalization alone but changing the way jobUpdate sets defaults. Or do I still canonicalize to a sentienl value? cc @schmichael @angrycub

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I'm leaning toward leaving canonicalization alone if for no other reason than backwards compatibility.

The job submission code can do the proper default for Job.Region and avoid this code taking effect entirely.

More detailed ramblings we can ignore for now:

I hate leaving intentionally-circumvented-cruft lying about, but I think if we change Canonicalize's behavior we should force it to accept a default region as an argument. This would break existing users of this code and force them to make an explicit region decision which I think is the ideal upgrade path. I don't think it's necessary to do that work today as it has implications for all of Canonicalize's callers.

@jazzyfresh
Copy link
Contributor Author

I think this is ready if someone could give it a final look over

@endocrimes
Copy link
Contributor

endocrimes commented Aug 5, 2019

Code looks good, the only thing that looks weird to me is

http hcl result
"global" "some region other value" node region
nil "some region other value"

not going to the HCL region?

@jazzyfresh
Copy link
Contributor Author

Great question @endocrimes, I think it should go to the HCL, but I will add more tests to confirm that behavior.

@jazzyfresh
Copy link
Contributor Author

jazzyfresh commented Aug 5, 2019

For this case:

http hcl result
nil "some region other value" "some region other value"

I think this test already confirms that behavior, but I forgot to add it to the truth table in this PR description
https://github.com/hashicorp/nomad/blob/master/command/agent/job_endpoint_test.go#L483-L488

UPDATE: updated truth table w/ what I think is the desired behavior

@jazzyfresh
Copy link
Contributor Author

for this case:

http hcl result
"global" "some region other value" node region

I think that is the intended behavior because http needs to take precedence over hcl. So if a user specifies "global" at the command line arg, they are expecting it to go to a region that corresponds to whatever "global" means.

However, if that is incorrect then my understanding of what should take precedence is also incorrect.

I think the main issue this PR is trying to address is when both hcl and http are nil, so I am not sure what the correct precedence is if global is explicitly specified in the http.

Copy link
Contributor

@endocrimes endocrimes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@jazzyfresh jazzyfresh merged commit 692cd9c into master Aug 5, 2019
jazzyfresh added a commit that referenced this pull request Aug 7, 2019
@github-actions
Copy link

github-actions bot commented Feb 5, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants