Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consul: correctly check consul acl token namespace when using consul oss #10720

Merged
merged 1 commit into from
Jun 8, 2021

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Jun 7, 2021

This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship
check when using Consul OSS (or Consul ENT without namespace support).

Nomad v1.1.0 introduced a regression where Nomad would fail the validation
when submitting Connect jobs and allow_unauthenticated set to true, with
Consul OSS - because it would do the namespace check against the Consul ACL
token assuming the "default" namespace, which does not work because Consul OSS
does not have namespaces.

Instead of making the bad assumption, expand the namespace check to handle
each special case explicitly.

Fixes #10718

@notnoop notnoop added this to the 1.1.1 milestone Jun 8, 2021
Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions; approach seems reasonable.

Comment on lines 95 to 96
case namespace == token.Namespace:
// ACLs enabled, namespaces are the same
return nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a bit easier if the common cases are grouped together, and special cases are together. I'd suggest moving this case to be the second.

// provide a more informative error message.
return errors.Errorf("consul ACL token requires using namespace %q", token.Namespace)

default:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it a tad harder to reason through conditions, maybe because of ordering. I would probably find it easier if the conditions were order the following way:

  1. namespace == token.Namespace: either ACL not enabled, or it's enabled and they match
  2. token.Namespace == "default": special case
  3. token.Namespace != "default" && namespace == "" <- token.Namespace != "default" is redundant here but included for explicitness
  4. default: the mismatch case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this suggestion, this is easier to read 🙂

Comment on lines 91 to 92
// ACLs enabled, must defer to per-object checking, since the token could
// have namespace_prefix blocks allowing the operation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with namespace_prefix. Is it only applicable if token's namespace is default? Why doesn't it apply in the default or token.Namespace != "default" cases? Is a special case for handling legacy apps only? Might be useful to add more context.

Copy link
Member Author

@shoenig shoenig Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, Consul token ACL policies can supply namespace or namespace_prefix blocks which give the token permission beyond the "default" namespace. Those blocks are only allowable in tokens in the "default" namespace.

https://www.consul.io/docs/security/acl/acl-rules#namespace-rules

Expanded on the comment to clarify.

This PR fixes the Nomad Object Namespace <-> Consul ACL Token relationship
check when using Consul OSS (or Consul ENT without namespace support).

Nomad v1.1.0 introduced a regression where Nomad would fail the validation
when submitting Connect jobs and allow_unauthenticated set to true, with
Consul OSS - because it would do the namespace check against the Consul ACL
token assuming the "default" namespace, which does not work because Consul OSS
does not have namespaces.

Instead of making the bad assumption, expand the namespace check to handle
each special case explicitly.

Fixes #10718
@shoenig
Copy link
Member Author

shoenig commented Jun 8, 2021

@notnoop I applied your recommendations, thanks for the suggestion.

I also fixed the failing tests, which then made me realize we need to split the tests between oss and ent - they were originally written around the same bad assumption that caused this bug in the first place. So now we also have new files

nomad/consul_oss_test.go
nomad/consul_policy_oss_test.go
nomad/job_endpoint_oss_test.go

which are oriented around the fact that Nomad OSS does not consider the group.consul.namespace field. I'm working on a followup PR for ENT which expands on the same tests but with all combinations of possible input for group.consul.namespace and the Consul token Namespace.

@shoenig shoenig marked this pull request as ready for review June 8, 2021 19:25
@shoenig shoenig requested review from notnoop and tgross June 8, 2021 19:25
Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Awesome tests. LGTM.

Comment on lines +37 to +38
t.Run("check-permissions kv read", func(t *testing.T) {
t.Run("uses kv has permission", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know how I personally feel about this style of nested t.Run: When the assertions are simply delegation to try, it feels a simple table tests might read better and is more concise.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for tests like these it's hard to find the right balance between readability of the code and readability of the description of the test. By composing t.Run we get a free way of composing the description, making it obvious what the difference is between one particular test from the tests surrounding it. OTOH it's a lot of lines of code that aren't actually doing anything. I'll take a look at refactoring these tests some more after the bugfix release.

@shoenig shoenig merged commit 402b19c into main Jun 8, 2021
@shoenig shoenig deleted the f-cns-acl-check branch June 8, 2021 20:43
@shoenig
Copy link
Member Author

shoenig commented Jun 9, 2021

Spot checking (oss Nomad + oss Consul)

$ nomad job run api.nomad
Error submitting job: Unexpected response code: 500 (job-submitter consul token denied: missing consul token)
$ nomad job run -consul-token $(uuid-tool) api.nomad
Error submitting job: Unexpected response code: 500 (job-submitter consul token denied: unable to read consul token: Unexpected response code: 403 (ACL not found))
$ nomad job run -consul-token 76221243-1631-e241-9b34-ffbcb6c36025 api.nomad
==> Monitoring evaluation "3d2dcf7f"
    Evaluation triggered by job "api"
    Evaluation within deployment: "8afe1793"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "3d2dcf7f" finished with status "complete"

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to submit Connect jobs when disabling allow_unauthenticated
2 participants