Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

secure vars implicit ACL policy expects invalid name #13995

Closed
tgross opened this issue Aug 3, 2022 · 8 comments · Fixed by #14140
Closed

secure vars implicit ACL policy expects invalid name #13995

tgross opened this issue Aug 3, 2022 · 8 comments · Fixed by #14140
Assignees
Milestone

Comments

@tgross
Copy link
Member

tgross commented Aug 3, 2022

The ACL policy parsing we added for secure variables (scheduled to ship in Nomad 1.4.0) allows workload identities to gain new privileges when an appropriately named policy is created. As noted in a conversation with @apollo13 9384ba1#r80061480, the name template for this policy is the rather ugly _:$job_id/$group/$task.

As it turns out, this isn't just ugly it's actually an invalid ACL policy name! We use this regex to check the policy name. I'm going to have to circle back to fix this later in the week but wanted to write it down so we don't forget to fix it. A proposed naming convention: nomadjob-$job-$group-$task

@tgross tgross added this to the 1.4.0 milestone Aug 3, 2022
@tgross tgross self-assigned this Aug 3, 2022
tgross referenced this issue Aug 3, 2022
Includes concept docs for secure variables, concept docs for workload
identity, and an operations docs for keyring management.
@tgross
Copy link
Member Author

tgross commented Aug 5, 2022

@schmichael I'm going to revisit this when I get back after next week but do you know if there's a hidden reason for this regex other than maybe CLI/UI ergonomics? It seems like it could be safe to allow / without breaking anything, but I'll need to dig in further.

@schmichael
Copy link
Member

tl;dr - Naming things is impossible. Lifting the restriction seems ok with caveats.

Context

if there's a hidden reason for this regex other than maybe CLI/UI ergonomics?

The digital record seems to be lacking in explanation, and my fuzzy memory isn't dredging up anything...

...however I'm guessing this is around the time we were feeling regret in Nomad and Consul about our lack of identifier validation early on. Consul obviously needs DNS-friendly identifiers in a number of places. Nomad had also just run face first into the issue of /s being valid in Job IDs and now used in APIs like /v1/job/:job_id/dispatch. nomad status <anything>/dispatch always returns a 405 because we can't differentiate between a sub-path of /dispatch and a job name ending in /dispatch.

So it seems likely this was an attempt to start with the minimum-viable-allowable-characters as it's far easier to expand allowed characters over time than restrict them further.

Problems with proposals

Both proposed naming schemes run afoul of the /-is-allowed-in-job-IDs problem. If you have two jobs:

  1. Job: fooSEPbar, Group: eggs, Task: spam
  2. Job: foo, Group: barSEPeggs, Task: spam

A single path would match both regardless of whether SEP = / or SEP = -.

Normally I'd be open to saying "well people just shouldn't name things in confusing ways without expecting negative consequences," but this I don't feel comfortable here for 2 reasons:

  1. This sort of automatic-linking-based-on-naming-scheme is already error prone: a typo anywhere in the name doesn't produce an error message, it just silently doesn't work. Not a show-stopper, but just another potential UX gotcha.
  2. This is a security sensitive operation. If it was for artifact downloading or linking upstream services where presumably external security measures would prevent misuse, it wouldn't be as big of a deal. However since this operator grants privileges I'd prefer we make it as strict as possible.

Alternative Approach: Secondary Index

I think the only way to address all naming issues is by adding fields to ACLPolicy and requiring explicitly linking policies to jobs:

type ACLPolicy struct {
        Rules       string      // HCL or JSON format
        RulesJSON   *acl.Policy // Generated from Rules on read
        Hash        []byte
+       JobID       string
+       Group       string
+       Task        string
        CreateIndex uint64
        ModifyIndex uint64
 }

We could either add an index on JobID and treat empty Group and Task fields as wildcards to apply to the whole job, or we could add a compound index on all 3 and require explicitness.

Unfortunately this also means a bunch of new command line options:

nomad acl policy apply -job foo -group bar -task eggs foo-bar-eggs-policy.hcl

Compromise Approach: No slashes in jobs that use this feature

The above is a lot of work for no real functional benefit. Error messages on typos wouldn't even be possible because you have to create the policy before the job (although allowing jobspecs to just include policy blocks and adding a new atomic raft message could be interesting).

So what if we just started warning on jobspecs with / in the names and simply disallow this feature for them? Want to link an ACL policy? Don't use /s. This could be a very natural way to deprecate /s in job identifiers.

If we ever add a sub-path to ACL policy API endpoints we even special case the 2 internal slashes case ... I think.

I think we'd also want to drop the : from the name template. I can't think of anywhere it would cause issues, but not thinking of where symbols in identifiers could cause issues is what has gotten us into this mess. 😅 I'd like to keep our symbol usage in identifiers to a minimum to avoid unforeseen issues and give ourselves opportunities to expand in the future (eg perhaps we concatenate identifiers with : like the default IFS for PATH in bash?).

Dispatched Jobs

As if the /dispatch endpoint doesn't cause us enough pain due to ambiguities with jobs using slashes in their name, dispatched jobs themselves dynamically generate an ID which breaks all proposals here.

I think it's safe to match Job's to a policy based on their ParentID if one is set. Apologies if you've already implemented or dismissed this, and I missed it.

I'm not sure we can switch our use of / in JobIDs as I suspect it's an easy way for folks to detect dispatch jobs in their own code, but if we do deprecate and eventually prevent /s in all user-supplied identifiers I think it opens up a lot of nice UX improvements.

@apollo13
Copy link
Contributor

apollo13 commented Aug 8, 2022

Given the use in API endpoints it seems to be better to forbid / in ACL policy names as well. It is kind of annoying that / is such a nice hierarchical separator but at the same time used in URLs where it does not have to have the same semantic meaning. One could argue that :job_id and :policy_name in the http API could use url-encoded slashes but this might have a tendency to break as soon as you put a proxy in front of it.

Coming from CPP I can recommend :: as separator that somewhat limits conflicts but I do understand that this might not be wanted. Either way if we are concatenating jobs, tasks & groups we should make sure that they don't allow for any separator that the policy names allow for :/

@tgross
Copy link
Member Author

tgross commented Aug 15, 2022

This is a security sensitive operation. If it was for artifact downloading or linking upstream services where presumably external security measures would prevent misuse, it wouldn't be as big of a deal. However since this operator grants privileges I'd prefer we make it as strict as possible.

Yeah, this is why the notion of having a name that couldn't possibly match an existing policy was ideal, but the /-in-jobs really does make this a non-starter without some really ugly compromises. 😦 Thanks for catching this.

I really like the secondary index approach, as it gives us a lot of future flexibility for applying policies. It has a tiny bit of friction at the time of creation, but it's not likely to be a very frequent operation either (and has to be done with a management token which implies someone in the org who knows what they're doing).

I also think we should probably start warning about / in Job ID and then gradually try to deprecate all the weirdness possible, just to reduce the pain that comes with this kind of problem in the future.

@apollo13
Copy link
Contributor

I also think we should probably start warning about / in Job ID and then gradually try to deprecate all the weirdness possible, just to reduce the pain that comes with this kind of problem in the future.

Thinking about this loud (very loud). What if we were to fast-track this deprecation. What about "hiding" secure variables behind a feature flag and one would have to put a setting (like with the SchedulerAlgorithm in https://www.nomadproject.io/api-docs/operator/scheduler) to enable them. Enabling them would also enforce job names without slashes. This way we could still keep the hierarchical structure in the policy names and wouldn't need the secondary index.

New clusters could have this feature flag enabled by default and operators could opt into it for new clusters.

@tgross
Copy link
Member Author

tgross commented Aug 16, 2022

I've opened #14140 with the secondary index approach and I'm pretty happy with how it turns out.

Enabling them would also enforce job names without slashes. This way we could still keep the hierarchical structure in the policy names and wouldn't need the secondary index.

I think where I'm coming down on this is that using the name of the policy is way too "magic" to be safe for operators. It's a little gross to have to add the flags to new policies, but that's an infrequent operation so I think it's worth the tiny added friction.

@apollo13
Copy link
Contributor

apollo13 commented Aug 17, 2022 via email

@github-actions
Copy link

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants