Nomad ACL roles with duplicated policies #17201

Closed
the-nando opened this issue May 16, 2023 · 2 comments · Fixed by #18419
Labels: hcc/cst (Admin - internal), stage/accepted (Confirmed, and intend to work on. No timeline commitment though.), theme/security, type/bug

Comments

@the-nando
Contributor

the-nando commented May 16, 2023

Nomad version

Nomad v1.5.5

Issue

The ACL lookup system fails if a token is issued with two roles containing policies with the same name.

Reproduction steps

I have two roles my-role and my-role-sre which share one or more policies with the same name.
In a simplified setup:

~ nomad acl role list
ID                                    Name         Description  Policies
5e437898-384e-328b-9a94-97765cfe90d3  my-role-sre  my-role-sre  cluster-user-readonly,my-namespace-power-user
dc0ec75b-9927-3147-586f-264ebc8dbca5  my-role      my-role      cluster-user-readonly,my-namespace-read-only
~
~ nomad acl policy info cluster-user-readonly   
Name        = cluster-user-readonly
Description = <none>
CreateIndex = 22
ModifyIndex = 45

Rules

# Access to a namespace
namespace "default" {
  policy = "read"
}

~ nomad acl policy info my-namespace-read-only 
Name        = my-namespace-read-only
Description = <none>
CreateIndex = 43
ModifyIndex = 43

Rules

namespace "my-namespace" {
  policy = "read"
}

~ nomad acl policy info my-namespace-power-user 
Name        = my-namespace-power-user
Description = <none>
CreateIndex = 42
ModifyIndex = 42

Rules

namespace "my-namespace" {
  policy = "write"
}

~ nomad namespace status my-namespace
Name            = my-namespace
Description     = <none>
Quota           = <none>
EnabledDrivers  = *
DisabledDrivers = <none>

I can then start a test job in my-namespace called test and issue a token with both roles:

nomad acl token create -role-name=my-role -role-name=my-role-sre

Expected Result

Given the policies defined above I expect to be able to exec in the alloc started by my test job:

nomad exec -i -t -namespace='my-namespace' -job test echo foo

Actual Result

Client:

failed to exec into task: rpc error: Permission denied

Server:

    2023-05-16T08:25:33.590+0200 [ERROR] client.rpc: error performing RPC to server: error="rpc error: Permission denied" rpc=ACL.GetPolicies server=127.0.0.1:4647
    2023-05-16T08:25:33.590+0200 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: Permission denied" rpc=ACL.GetPolicies server=127.0.0.1:4647

The "Permission denied" error originates from this check in the server-side handler:

[...]
	// Generate a set of policy names. This is initially generated from the
	// ACL role links.
	tokenPolicyNames, err := a.policyNamesFromRoleLinks(token.Roles)
	if err != nil {
		return err
	}

	// Add the token policies which are directly referenced into the set.
	tokenPolicyNames.InsertAll(token.Policies)

	// Ensure the token has enough permissions to query the named policies.
	if token.Type != structs.ACLManagementToken && !tokenPolicyNames.ContainsAll(args.Names) {
		return structs.ErrPermissionDenied
	}
[...]

tokenPolicyNames contains the deduplicated list of resolved policies from all the roles assigned to the token, whereas args.Names from the RPC call contains the "raw" list, duplicates included.
In my example:

tokenPolicyNames = [cluster-user-readonly my-namespace-power-user my-namespace-read-only]
args.Names = [my-namespace-read-only cluster-user-readonly my-namespace-power-user cluster-user-readonly]
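One way the duplicate could trip the check, sketched in Go. This is an assumption and not something confirmed in this issue: it supposes the set's ContainsAll rejects any slice longer than the set before checking membership. The type nameSet and both methods are hypothetical stand-ins, not Nomad's or any library's actual code.

```go
package main

import "fmt"

// nameSet is a hypothetical stand-in for the set type used by the server.
type nameSet map[string]struct{}

func newNameSet(names ...string) nameSet {
	s := make(nameSet, len(names))
	for _, n := range names {
		s[n] = struct{}{}
	}
	return s
}

// containsAllStrict ASSUMES a length short-circuit: a slice with more
// entries than the set is rejected before any membership check.
func (s nameSet) containsAllStrict(names []string) bool {
	if len(names) > len(s) {
		return false
	}
	return s.containsAllLenient(names)
}

// containsAllLenient checks membership only, so duplicates are harmless.
func (s nameSet) containsAllLenient(names []string) bool {
	for _, n := range names {
		if _, ok := s[n]; !ok {
			return false
		}
	}
	return true
}

func main() {
	tokenPolicyNames := newNameSet(
		"cluster-user-readonly",
		"my-namespace-power-user",
		"my-namespace-read-only",
	)
	argsNames := []string{
		"my-namespace-read-only",
		"cluster-user-readonly",
		"my-namespace-power-user",
		"cluster-user-readonly", // duplicate contributed by the second role
	}

	fmt.Println(tokenPolicyNames.containsAllStrict(argsNames))  // false
	fmt.Println(tokenPolicyNames.containsAllLenient(argsNames)) // true
}
```

Under that assumption, every requested name is permitted, yet the four-entry request is rejected against the three-entry set.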

Removing the duplicated cluster-user-readonly from one of the roles makes it work as expected, but scaffolding ACL roles with common "basic read-only" policies is not an uncommon pattern.
Should the client's resolvePolicies implement the deduplication? https://github.com/hashicorp/nomad/blob/release/1.5.5/client/acl.go#L181
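A minimal sketch of what client-side deduplication could look like, assuming the raw names are collected into a slice before the RPC is issued. The helper dedupePolicyNames is hypothetical and is not taken from client/acl.go. It keeps the first occurrence of each name and preserves order.

```go
package main

import "fmt"

// dedupePolicyNames is a hypothetical helper: it returns the input names
// with later duplicates dropped, preserving first-seen order.
func dedupePolicyNames(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	out := make([]string, 0, len(names))
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue // already queued for lookup
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	raw := []string{
		"my-namespace-read-only",
		"cluster-user-readonly",
		"my-namespace-power-user",
		"cluster-user-readonly",
	}
	fmt.Println(dedupePolicyNames(raw))
	// prints [my-namespace-read-only cluster-user-readonly my-namespace-power-user]
}
```

Deduplicating before the request would also avoid asking the server for the same policy twice, although the server-side check could equally be made tolerant of duplicates.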

The issue doesn't affect all operations, as not all of them rely on the ACL.GetPolicies RPC call. nomad job works and so does nomad alloc status, albeit with an error when retrieving stats:

~ nomad alloc status -namespace='my-namespace' 865872cf
ID                  = 865872cf-5afd-5680-45a6-fad231a93f68
Eval ID             = 6980575f
Name                = test.test[0]
Node ID             = a8bb5dbd
Node Name           = agent-1
Job ID              = test
Job Version         = 0
Client Status       = running
Client Description  = Tasks are running
Desired Status      = run
Desired Description = <none>
Created             = 4m ago
Modified            = 3m50s ago
Deployment ID       = 07238550
Deployment Health   = healthy

Couldn't retrieve stats: Unexpected response code: 403 (Permission denied)

Task "test" is "running"
Task Resources:
CPU      Memory  Disk     Addresses
100 MHz  64 MiB  300 MiB

Task Events:
Started At     = 2023-05-16T11:52:47Z
Finished At    = N/A
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type        Description
2023-05-16T13:52:47+02:00  Started     Task started by client
2023-05-16T13:52:46+02:00  Task Setup  Building Task Directory
2023-05-16T13:52:46+02:00  Received    Task received by client
@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage via automation May 16, 2023
@jrasell jrasell self-assigned this May 17, 2023
@jrasell jrasell moved this from Needs Triage to Triaging in Nomad - Community Issues Triage May 17, 2023
@jrasell
Member

jrasell commented May 18, 2023

Hi @the-nando, and thanks for raising this issue. We have been able to reproduce this internally and will post the reproduction shortly.

@jrasell jrasell added theme/security stage/accepted Confirmed, and intend to work on. No timeline committment though. labels May 18, 2023
@jrasell jrasell moved this from Triaging to Needs Roadmapping in Nomad - Community Issues Triage May 18, 2023
@davemay99 davemay99 added the hcc/cst Admin - internal label May 26, 2023
@jrasell jrasell removed their assignment Aug 1, 2023
@jrasell jrasell self-assigned this Aug 29, 2023
@jrasell
Member

jrasell commented Aug 30, 2023

I have pushed my local reproduction: https://github.com/jrasell/dev-mess/tree/main/nomad/development/gh17201

I will be starting work on identifying the cause and fixing this bug within the next week, depending on other priorities.
