-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vault: update task runner vault hook to support workload identity #18534
Conversation
829bf43
to
711e2e8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great @lgfa29. I've left a few comments around areas I'm unclear on
711e2e8
to
e79c4df
Compare
e79c4df
to
356a755
Compare
356a755
to
3f06537
Compare
3f06537
to
7ddead2
Compare
7ddead2
to
d76323c
Compare
d76323c
to
bcde86b
Compare
bcde86b
to
c110ee3
Compare
c110ee3
to
c98262b
Compare
c98262b
to
6c3e32b
Compare
2499ae0
to
9485001
Compare
9485001
to
71fe098
Compare
71fe098
to
89bc720
Compare
89bc720
to
7b34993
Compare
ca4feee
to
41fe4e1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice work on all the added tests. My only potential concern is the handling of recoverable/non-recoverable errors. But once you're happy with that I'm 👍 to merge.
return "", structs.WrapRecoverable( | ||
fmt.Sprintf("failed to derive Vault token for identity %s: %v", h.widName, err), | ||
err, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise, I'm not sure we can know if this is recoverable or not. It could be an issue with the Vault auth config, in which case there's nothing Nomad can do about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could also happen due to an intermittent connectivity issue (network blip, Vault agent restart etc.) so making it recoverable allow us to retry.
testCases := []struct { | ||
name string | ||
vaultBlock *structs.Vault | ||
verifyTaskLifecycle func(*trtesting.MockTaskHooks) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this test works 👍
return nil | ||
} | ||
|
||
func TestVaultClient_DeriveTokenWithJWT(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a test helper that can skip the test if there's no Vault binary around? That might be nice to reduce wait times and spurious errors for non-CI development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I will poke around other tests like this 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't find any check like this so I updated NewTestVault
in be3cedd to skip the test if the binary is not found.
Receiving a nil identity is an unexpected result as the WIDManager should have all identities by this point.
be3cedd
to
717d531
Compare
Rebased against |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Update the task runner Vault hook and the Vault client so they are able to derive tokens using the task signed workload identity JWT.
The new flow is used whenever a task has an identity with the pattern
vault_<cluster>
, where<cluster>
matches the value of thevault.cluster
config applied to the task.When using JWT and workload identities, the
vault.token
value in the Nomad server configuration will likely be empty. This would cause the legacy Vault client in the Nomad servers to fail on start.When a
vault.default_identity
is given for the default cluster, the Vault client in the servers are replaced by a no-op implementation since it won't be used in any way.Until Nomad 1.9, where the legacy flow will be removed, the
default
Vault cluster can still use the legacy flow. In Nomad ENT this allows operators to mix authentication flows, where thedefault
cluster uses the legacy flow and additional clusters use the JWT flow. Non-default clusters are not allowed to use the legacy flow.Note to reviewers: most code changes are for tests and moving things around to avoid circular dependencies. Main logic is implemented in cf81d72