vault: update task runner vault hook to support workload identity #18534

Merged (6 commits) on Oct 16, 2023

@lgfa29 (Contributor) commented Sep 18, 2023

Update the task runner Vault hook and the Vault client so they can derive tokens using the task's signed workload identity JWT.

The new flow is used whenever a task has an identity with the pattern vault_<cluster>, where <cluster> matches the value of the vault.cluster config applied to the task.
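
As a rough illustration of the naming rule above, here is a minimal sketch in Go. The function names and the `"default"` fallback for an unset `vault.cluster` are assumptions made for the example, not code from this PR.

```go
package main

// vaultIdentityPrefix is the prefix from the vault_<cluster> pattern
// described in the PR description.
const vaultIdentityPrefix = "vault_"

// defaultCluster is an assumed fallback name for tasks that do not set
// vault.cluster explicitly.
const defaultCluster = "default"

// identityNameForCluster (hypothetical helper) returns the identity name
// a task is expected to carry for the given Vault cluster.
func identityNameForCluster(cluster string) string {
	if cluster == "" {
		cluster = defaultCluster
	}
	return vaultIdentityPrefix + cluster
}

// useJWTFlow (hypothetical helper) reports whether any of the task's
// identity names matches the cluster configured in its vault block.
func useJWTFlow(identityNames []string, cluster string) bool {
	want := identityNameForCluster(cluster)
	for _, name := range identityNames {
		if name == want {
			return true
		}
	}
	return false
}
```

With these helpers, a task with identities `["vault_infra"]` and `vault.cluster = "infra"` would take the JWT flow, while the same task pointed at the default cluster would not.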

When using JWT and workload identities, the vault.token value in the Nomad server configuration will likely be empty. This would cause the legacy Vault client in the Nomad servers to fail on start.

When a vault.default_identity is given for the default cluster, the Vault client in the servers is replaced by a no-op implementation since it won't be used in any way.
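
The server-side client swap could look roughly like the sketch below. All types and names here (`vaultClient`, `noopClient`, `serverVaultConfig`, `newServerVaultClient`) are hypothetical stand-ins chosen for illustration and do not mirror the actual Nomad types.

```go
package main

import "errors"

// vaultClient is a hypothetical, trimmed-down interface for the
// server-side Vault client.
type vaultClient interface {
	DeriveToken(taskName string) (string, error)
}

// legacyClient stands in for the token-based client that needs vault.token.
type legacyClient struct{ token string }

func (c *legacyClient) DeriveToken(taskName string) (string, error) {
	// Real derivation is out of scope for this sketch.
	return "", errors.New("legacy derivation is not implemented in this sketch")
}

// noopClient is used when servers never need to talk to Vault themselves.
type noopClient struct{}

func (noopClient) DeriveToken(string) (string, error) {
	return "", errors.New("server-side token derivation is disabled with workload identity")
}

// serverVaultConfig is an assumed, simplified view of the server's vault block.
type serverVaultConfig struct {
	Token              string
	HasDefaultIdentity bool // true when vault.default_identity is set
}

// newServerVaultClient picks the no-op client when a default identity is
// configured, so an empty vault.token no longer fails server start.
func newServerVaultClient(cfg serverVaultConfig) (vaultClient, error) {
	if cfg.HasDefaultIdentity {
		return noopClient{}, nil
	}
	if cfg.Token == "" {
		return nil, errors.New("vault.token is required for the legacy flow")
	}
	return &legacyClient{token: cfg.Token}, nil
}
```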

Until Nomad 1.9, where the legacy flow will be removed, the default Vault cluster can still use the legacy flow. In Nomad ENT this allows operators to mix authentication flows, where the default cluster uses the legacy flow and additional clusters use the JWT flow. Non-default clusters are not allowed to use the legacy flow.
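
The rules in the paragraph above can be condensed into a small decision function. This is only a sketch: `authFlow`, `selectFlow`, and the `"default"` cluster name are assumptions for the example.

```go
package main

import "errors"

// authFlow names the two authentication flows discussed in the PR.
type authFlow int

const (
	flowLegacy authFlow = iota // servers derive tokens using vault.token
	flowJWT                    // tasks log in with their workload identity JWT
)

// errLegacyNotAllowed is returned for non-default clusters that have no
// workload identity configured.
var errLegacyNotAllowed = errors.New("legacy flow is only allowed for the default Vault cluster")

// selectFlow (hypothetical) applies the rules from the description:
//   - a cluster with a workload identity uses the JWT flow;
//   - the default cluster may fall back to the legacy flow;
//   - any other cluster without an identity is rejected.
func selectFlow(cluster string, hasIdentity bool) (authFlow, error) {
	if hasIdentity {
		return flowJWT, nil
	}
	if cluster == "default" {
		return flowLegacy, nil
	}
	return flowLegacy, errLegacyNotAllowed
}
```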


Note to reviewers: most code changes are for tests and moving things around to avoid circular dependencies. Main logic is implemented in cf81d72

@tgross (Member) left a comment

This is looking great @lgfa29. I've left a few comments around areas I'm unclear on.

@tgross (Member) left a comment

LGTM, nice work on all the added tests. My only potential concern is the handling of recoverable/non-recoverable errors. But once you're happy with that I'm 👍 to merge.

Comment on lines +419 to +422
return "", structs.WrapRecoverable(
	fmt.Sprintf("failed to derive Vault token for identity %s: %v", h.widName, err),
	err,
)
Member

Likewise, I'm not sure we can know if this is recoverable or not. It could be an issue with the Vault auth config, in which case there's nothing Nomad can do about it.

Contributor Author

This could also happen due to an intermittent connectivity issue (network blip, Vault agent restart, etc.), so making it recoverable allows us to retry.
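
To make the trade-off in this thread concrete, here is a self-contained sketch of how a recoverable wrapper lets a caller retry transient failures while still giving up immediately on errors that are not marked recoverable. The types and helpers (`recoverableError`, `wrapRecoverable`, `isRecoverable`, `deriveWithRetry`) are illustrative stand-ins, not the `structs` package implementation, and a real retry loop would add backoff between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative errors: one transient, one caused by configuration.
var (
	errTransient = errors.New("connection refused")
	errBadConfig = errors.New("invalid Vault auth role")
)

// recoverableError marks an error as worth retrying.
type recoverableError struct {
	msg string
	err error
}

func (e *recoverableError) Error() string { return e.msg }
func (e *recoverableError) Unwrap() error { return e.err }

// wrapRecoverable (hypothetical) wraps err with a message and the
// recoverable marker.
func wrapRecoverable(msg string, err error) error {
	return &recoverableError{msg: msg, err: err}
}

// isRecoverable reports whether err, or anything it wraps, is recoverable.
func isRecoverable(err error) bool {
	var r *recoverableError
	return errors.As(err, &r)
}

// deriveWithRetry calls derive up to maxAttempts times, stopping early on
// success or on the first non-recoverable error.
func deriveWithRetry(maxAttempts int, derive func() (string, error)) (string, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		token, err := derive()
		if err == nil {
			return token, nil
		}
		if !isRecoverable(err) {
			return "", err
		}
		lastErr = err
	}
	return "", fmt.Errorf("gave up after %d attempts: %w", maxAttempts, lastErr)
}
```

The downside raised in the review still applies: if every derivation failure is wrapped as recoverable, a misconfigured Vault auth method is retried until the attempt limit is reached.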

testCases := []struct {
	name                string
	vaultBlock          *structs.Vault
	verifyTaskLifecycle func(*trtesting.MockTaskHooks) error
Member

I really like how this test works 👍

return nil
}

func TestVaultClient_DeriveTokenWithJWT(t *testing.T) {
Member

Do we have a test helper that can skip the test if there's no Vault binary around? That might be nice to reduce wait times and spurious errors for non-CI development.

Contributor Author

Good point, I will poke around other tests like this 👍

Contributor Author

I didn't find any check like this, so I updated NewTestVault in be3cedd to skip the test if the binary is not found.
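
For readers unfamiliar with the pattern, a skip-if-missing helper can be sketched as below. This is not the change made to NewTestVault; `requireBinary`, `findBinary`, and the `lookPath` variable are hypothetical names, and only `exec.LookPath` and the `testing.TB` methods are real standard library APIs.

```go
package main

import (
	"os/exec"
	"testing"
)

// lookPath is a package variable so the helper itself can be exercised
// with a stub; by default it searches PATH via exec.LookPath.
var lookPath = exec.LookPath

// findBinary reports the resolved path of name and whether it was found.
func findBinary(name string) (string, bool) {
	path, err := lookPath(name)
	return path, err == nil
}

// requireBinary skips the calling test when name is not available, which
// avoids slow timeouts and spurious failures on machines without it.
func requireBinary(t testing.TB, name string) string {
	t.Helper()
	path, ok := findBinary(name)
	if !ok {
		t.Skipf("skipping test: %q binary not found in PATH", name)
	}
	return path
}
```

A test would call `requireBinary(t, "vault")` before starting a test server, so local runs without Vault installed are reported as skipped rather than failed.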

@lgfa29 (Contributor Author) commented Oct 16, 2023

Rebased against main to fix conflict.

github-actions bot commented Feb 8, 2025

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators on Feb 8, 2025