Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

identity: support jwt expiration and rotation #18262

Merged
merged 10 commits into from
Sep 8, 2023
Merged

identity: support jwt expiration and rotation #18262

merged 10 commits into from
Sep 8, 2023

Conversation

schmichael
Copy link
Member

@schmichael schmichael commented Aug 18, 2023

What is in this PR:

  • WorkloadIdentity.TTL field for allowing users to specify the lifetime of a specific jwt
  • Clientside token renewal code
  • Validation and testing

What's not in this PR:

  • Still no docs or changelog.
  • Still no e2e tests.
  • WorkloadIdentity.ChangeMode - it shouldn't be necessary for the Consul/Vault refresh as they will use internal plumbing. Still planning on adding ASAP.
  • Changes to root key rotation <- this is the last hard requirement as we need to prepublish keys before using them and ensure we keep old keys as long as unexpired tokens they have signed are still in use.

I felt like this was a nice sized PR and will work on root key rotation next.

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few minor nit-picks, but LGTM.

While testing to make sure the Prestart hook was called on task restore I encountered a panic:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x10 pc=0x10336152c]

goroutine 247 [running]:
github.com/hashicorp/nomad/client/widmgr.(*WIDMgr).SignIdentities(0x0, 0x14000aa8570?, {0x14000710020, 0x1, 0x1})
        github.com/hashicorp/nomad/client/widmgr/widmgr.go:57 +0x7c
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*identityHook).getIdentities(0x14000cc44e0, 0x1400086a600, 0x140001d1800)
        github.com/hashicorp/nomad/client/allocrunner/taskrunner/identity_hook.go:141 +0x1a0
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*identityHook).Prestart(0x104d1bf80?, {0x103de8705?, 0x14000cc44e0?}, 0x140009fc820, 0x1791ffd12?)
        github.com/hashicorp/nomad/client/allocrunner/taskrunner/identity_hook.go:59 +0x3c
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).prestart(0x14000cb9880)
        github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner_hooks.go:267 +0x608
github.com/hashicorp/nomad/client/allocrunner/taskrunner.(*TaskRunner).Run(0x14000cb9880)
        github.com/hashicorp/nomad/client/allocrunner/taskrunner/task_runner.go:594 +0x5dc
created by github.com/hashicorp/nomad/client/allocrunner.(*allocRunner).runTasks in goroutine 186
        github.com/hashicorp/nomad/client/allocrunner/alloc_runner.go:400 +0x58

But looking at the stack trace, it's not related to these changes, and it happens on main as well, so I opened #18356 to fix it.

task *structs.Task
tokenDir string
envBuilder *taskenv.Builder
tr tokenSetter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tr tokenSetter
ts tokenSetter

var retry uint64

for err := h.stopCtx.Err(); err == nil; {
h.logger.Trace("waiting to renew identities", "num", len(reqs), "wait", wait)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
h.logger.Trace("waiting to renew identities", "num", len(reqs), "wait", wait)

This seems a bit repetitive with the Debug log message in the end of the loop. Since they are at the extremes they always run one after the other:

    2023-08-29T10:23:22.631-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.identity: waiting to renew workloading identities: alloc_id=49c8438f-a91e-701c-2abd-e91750bc03f0 task=hello next=10.716433902s
    2023-08-29T10:23:22.631-0400 [TRACE] client.alloc_runner.task_runner.task_hook.identity: waiting to renew identities: alloc_id=49c8438f-a91e-701c-2abd-e91750bc03f0 task=hello num=1 wait=10.716433902s
    2023-08-29T10:23:33.349-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.identity: getting new signed identities: alloc_id=49c8438f-a91e-701c-2abd-e91750bc03f0 task=hello num=1
    2023-08-29T10:23:33.352-0400 [TRACE] client: next heartbeat: period=12.479408493s
    2023-08-29T10:23:33.352-0400 [DEBUG] client.alloc_runner.task_runner.task_hook.identity: waiting to renew workloading identities: alloc_id=49c8438f-a91e-701c-2abd-e91750bc03f0 task=hello next=10.997138552s
    2023-08-29T10:23:33.352-0400 [TRACE] client.alloc_runner.task_runner.task_hook.identity: waiting to renew identities: alloc_id=49c8438f-a91e-701c-2abd-e91750bc03f0 task=hello num=1 wait=10.997138552s

@lgfa29
Copy link
Contributor

lgfa29 commented Sep 1, 2023

Just noting here that with #18363 merged the new TTL field should also be added to config.WorkloadIdentityConfig.

Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

nomad/structs/workload_id.go Outdated Show resolved Hide resolved
client/allocrunner/taskrunner/identity_hook.go Outdated Show resolved Hide resolved
@schmichael
Copy link
Member Author

Just noting here that with #18363 merged the new TTL field should also be added to config.WorkloadIdentityConfig.

I was defeated by HCL parsing and didn't want to block this PR until I figured it out. Created #18438

Sorry :(

@schmichael schmichael merged commit ef24e40 into main Sep 8, 2023
@schmichael schmichael deleted the f-wid-exp branch September 8, 2023 21:50
@lgfa29 lgfa29 restored the f-wid-exp branch September 11, 2023 13:28
@lgfa29 lgfa29 deleted the f-wid-exp branch September 12, 2023 04:36
lgfa29 added a commit that referenced this pull request Sep 27, 2023
The initial intention behind the `vault.use_identity` configuration was
to indicate to Nomad servers that they would need to sign a workload
identities for allocs with a `vault` block.

But in order to support identity renewal, #18262 and #18431 moved the
token signing logic to the alloc runner since a new token needs to be
signed prior to the TTL expiring.

So #18343 implemented `use_identity` as a flag to indicate that the
workload identity JWT flow should be used when deriving Vault tokens for
tasks.

But this configuration value is set on servers so it is not available to
clients at the time of token derivation, making its meaning not clear: a
job may end up using the identity-based flow even when `use_identity` is
`false`.

The only reliable signal available to clients at token derivation time
is the presence of an `identity` block for Vault, and this is already
configured with the `vault.default_identity` configuration block, making
`vault.use_identity` redundant.

This commit removes the `vault.use_identity` configuration and
simplifies the logic on when an implicit Vault identity is injected into
tasks.
lgfa29 added a commit that referenced this pull request Sep 27, 2023
The initial intention behind the `vault.use_identity` configuration was
to indicate to Nomad servers that they would need to sign a workload
identities for allocs with a `vault` block.

But in order to support identity renewal, #18262 and #18431 moved the
token signing logic to the alloc runner since a new token needs to be
signed prior to the TTL expiring.

So #18343 implemented `use_identity` as a flag to indicate that the
workload identity JWT flow should be used when deriving Vault tokens for
tasks.

But this configuration value is set on servers so it is not available to
clients at the time of token derivation, making its meaning not clear: a
job may end up using the identity-based flow even when `use_identity` is
`false`.

The only reliable signal available to clients at token derivation time
is the presence of an `identity` block for Vault, and this is already
configured with the `vault.default_identity` configuration block, making
`vault.use_identity` redundant.

This commit removes the `vault.use_identity` configuration and
simplifies the logic on when an implicit Vault identity is injected into
tasks.
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 14, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants