Window for per-task Vault token renewal time extremely wide by design? #5471

Closed
cheeseprocedure opened this issue Mar 26, 2019 · 3 comments · Fixed by #5479
Comments

@cheeseprocedure
Contributor

Nomad version

Nomad v0.8.7 (21a2d93eecf018ad2209a5eab6aae6c359267933+CHANGES)

Vault version

1.0.3+prem.hsm (via /v1/sys/health)

Operating system and Environment details

Ubuntu 16.04 LTS

Issue

We've been using Nomad's native Vault integration to issue tokens with a TTL of 72 hours. Until recently, one of our customers used this token only at bootstrap time. They've now adopted Vault's transit backend, and make regular use of the token throughout the guest instance's lifespan. As part of that work, the customer also implemented simple monitoring of the token's TTL.

I had provided (inaccurate) guidance that tokens were refreshed at approximately 1/2 lease duration. In our case, we expected that to be 36 hours, so a 24 hour threshold was chosen for the monitor. Shortly after configuring alerts against the monitor, the customer discovered that the token TTL was frequently dipping below the 24 hour threshold.

We've dug into this a bit, and it appears Nomad used to attempt token renewal at ~1/2 leaseDuration. We're trying to understand the reason for the current behaviour, which, if I am reading correctly, will choose a token renewal time between 10 seconds and (leaseDuration - 10) seconds in the future (assuming a leaseDuration of >= 30 seconds).
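To make the width of that window concrete, here is a rough sketch of the arithmetic for a 72-hour lease. It is not Nomad's code: `windowAsDescribed` and `fractionBelow` are hypothetical helpers, the window bounds are taken from the description above, and a uniform choice within the window is an assumption.

```go
package main

import "fmt"

// windowAsDescribed returns the earliest and latest renewal delay (in seconds)
// under the behaviour described in this issue: between 10 seconds and
// (leaseDuration - 10) seconds from now. Hypothetical helper, not Nomad code.
func windowAsDescribed(leaseSeconds int) (earliest, latest int) {
	return 10, leaseSeconds - 10
}

// fractionBelow estimates the share of renewal delays that would let the
// remaining TTL fall under thresholdSeconds before renewal, ASSUMING the
// delay is picked uniformly from the window.
func fractionBelow(leaseSeconds, thresholdSeconds int) float64 {
	earliest, latest := windowAsDescribed(leaseSeconds)
	// Any delay longer than cutoff leaves less than thresholdSeconds of TTL.
	cutoff := leaseSeconds - thresholdSeconds
	if cutoff >= latest {
		return 0
	}
	if cutoff < earliest {
		cutoff = earliest
	}
	return float64(latest-cutoff) / float64(latest-earliest)
}

func main() {
	lease := 72 * 60 * 60     // 72-hour token TTL from this report
	threshold := 24 * 60 * 60 // 24-hour alert threshold

	earliest, latest := windowAsDescribed(lease)
	fmt.Printf("renewal window: %ds to %ds\n", earliest, latest)
	fmt.Printf("worst-case TTL left at renewal: %ds\n", lease-latest)
	fmt.Printf("share of renewals dipping below threshold: %.2f\n",
		fractionBelow(lease, threshold))
}
```

Under those assumptions, roughly one third of renewals would land after the TTL has already dropped below 24 hours, and the worst case leaves only about 10 seconds of TTL.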

While it makes good sense to spread out token renewal times, this window seems very wide. Was the previous renewal attempt time of ~1/2 leaseDuration abandoned intentionally?

(Our current workloads are admittedly a bit of a funny fit: we inject per-task Vault tokens into VMs via cloud-init, hence the interest in reducing the risk of e.g. delaying token refresh until near-EOL and experiencing expired tokens due to brief Vault service disruptions.)

Reproduction steps

Schedule any job with Vault integration.

Nomad Server logs (if appropriate)

n/a

Nomad Client logs (if appropriate)

n/a

Job file (if appropriate)

n/a

@schmichael
Member

Good question. I'm chatting with the Vault team internally about what the appropriate behavior should be as I agree: the current behavior seems problematic.

@schmichael
Member

Confirmed this is a bug. The intended behavior is to use leaseDuration/2 as the max, not the full lease duration!

https://github.com/hashicorp/nomad/blob/v0.8.7/client/vaultclient/vaultclient.go#L420
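For illustration only, a minimal sketch of the intended behavior stated above, with leaseDuration/2 as the upper bound. This is not the code at the linked line nor the fix in #5479: `renewalDelay` and `minDelaySeconds` are hypothetical names, and keeping the 10-second floor from the issue description is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// minDelaySeconds is an assumed floor, carried over from the issue description.
const minDelaySeconds = 10

// renewalDelay maps a uniform sample u in [0, 1) to a renewal delay in
// seconds, capped at half the lease duration. Hypothetical helper.
func renewalDelay(leaseSeconds int, u float64) int {
	latest := leaseSeconds / 2
	if latest <= minDelaySeconds {
		// Very short leases are outside what the issue discusses; renewing
		// at the halfway point is an arbitrary choice for this sketch.
		return latest
	}
	return minDelaySeconds + int(u*float64(latest-minDelaySeconds))
}

func main() {
	lease := 72 * 60 * 60 // 72-hour token TTL from this report
	delay := renewalDelay(lease, rand.Float64())
	fmt.Printf("renew in %ds, leaving %ds of TTL\n", delay, lease-delay)
}
```

With a 72-hour lease, the latest renewal in this sketch lands at 36 hours, so the TTL remaining at renewal time stays at or above 36 hours, barring failed renewal attempts.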

@schmichael changed the title from "[question] Window for per-task Vault token renewal time extremely wide by design?" to "Window for per-task Vault token renewal time extremely wide by design?" on Mar 27, 2019
@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions bot locked as resolved and limited conversation to collaborators on Nov 24, 2022