Vault accessors doesn't get deleted #7953

Closed
jorgemarey opened this issue May 14, 2020 · 3 comments · Fixed by #7959

Comments

@jorgemarey
Contributor

Nomad version

Nomad v0.10.5

Issue

Vault accessors don't get deleted.

We found out that our nomad servers in one environment had been using a lot more memory than they should. We took a look at the usage using the golang profiler and saw the following:

[Image: profiler output, captioned "accessors"]

In the image you can see that most of the memory is allocated through the function VaultAccessorRestore.

We had a problem a few weeks ago where nomad kept creating vault tokens for allocations over and over. For context, we have some clients in the cluster that are not managed by our team. These clients access vault via a load balancer, while the servers access it using consul service DNS. The load balancer certificate changed and these clients stopped trusting vault.

[ERR] client.vault: failed to derive token for allocation "437e671e-be25-b950-f22a-a84b649e9dfb" and tasks [task]: failed to unwrap the token for task "my-task": Put https://my-load-balancer.internal/v1/sys/wrapping/unwrap: x509: certificate signed by unknown authority

It seems that Nomad is storing information for all those tokens (about 500k) in the raft database.

Looking at the code, it seems that when a server acquires leadership it revokes vault accessors, but because it fails to find the accessor in vault (the tokens were already revoked), it stops and doesn't delete them from raft.

{"@level":"warn","@message":"failed to revoke tokens. Will reattempt until TTL","@module":"nomad.vault","@timestamp":"2020-05-12T17:19:28.249653Z","error":"failed to revoke token (alloc: \"58c8f4dc-1d98-c6a0-4492-babbae44d9ed\", node: \"712c90c0-cbc9-432b-3e32-b1c72237be15\", task: \"manager\"): Error making API request.\n\nURL: POST https://vault.service.consul:8200/v1/auth/token/revoke-accessor\nCode: 400. Errors:\n\n* 1 error occurred:\n\t* invalid accessor\n\n"}

I think that nomad should also remove those tokens when vault doesn't know about them. Any thoughts?

@notnoop notnoop added this to Needs Triage in Nomad - Community Issues Triage via automation May 14, 2020
@notnoop
Contributor

notnoop commented May 14, 2020

Thanks @jorgemarey for raising this. I agree: if nomad gets an "invalid accessor" error, it should clear the token from memory.

Curious to know whether the memory drops after the token TTL expires?

@jorgemarey
Contributor Author

Hi @notnoop, thanks for the quick answer. We had this issue over a month ago, and the memory hasn't decreased since then.

Nomad - Community Issues Triage automation moved this from Needs Triage to Done May 14, 2020
@tgross tgross added this to the 0.11.3 milestone May 14, 2020