Replacing all nomad server nodes results in inability to renew vault tokens #3987

Closed
jurajseffer opened this issue Mar 15, 2018 · 3 comments

@jurajseffer

Nomad version

Nomad v0.7.1

Operating system and Environment details

CentOS 7

Issue

Replacing all Nomad servers leaves Nomad unable to renew the Vault tokens issued to applications.

Reproduction steps

Vault config:

auth "aws" {
  type = "aws"
  role "nomad-cluster" {
    policies                 = "nomad-server"
    auth_type                = "ec2"
    max_ttl                  = "6h"
    period                   = "1h"
    allow_instance_migration = false
    bound_iam_role_arn       = "arn:aws:iam::***"
  }
}

auth "token" {
  type = "token"
  role "nomad-cluster" {
    disallowed_policies = "nomad-server"
    explicit_max_ttl    = 0
    name                = "nomad-cluster"
    orphan              = false
    period              = 3600
    renewable           = true
  }
}

Replace all Nomad server nodes (EC2 instances) in a rolling fashion, with the leader last. Shortly afterwards, some applications can no longer use the token provided by Nomad to talk to Vault, and we have to replace the worker nodes to work around the problem.

Nomad Client logs (if appropriate)

This entry appears many times, and only after all Nomad servers have been replaced:
[ERR] client.vault: renewal of token failed: failed to renew the vault token: Error making API request.
It would be useful if the error above included more detail about why the request failed.

We also see these:
[ERR] client: failed to renew Vault token for task app on alloc "93e0a494-df1c-e908-6e13-3d6080759e01": failed to renew the vault token: Error making API request.

@dadgar
Contributor

dadgar commented Mar 15, 2018

Hey @jurajseffer,

Sorry you hit this! This most likely happened because the new Nomad servers were given different Vault tokens than the servers they replaced. Nomad has historically generated tokens for tasks with orphan set to false, which makes them children of the server's token, so when the old servers' tokens expired, your tasks' tokens were revoked along with them. This behavior will change in Nomad 0.8: #3992

So I suggest you upgrade when it is released (in a few weeks) and set the token role to allow orphaned tokens.
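
For anyone following along, here is a rough sketch of the revocation cascade described above. It is a toy model in Python, not Vault's or Nomad's actual implementation: the `Token`, `TokenStore`, and `simulate` names are made up for illustration, and expiry is modelled as a plain revoke. The only behavior it assumes is that revoking a parent token also revokes its non-orphan children.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Token:
    """Toy stand-in for a Vault token; not the real Vault data model."""
    name: str
    parent: Optional[str] = None  # None means the token is an orphan
    children: List[str] = field(default_factory=list)


class TokenStore:
    """Hypothetical in-memory token tree, for illustration only."""

    def __init__(self) -> None:
        self.tokens: Dict[str, Token] = {}

    def create(self, name: str, parent: Optional[str] = None,
               orphan: bool = False) -> Token:
        # An orphan token is created without a link back to its creator.
        link = None if orphan else parent
        token = Token(name=name, parent=link)
        self.tokens[name] = token
        if link is not None:
            self.tokens[link].children.append(name)
        return token

    def revoke(self, name: str) -> None:
        # Revoking (or expiring) a token also revokes every descendant.
        token = self.tokens.pop(name, None)
        if token is None:
            return
        for child in token.children:
            self.revoke(child)

    def is_valid(self, name: str) -> bool:
        return name in self.tokens


def simulate(orphan: bool) -> bool:
    """Return True if the task token survives the old server token expiring."""
    store = TokenStore()
    store.create("old-server")
    store.create("task", parent="old-server", orphan=orphan)
    store.create("new-server")   # replacement server holds an unrelated token
    store.revoke("old-server")   # old server's token expires after replacement
    return store.is_valid("task")
```

In this model `simulate(orphan=False)` returns `False` (the task token goes away with the old server's token), while `simulate(orphan=True)` returns `True`, which is the situation the orphaned-token setting is meant to produce.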

@dadgar dadgar closed this as completed Mar 15, 2018
@jurajseffer
Author

Thanks @dadgar. Do you happen to know the easiest workaround once the Nomad servers have been replaced? Replacing the Nomad clients and rescheduling tasks onto new hosts fixes it, but is there an easier way?

@github-actions

github-actions bot commented Dec 1, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 1, 2022