Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip checking Vault health #8524

Merged
merged 4 commits into from
Aug 11, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 12 additions & 10 deletions nomad/vault.go
Original file line number Diff line number Diff line change
Expand Up @@ -465,17 +465,18 @@ OUTER:
case <-v.tomb.Dying():
return
case <-retryTimer.C:
// Ensure the API is reachable
if !initStatus {
if _, err := v.clientSys.Sys().InitStatus(); err != nil {
v.logger.Warn("failed to contact Vault API", "retry", v.config.ConnectionRetryIntv, "error", err)
retryTimer.Reset(v.config.ConnectionRetryIntv)
continue OUTER
}
initStatus = true
}
// Retry validating the token till success
if err := v.parseSelfToken(); err != nil {
// if parsing token fails, try to distinguish legitimate token error from transient Vault initialization/connection issue
if !initStatus {
if _, err := v.clientSys.Sys().Health(); err != nil {
v.logger.Warn("failed to contact Vault API", "retry", v.config.ConnectionRetryIntv, "error", err)
retryTimer.Reset(v.config.ConnectionRetryIntv)
continue OUTER
}
initStatus = true
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@schmichael Can I get a second review please? I have restored the health check because of the way CreateToken handles connection errors:

nomad/nomad/vault.go

Lines 951 to 956 in a3b4f06

// Check if we have established a connection with Vault
if established, err := v.ConnectionEstablished(); !established && err == nil {
return nil, structs.NewRecoverableError(fmt.Errorf("Connection to Vault has not been established"), true)
} else if err != nil {
return nil, err
}

In the above code, we aim to detect if the server failed to make any API contact, vault will return a recoverable error, but if /sys/init (or health now) succeeds, then it marks the token failures as permenant.

I don't buy into this distinction much, but to be conservative in a point release, I brought back the health check. However, I made reversed the order of the check - only check health if parseToken failed. It should be functionally the same as before.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we instead make errors from parseSelfToken recoverable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't buy into the distinction but I'm afraid of making of any unintended side-effects considering it seems it was explicitly designed this way and this file has bitten as before. I'm more inclined to keep the change conservative for 0.12.X and make more ambitious refactor in a major release.


v.logger.Error("failed to validate self token/role", "retry", v.config.ConnectionRetryIntv, "error", err)
retryTimer.Reset(v.config.ConnectionRetryIntv)
v.l.Lock()
Expand All @@ -484,6 +485,7 @@ OUTER:
v.l.Unlock()
continue OUTER
}

break OUTER
}
}
Expand Down Expand Up @@ -1239,7 +1241,7 @@ func (v *vaultClient) revokeDaemon() {
case <-v.tomb.Dying():
return
case now := <-ticker.C:
if established, _ := v.ConnectionEstablished(); !established {
if established, err := v.ConnectionEstablished(); !established || err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code here was introduced in 42f7bc8#diff-87a4c76eca3de54716c55ba6b8c06475 - where established is true only when there is no error. So checking error wasn't strictly necessary. After many modifications, this file consistently checks for established && err == nil, but we missed this spot.

For some reason, not so clear to me, #3957 chose to represent invalid tokens as established with true but non-nil error. It seems like we missed this spot.

continue
}

Expand Down
3 changes: 2 additions & 1 deletion nomad/vault_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1015,6 +1015,7 @@ func TestVaultClient_LookupToken_RateLimit(t *testing.T) {

// Spin up many requests. These should block
ctx, cancel := context.WithCancel(context.Background())
defer cancel()

cancels := 0
numRequests := 20
Expand All @@ -1028,7 +1029,7 @@ func TestVaultClient_LookupToken_RateLimit(t *testing.T) {
cancels += 1
return
}
t.Fatalf("self lookup failed: %v", err)
t.Errorf("self lookup failed: %v", err)
return
}

Expand Down