Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fingerprint: add support for fingerprinting multiple Vault clusters #18253

Merged
merged 4 commits into from
Aug 18, 2023

Conversation

tgross
Copy link
Member

@tgross tgross commented Aug 17, 2023

Add fingerprinting we'll need to accept multiple Vault clusters in upcoming Nomad Enterprise features. The fingerprinter will create a map of Vault clients by cluster name. In Nomad CE, all but the default cluster will be ignored and there will be no visible behavior change.

Ref: https://github.com/hashicorp/team-nomad/issues/404
Don't merge until we've got a ENT repo PR ready for the vault_ent.go file.

@tgross tgross changed the title Multiple vault cluster fingerprint fingerprint: add support for fingerprinting multiple Vault clusters Aug 17, 2023
@tgross tgross added this to the 1.7.0 milestone Aug 17, 2023
@tgross tgross force-pushed the multiple-vault-cluster-fingerprint branch from 6a3f073 to 5e41ce4 Compare August 17, 2023 17:56
Add fingerprinting we'll need to accept multiple Vault clusters in upcoming
Nomad Enterprise features. The fingerprinter will create a map of Vault clients
by cluster name. In Nomad CE, all but the default cluster will be ignored and
there will be no visible behavior change.
Comment on lines +10 to +19
// vaultConfigs returns the set of Vault configurations the fingerprint needs to
// check. In Nomad CE we only check the default Vault.
func (f *VaultFingerprint) vaultConfigs(req *FingerprintRequest) map[string]*config.VaultConfig {
agentCfg := req.Config
if agentCfg.VaultConfig == nil || !agentCfg.VaultConfig.IsEnabled() {
return nil
}

return map[string]*config.VaultConfig{"default": agentCfg.VaultConfig}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: once I get to the point where I'm ready to start instantiating multiple clients in client/vaultclient, I'll probably pull this out to some shared helper, but I want to keep the review sizes small in the meanwhile.

@tgross tgross marked this pull request as ready for review August 17, 2023 19:55
@tgross tgross requested review from lgfa29 and pkazmierczak August 17, 2023 19:55
lgfa29
lgfa29 previously approved these changes Aug 17, 2023
Comment on lines 102 to 105
for _, lastState := range f.lastStates {
if lastState != vaultAvailable {
return true, 15 * time.Second
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth changing anything, but I think this means that a single cluster not being available will cause all cluster to be fingerprinted at this accelerated interval.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the architecture of periodic fingerprinting means we'd end up spinning up a separate fingerprinter goroutine for each Vault (and Consul, once that's done). Maybe we could do something a little more sophisticated here though and fire the fingerprinter goroutine without necessarily sending the API calls to healthy Vaults on each pass.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gave this a go in 05a9047. The results are a little dissatisfying. The healthy Vaults effectively get checked on a fixed 45s interval when one of the other Vaults is unhealthy. But this avoids waking up the outer loop unnecessarily and avoids desync where healthy vaults get accidentally skipped for one cycle when the group as a whole skips from unhealthy to healthy.

Copy link
Contributor

@lgfa29 lgfa29 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix for the fingerprint timing! I think the 45s period for healthy clusters is OK. There will be a longer than usual delay in case of cascading Vault cluster failures, but by then fingerprint period is the least of your problems 😄

Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 18, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants