Update lease renewer logic #4090
Members of the Nomad team and I believe this logic should be much more robust against the large numbers of new secret acquisitions caused by a static grace period. See comments in the code for details. Fixes #3414
api/renewer.go
Outdated
// calculateGrace calculates the grace period based on a reasonable set of
// assumptions given the total lease time; it also adds some jitter to not have
// clients be in sync. We calculate this continuously so long as the new lease
We calculate this continuously so long as the new lease
+// duration is greater than the previous; no change means we don't need to
+// recalculate, and if the lease duration keeps decreasing we've hit max and
+// want to be able to rely on this.
This is actually done by the callers so shouldn't be on this method.
Removed that part of the comment.
@@ -184,6 +178,9 @@ func (r *Renewer) renewAuth() error {
		return ErrRenewerNotRenewable
	}

	priorDuration := time.Duration(r.secret.LeaseDuration) * time.Second
Should this be r.secret.Auth.LeaseDuration instead of r.secret.LeaseDuration?
Yes! Good catch.
// We keep evaluating a new grace period so long as the lease is
// extending. Once it stops extending, we've hit the max and need to
// rely on the grace duration.
if leaseDuration > priorDuration {
Should we calculate the new grace if the lease duration doesn't change, which could be a likely common thing? In other words, should this be leaseDuration >= priorDuration?
If the lease duration doesn't change, the grace period would be within the same parameters. Recalculating it would just shift the amount of random jitter, which, if it's truly random, will neither help nor hurt, so it can be skipped.
Makes sense 👍
api/renewer.go
Outdated
// The sleep duration is set to 2/3 of the current lease duration plus
// 1/3 of the current grace period, which adds jitter.
sleepDuration := time.Duration(float64(leaseDuration.Nanoseconds())*2/3 + float64(r.grace.Nanoseconds()*1/3))
Nitpick: the argument to time.Duration is of the form a + b, and a is parenthesized differently from b. Can they be the same, just for consistency?
Actually this was subtly wrong, good catch.
api/renewer.go
Outdated
// For a given lease duration, we want to allow 80-90% of that to elapse,
// so the remaining amount is the grace period
r.grace = time.Duration(leaseNanos*0.1) + time.Duration(uint64(r.random.Int63())%uint64(jitterMax))
Can we use the initialized jitterMax var as argument to the time.Duration?
Done.
}
case <-time.After(3 * time.Second):
case <-time.After(5 * time.Second):
Should this be 10 to be able to wait for maximum time possible before we error out?
No. Any given renewal should use the default lease, so we should expect activity within the default lease period, not the max lease period.
case renew := <-v.RenewCh():
t.Logf("renew called, remaining lease duration: %d", renew.Secret.LeaseDuration)
continue outer
case <-time.After(5 * time.Second):
Same here.
Outside of your thoughts on the wait time in the test, this LGTM!
@preetapan Merging, but feel free to review after the fact.