-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Log warnings when too many leases are active #3957
Conversation
Thanks for digging into this! One immediate piece of feedback: this should really live inside the expiration manager. The reason is that the check for c.expiration not being nil is racy since the state lock isn't grabbed -- but there's no real reason for it to be grabbed, since we already know the lifecycle of the expiration manager. By moving it there, it can just start when the expiration manager starts and stop when it stops. The good news is that the expiration manager already has a function that is invoked periodically, called emitMetrics, and furthermore the pending lock is already grabbed in it! That makes it a great place to do this logging. The main issue is that it's invoked every second. So you'd want to put an atomic (for race safety) counter into the expiration manager that is incremented on every hit to emitMetrics; when it gets above a threshold, emit the warning (if needed) and then reset the counter. |
Thanks for the review @jefferai . I have moved the check into the ExpirationManager emitMetrics function. I have set it to log every 60 seconds. Let me know if this looks alright. |
vault/expiration.go
Outdated
atomic.AddUint32(&m.leaseCheckCounter, 1) | ||
} | ||
} else { | ||
atomic.StoreUint32(&m.leaseCheckCounter, 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This else
clause is not necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was an edge case where the counter will not get reset if the value was flapping over the threshold. But, thinking about it now, it is insignificant to warrant the else loop running all the time. I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It won't get reset, but I think that's totally fine. It won't display more often than once every 60 seconds, it just may display less if it's flapping.
vault/expiration.go
Outdated
// Check if lease count is greater than the threshold | ||
if num > maxLeaseThreshold { | ||
if atomic.LoadUint32(&m.leaseCheckCounter) > 59 { | ||
m.logger.Warn("expiration: lease count exceeds maximum lease threshold") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think "...exceeds warning lease threshold" is nicer or people might think that they have hit a hard max and might hit issues immediately.
quitContext: c.activeContext, | ||
coreStateLock: &c.stateLock, | ||
quitContext: c.activeContext, | ||
leaseCheckCounter: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this to be initialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but doesn't hurt, going to merge!
Thanks so much! |
This could be one way of solving #3903 .
If this looks alright, I could add a tunable var to set the threshold.