-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 Manager.Elected() should beclosed after runnables are started #1354
🐛 Manager.Elected() should beclosed after runnables are started #1354
Conversation
During debugging a weird issue with some tests failing if they were running too fast. In other words, calling Start() and Close() on a manager too fast throws an error where the informers haven't been able to sync, which then makes Start() fail with an error. In an effort to improve this behavior, tried to use Elected() to wait for leader election to start, which also waits for the cache. During some debugging, this issue happened again and upon digging a bit more it seems that the channel was closed before starting the runnables in some cases. This change reorders the close on cm.elected, which should fix the above issue. Signed-off-by: Vince Prignano <vincepri@vmware.com>
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vincepri The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Isn't the purpose of this channel to tell consumers when we are elected? Why do we need to start Runnables first? The starting is async anyways, so I don't think this will do much in terms of consistent behavior |
The behavior is inconsistent in the two places, in one place we wait for the cache to sync and then close the channel, while in the other we were firing off a goroutine and then closing the channel. The reordering of operation is to make sure we're consistent |
/retest |
/lgtm but I'd like to get one more set of eyes on it to approve, just in case |
@@ -473,8 +473,8 @@ func (cm *controllerManager) Start(ctx context.Context) (err error) { | |||
} | |||
} else { | |||
// Treat not having leader election enabled the same as being elected. | |||
cm.startLeaderElectionRunnables() | |||
close(cm.elected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vincepri this is consistent today. We first close the channel, the we start the Runnables.
Also, is it intended to make the cm.StartLeaderElectionRunnables
synchronous?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that was an intentional change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah now I get it, this is abusing the fact that startLeaderElectionRunnables
synchronously waits for the cache to be synced. A tad implicit and has a good chance to be accidentally broken in the future again, but seems reasonable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now leader and leaderless starting of pre-election runnables is the same, that makes sense to me.
/hold for more 👀 |
/hold cancel |
During debugging a weird issue with some tests failing if they were
running too fast. In other words, calling Start() and Close() on
a manager too fast throws an error where the informers haven't been able
to sync, which then makes Start() fail with an error.
In an effort to improve this behavior, tried to use Elected() to wait
for leader election to start, which also waits for the cache.
During some debugging, this issue happened again and upon digging a bit
more it seems that the channel was closed before starting the runnables
in some cases.
This change reorders the close on cm.elected, which should fix the above
issue.
Signed-off-by: Vince Prignano vincepri@vmware.com