
Leader remove runlocal #7

Merged
12 commits merged on Jul 24, 2020

Conversation

@jmrodri (Member) commented on Jul 22, 2020

No description provided.

@jmrodri jmrodri changed the title Leader remove runlocal WIP: Leader remove runlocal Jul 22, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 22, 2020
@coveralls commented on Jul 22, 2020

Pull Request Test Coverage Report for Build 181321825

  • 23 of 36 (63.89%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-25.9%) to 74.074%

Changes missing coverage:
  leader/leader.go: 23 of 36 changed/added lines covered (63.89%)
Totals:
  Change from base build 180379840: -25.9%
  Covered lines: 220
  Relevant lines: 297

💛 - Coveralls

leader/leader.go: outdated review threads (resolved)
@jmrodri jmrodri changed the title WIP: Leader remove runlocal Leader remove runlocal Jul 23, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2020
leader/leader.go: outdated review thread (resolved)
leader/leader_test.go: outdated review thread (resolved)
@joelanford (Member) left a comment

One more review round (sorry!)

And another thing that just popped into my head. It occurs to me that controller-runtime has some pre- and post-leader-election runnables as part of the manager. I wonder if it would be possible to shim this leader election implementation into controller-runtime. This isn't a blocker for this PR, but it would be nice to handle in a follow-up if it's possible. I'll investigate and report back.

Comment on lines 15 to 16
var _ = Describe("Leader", func() {

Member

I think this Describe layer is unnecessary. Every test in this suite (i.e. package) is a leader test.

Member Author

If I remove it, it doesn't seem to work. I renamed it to "Leader election" to better describe it. Since it works, I'd rather not change it.

leader/leader.go Outdated
var log = logf.Log.WithName("leader")

// maxBackoffInterval defines the maximum amount of time to wait between
// attempts to become the leader.
const maxBackoffInterval = time.Second * 16

type Option func(*config) error
Member

Ah crap. I totally messed this up. Now that I'm seeing this, I feel like it's a little user-hostile. If we don't export Config, users can't provide their own Option implementations.

Given that we only have one field to set, it might not seem important now, but if we add more fields in the future, we want to give users the flexibility to write their own Options.

Opening this up later would be a breaking change to the Option definition.

So given all of this, I think we want:

type Option func(*Config) error

type Config struct { ... }

// now that Config is exported, setDefaults would need 
// to be unexported since this only needs to be called
// by us after all options have run.
func (c *Config) setDefaults() error { ... }

Sorry for the back and forth!
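
For illustration, once Config is exported a caller could define their own Option outside the package. The following is only a hypothetical sketch: it assumes Config exposes a Client field, as the WithClient helper used in the tests below suggests.

// Hypothetical sketch of a caller-defined Option. It assumes Config exports a
// Client field (suggested by the WithClient helper used in the tests); adjust
// to whatever fields Config actually exposes.
package example

import (
	"github.com/operator-framework/operator-lib/leader"

	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// withMyClient sets a caller-provided client on the leader election Config.
func withMyClient(c crclient.Client) leader.Option {
	return func(cfg *leader.Config) error {
		cfg.Client = c
		return nil
	}
}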

Member Author

I reverted the commit that unexported Config, then added a new commit to unexport just setDefaults.

@joelanford (Member)
Following up about controller-runtime manager integration.

It doesn't seem immediately doable given controller-runtime's direct use of resourcelock.Interface, which seems to have a hardcoded expectation of using leases.

Here's where the manager starts the pre-leader-election runnables vs the post-leader-election runnables: https://github.com/kubernetes-sigs/controller-runtime/blob/ab02953d0bcb6467333fcc5809eb1df83a8d89c4/pkg/manager/internal.go#L465-L478.

The only way I can see this working would be for controller-runtime to give callers a hook into startLeaderElection to override the default. Perhaps it could abstract the leader election implementation behind something that accepts injected LeaderCallbacks and implements Run(context.Context), for example:

type LeaderElector interface {
    Run(context.Context)
}

And then a new manager Option field:

type Option struct {
	// NewLeaderElector overrides the default lease-based leader elector
	// and allows callers to use their own leader election implementation.
	NewLeaderElector func(leaderelection.LeaderCallbacks) LeaderElector
}

This would definitely require a controller-runtime proposal and thorough discussion, so it's not something that could happen anytime soon.
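
For illustration only, here is a hypothetical sketch of how the leader-for-life Become call could be wrapped to satisfy such a LeaderElector interface. Neither the interface nor a NewLeaderElector hook exists in controller-runtime today.

// Hypothetical sketch: wrapping leader.Become so it could satisfy the proposed
// LeaderElector interface. Nothing here exists in controller-runtime today; it
// only illustrates the idea above.
package example

import (
	"context"

	"github.com/operator-framework/operator-lib/leader"
)

// leaderForLifeElector blocks in Run until this pod becomes the leader, which
// is what the manager would expect before starting post-election runnables.
type leaderForLifeElector struct {
	lockName string
	opts     []leader.Option
}

func (e leaderForLifeElector) Run(ctx context.Context) {
	// Become blocks, retrying with backoff, until leadership is acquired or it
	// returns an error. A real integration would need to surface the error to
	// the manager instead of panicking.
	if err := leader.Become(ctx, e.lockName, e.opts...); err != nil {
		panic(err)
	}
}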

// Become ensures that the current pod is the leader within its namespace. If
// run outside a cluster, it will skip leader election and return nil. It
// continuously tries to create a ConfigMap with the provided name and the
// current pod set as the owner reference. Only one can exist at a time with
// the same name, so the pod that successfully creates the ConfigMap is the
// leader. Upon termination of that pod, the garbage collector will delete the
// ConfigMap, enabling a different pod to become the leader.
func Become(ctx context.Context, lockName string) error {
func Become(ctx context.Context, lockName string, opts ...Option) error {
@camilamacedo86 (Contributor) commented on Jul 24, 2020

We replaced the SDK implementation with the controller-runtime one, so I don't think we need to keep this at all.

@joelanford why would we keep our implementation when all operators now use the controller-runtime one? If we do keep it, I need to revert the change made in operator-framework/operator-sdk#3514.

cc @jmrodri

Member Author

@camilamacedo86 my understanding is that this is the leader-for-life implementation, and we kept it for anyone who wants leader-for-life semantics. It will not be used by the SDK itself, but it could be used by an operator developer.

Member

A lot of existing operators are using the leader-for-life implementation. They'll either need a way to continue using it, or an easy path to migrate to the lease-based election.

These days the lease-based election is easy to use and robust, which hasn't always been the case. But even so, each pattern has its own advantages and disadvantages.
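
To make the pattern concrete, here is a simplified, illustrative sketch of the create-or-retry loop that the Become doc comment above describes. It is not the package's actual implementation.

// Simplified sketch of the leader-for-life pattern described in the Become doc
// comment: repeatedly try to create a ConfigMap owned by the current pod and
// back off between attempts. Illustrative only; not the package's actual code.
package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	crclient "sigs.k8s.io/controller-runtime/pkg/client"
)

// maxBackoff mirrors the maxBackoffInterval constant shown above.
const maxBackoff = 16 * time.Second

func becomeSketch(ctx context.Context, c crclient.Client, ns, lockName string, owner metav1.OwnerReference) error {
	backoff := time.Second
	for {
		cm := &corev1.ConfigMap{
			ObjectMeta: metav1.ObjectMeta{
				Name:            lockName,
				Namespace:       ns,
				OwnerReferences: []metav1.OwnerReference{owner},
			},
		}
		err := c.Create(ctx, cm)
		switch {
		case err == nil:
			// The ConfigMap was created; this pod is now the leader.
			return nil
		case apierrors.IsAlreadyExists(err):
			// Another pod holds the lock; wait, then try again.
			select {
			case <-time.After(backoff):
			case <-ctx.Done():
				return ctx.Err()
			}
			if backoff < maxBackoff {
				backoff *= 2
			}
		default:
			return err
		}
	}
}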


err := Become(context.TODO(), "leader-test", WithClient(client))
Expect(err).Should(BeNil())
})
Member

Can we add a test case for returning ErrNoNamespace from Become and using errors.Is(err, ErrNoNamespace)?
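
Something along these lines could work. This is only a hypothetical sketch that reuses the suite's fake client and assumes the standard library errors package is imported; how the missing namespace is simulated depends on package internals (the namespace is normally read from the pod's service account file), so that step is only indicated by a comment.

It("returns ErrNoNamespace when the namespace cannot be detected", func() {
	// Arrange the environment so namespace detection fails, e.g. by pointing
	// the namespace lookup at a non-existent file (the exact mechanism depends
	// on what the package exposes for testing).
	err := Become(context.TODO(), "leader-test", WithClient(client))
	Expect(errors.Is(err, ErrNoNamespace)).To(BeTrue())
})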

@joelanford (Member) left a comment

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2020
@openshift-ci-robot

New changes are detected. LGTM label has been removed.

@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 24, 2020
@jmrodri jmrodri merged commit dca3fec into operator-framework:main Jul 24, 2020