Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set LeaderElectionReleaseOnCancel to the scaffolded main.go #2593

Closed
AlmogBaku opened this issue Apr 3, 2022 · 16 comments · Fixed by #2596
Closed

Set LeaderElectionReleaseOnCancel to the scaffolded main.go #2593

AlmogBaku opened this issue Apr 3, 2022 · 16 comments · Fixed by #2596
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@AlmogBaku
Copy link
Member

What do you want to happen?

Kubebuilder is generating main.go scripts that are getting closed immediately when the Manager is done.
Given that, we can add LeaderElectionReleaseOnCancel: true to the manager (with an explanation comment) to improve the performance of re-election.

Extra Labels

No response

@AlmogBaku AlmogBaku added the kind/feature Categorizes issue or PR as related to a new feature. label Apr 3, 2022
@NikhilSharmaWe
Copy link
Member

I would like to work on this.
/assign

@camilamacedo86
Copy link
Member

camilamacedo86 commented Apr 4, 2022

See : https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L171-L176

// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader doesn't have to wait
// LeaseDuration time first.

Questions:

a) why should we use it by default?
b) why in the PR (https://github.com/kubernetes-sigs/kubebuilder/pull/2596/files) was suggested to set it only when is not {{ if not .ComponentConfig }}?

Additionally you say:

@joelanford @camilamacedo86 @AlmogBaku
Error logs says vet: ./main.go:64:3: unknown field LeaderElectionReleaseOnCancel in struct literal. Unable to understand the cause of it. The function NewManager uses type Options as one of the parameters, since Options is actually manager.Options and since it has LeaderElectionReleaseOnCancel field in it why this error is being displayed.

c) What errors logs? Can you please share the link?
d) How to reproduce the scenario to check the error?

Note that by performing the following steps I cannot see any error:

$ cd /testdata/project-v3
$ go mod tidy
$ go vet ./...

OR

By running make all to call all:

r$ make all
GOBIN=/Users/camilamacedo86/go/src/sigs.k8s.io/kubebuilder/testdata/project-v3/bin go install sigs.k8s.io/controller-tools/cmd/controller-gen@v0.8.0
/Users/camilamacedo86/go/src/sigs.k8s.io/kubebuilder/testdata/project-v3/bin/controller-gen object:headerFile="hack/boilerplate.go.txt" paths="./..."
go fmt ./...
go vet ./...
go build -o bin/manager main.go

c/c @NikhilSharmaWe @AlmogBaku

@NikhilSharmaWe
Copy link
Member

@camilamacedo86 Error is solved now. It was showing because I added LeaderElectionReleaseOnCancel for go/v2 also and this feature wasn't supported in the controller-runtime:v0.6.4 which is the version that the go/v2 plugin uses. Now I am patching only go/v3.

@AlmogBaku
Copy link
Member Author

Regarding (1)-

As the comment mentioned, when the program is terminated immediately after the Manager is done, it yields better performance and it's safe to use (or more accurately - not "not safe" :P).

Since Kubebuilder scaffolding main.go that is anyway behave like that, I suggest that it'll better to add it as a convention with a comment and provide with the best practices.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Apr 4, 2022

Hi @AlmogBaku,

By not setting the option that means use its default value.
The usage of the default value is what is considered more appropriate in common cases.
So, if you would like to suggest that the best would be LeaderElectionReleaseOnCancel: true, by default then that needs to be done via controller runtime and not via Kuebuilder. I am afraid that by default this option should not be true. (unless KB would be providing any specific plugin/helper that justify do not use the default value for that case)

In this way, I'd like to suggest you raise it in controller runtime and begin a discussion there.

IMO: Would be great we have in its docs when the option should or not be true? What are the use cases where it is recommended to use that? So after discussing this topic with controller-runtime maintainers we can understand all caveats and then contribute with additions to the doc.

WDYT? Could we move forward in this way?

Hi @NikhilSharmaWe,

Thank you for letting us know that the problem does not exist with the default scaffolds.

So, @AlmogBaku and @NikhilSharmaWe:

Could we agree to close the PR (https://github.com/kubernetes-sigs/kubebuilder/pull/2596/files) as deferred, close this issue and then begin this discussion on the controller runtime if you see that fits?

@AlmogBaku
Copy link
Member Author

Unlike with the controller-runtime, with Kubebuilder the generated code runs the Runnables throuhgh the Manager, unless you intentionally delete it (hence- helper that justify it).

This is not required when using the controller-runtime, and developers can implement it however they desire (which is why it's not defaulted). Therefore - I think it should be done in Kubebuilder, and the C+R default value should remain the same.

I suggest we'll talk about it in the Triage meeting. WDYT @camilamacedo86 ?

@camilamacedo86
Copy link
Member

camilamacedo86 commented Apr 4, 2022

Kubebuilder is only a tool to scaffold the files to create "controller-runtime" based projects, see;

Screenshot 2022-04-04 at 18 38 04

Sorry, but then I do not follow your argument.

I'd suggest we begin here with:

Setting this significantly
// speeds up voluntary leader transitions as the new leader doesn't have to wait
// LeaseDuration time first.

When is a good idea or not to wait for the leaseDuration?
What are the problems that can be faced if we do not wait for the leaseDuration
So, if that is good to be used by default way controller-runtime does not make it is the default behaviour?

IMO: It does not seem to be a good idea to be used by default since it can cause concurrence issues. You have 2 leaders doing the same reconciliations.

@NikhilSharmaWe
Copy link
Member

Please inform me if there will be a meeting for this discussion, I would like to join in.

@camilamacedo86
Copy link
Member

@AlmogBaku
Copy link
Member Author

Hi,
Our scaffolded code is prefectly meeting the requirement - therefore, I do not see any way it can disrupt or cause the issues you mentioned.
If you are concerned about people misusing this, we can always add a comment.

I suggest we'll discuss it in our meeting.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Apr 22, 2022

Hi @AlmogBaku,

Following is a summary and my vote about this one.
IMO we should NOT add this option. It has my vote -1. The main reasons are:

  • a) Kubebuilder provides a scaffold based on the default controller-runtime configuration. We have no intention of adding optional features and options that should be used to address specific needs to the default scaffold. That goes against its goals

  • b) Controller runtime has default values for all optional options defined to the manager. The default values defined for the options are what is expected in common cases scenarios. So, if you think this value should be true because that is a good practice and is the expected manager behaviour, in general, for all common cases then, it needs to be discussed and addressed via controller runtime and not Kubebuilder.

  • c) I am afraid that I cannot go along with your statement that LeaderElectionReleaseOnCancel optional as true is a good practice and the best default value for all common cases scenarios by reading its documentation:

See https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/manager/manager.go#L171-L176

// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.

My understanding here is that this could result in the old and new leaders doing the same reconciliations, and because of this, it is unsafe for the common scenarios.

@AlmogBaku
Copy link
Member Author

AlmogBaku commented Apr 23, 2022

The reason C+R states it's not defaulted as true is that it "requires the binary to immediately end when the
Manager is stopped"
.
Because C+R cannot enforce it, the default value is set as false.

That being said, that's not the case for our generated main (which immediately end when the Manager is stopped).

@joelanford
Copy link
Member

I would worry that folks would change their main.go (which we tell them is totally allowed) such that the binary no longer exits when the manager stops.

Most folks probably don't take the time to understand every default kubebuilder uses, so they'd likely leave this setting enabled despite it now being unsafe.

Could we add this to the scaffold with a comment about what it is, but then leave it commented out? That way

  1. It's more discoverable, so more folks use it
  2. It keeps the default as the safest possible configuration.
  3. It ostensibly means that when the operator author decides to enable it, they've read the comment about it and understand the consequences.

@camilamacedo86
Copy link
Member

camilamacedo86 commented Apr 26, 2022

I am ok with the proposed idea by @joelanford here as well.
So that would mean we scaffold the option LeaderElectionReleaseOnCancel with the value true but commented.
Also, we should add a comment on top that clarifies when is or not safe to use it.
Something like:

// LeaderElectionReleaseOnCancel defines if the leader should step down voluntarily
// when the Manager ends. This requires the binary to immediately end when the
// Manager is stopped, otherwise, this setting is unsafe. Setting this significantly
// speeds up voluntary leader transitions as the new leader don't have to wait
// LeaseDuration time first.
//
// In the default scaffold provided, the program ends immediately after 
// the manager stops, so would be fine to enable this option. However, 
// if you are doing or is intended to do any operation such as perform cleanups 
// after the manager stops then its usage might be unsafe.
// LeaderElectionReleaseOnCancel: true,

WDYT @AlmogBaku @NikhilSharmaWe would that be acceptable?
Could we move in this way?

@AlmogBaku
Copy link
Member Author

+1

@camilamacedo86
Copy link
Member

@NikhilSharmaWe, see I think we could reach a consensus here. Could you update your PR so we get this one merged as described here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
4 participants