
Check the node status during leader re-election #24

Closed
kasonglee opened this issue Aug 5, 2020 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@kasonglee
Contributor

kasonglee commented Aug 5, 2020

Feature Request

Is your feature request related to a problem? Please describe.

In leader/leader.go, when a worker node fails, leader re-election only happens after the default timeout of 5 minutes, because the current check waits for the condition Pod.Status.Phase == "Failed" && Pod.Status.Reason == "Evicted". In my opinion, leader re-election could happen almost immediately if the check also considered the status of the node where the leader pod is running.

Describe the solution you'd like

Check the condition of the node where the leader pod is running (i.e., whether its NodeReady condition is something other than ConditionTrue). When that node has failed, delete the leader pod (the deletion only marks the pod as terminating, since the node it runs on is down) and the ConfigMap lock whose OwnerReference points to the leader pod.
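
To make this concrete, here is a minimal sketch of that flow, assuming the controller-runtime client. It is not the actual leader.go code; the helper names isNodeNotReady and evictLeaderIfNodeDown are illustrative only.

```go
package leader

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// isNodeNotReady reports whether the node hosting the given pod has lost its
// Ready condition (NodeReady != ConditionTrue).
func isNodeNotReady(ctx context.Context, c client.Client, pod *corev1.Pod) (bool, error) {
	node := &corev1.Node{}
	if err := c.Get(ctx, client.ObjectKey{Name: pod.Spec.NodeName}, node); err != nil {
		return false, err
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status != corev1.ConditionTrue, nil
		}
	}
	// No Ready condition reported at all; treat the node as not ready.
	return true, nil
}

// evictLeaderIfNodeDown deletes the leader pod and the ConfigMap lock it owns
// when the leader's node is not ready, so a new leader can take over without
// waiting for the eviction timeout.
func evictLeaderIfNodeDown(ctx context.Context, c client.Client, leaderPod *corev1.Pod, lock *corev1.ConfigMap) error {
	notReady, err := isNodeNotReady(ctx, c, leaderPod)
	if err != nil || !notReady {
		return err
	}
	// The pod will only be marked as terminating because its node is down,
	// so the ConfigMap lock is deleted explicitly as well.
	if err := c.Delete(ctx, leaderPod); err != nil {
		return err
	}
	return c.Delete(ctx, lock)
}
```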

The screenshots below show a test of the node check for leader re-election (test-operator-xxx-xxxqb was the leader pod).

[Three screenshots attached to the original issue, not reproduced here.]

Shortening --pod-eviction-timeout could be another approach. However, I am confident the approach above is more reliable, since we don't know what timeout value would be appropriate.

Also, are there any drawbacks to making --pod-eviction-timeout very short?

@HyungJune

@mhrivnak
Member

mhrivnak commented Aug 6, 2020

Also, are there any drawbacks to making --pod-eviction-timeout very short?

The use case for the timeout is that a Node gets temporarily disconnected from the cluster but is then able to re-connect. The shorter the timeout, the more risk that the Node will re-appear and all the workloads on it will try to keep running despite having had their Pods deleted. Probably they would not run for long, but it may take some time for the local kubelet to catch up and stop all the containers.

This scenario greatly benefits from fencing. Ideally you allow a separate component to monitor the Node health, and then take action to ensure that a missing Node will not return before deleting its workloads. This can be done for example by powering off the Node's underlying machine.

@kasonglee
Contributor Author

Here's what I understood: making --pod-eviction-timeout very short can be risky because it may take some time for the local kubelet to catch up and stop all the containers. The solution is a separate component that monitors node health and ensures the missing node will not return to Ready status before its workloads are deleted.

I think that, as in my case, an operator may need a node-health check in order to get fast leader re-election. So I suggest adding a separate Go package for checking node health to the operator-lib repository; the package could then be used by operators that run on all the nodes in the cluster (i.e., as a DaemonSet). (Of course, I would need to modify my code to delete all workloads on the missing node.)
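
Purely as an illustration (the package name nodehealth and the function IsReady below are hypothetical, not an existing operator-lib API), such a shared helper might be as small as:

```go
// Package nodehealth is a hypothetical helper package; the name and API are
// illustrative only, not part of operator-lib today.
package nodehealth

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// IsReady reports whether the named node currently has NodeReady == ConditionTrue.
// An operator running as a DaemonSet could call this for its own node (or for
// the leader's node) before deciding to clean up workloads.
func IsReady(ctx context.Context, c client.Client, nodeName string) (bool, error) {
	node := &corev1.Node{}
	if err := c.Get(ctx, client.ObjectKey{Name: nodeName}, node); err != nil {
		return false, err
	}
	for _, cond := range node.Status.Conditions {
		if cond.Type == corev1.NodeReady {
			return cond.Status == corev1.ConditionTrue, nil
		}
	}
	return false, nil
}
```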

openshift-merge-robot pushed a commit that referenced this issue Oct 10, 2020
* leader: check the node status for the leader-election (#24)

* leader: enhance the coverage

Co-authored-by: kasong_lee <kasong_lee@tmax.co.kr>
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot openshift-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 5, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot openshift-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 5, 2020
@Bryce-huang

Is there any solution?

@Bryce-huang

@varshaprasad96
Member

varshaprasad96 commented Jan 11, 2021

There is still discussion needed on modifying the controller-runtime interface to allow choosing between leader-for-life and lease-based leader election. Since this particular issue concerns waiting for the default timeout even when the node status is not Ready, and a PR has already been merged to solve it, I am closing this issue. Please feel free to open another issue if you would like any other modifications to the current (leader-for-life) implementation.
