Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

leader: check the node status for the leader-election (#24) #25

Merged
merged 2 commits into from
Oct 10, 2020

Conversation

HyungJune
Copy link
Contributor

Description of the change:
node-check for leader re-election #24

Motivation for the change:
#24
Check the condition of the node where the leader pod is running with [Node.Type == "NodeReady" && Node.Status != "ConditionTrue"] and when the node has been failed, delete the leaderPod (only mark the pod with 'terminating' because the node where the pod is running has been failed) and the configmap lock whose OwnerReference is leaderPod).

This approach can bring more reliability

@coveralls
Copy link

coveralls commented Aug 6, 2020

Pull Request Test Coverage Report for Build 232104749

  • 32 of 40 (80.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.005%) to 80.047%

Changes Missing Coverage Covered Lines Changed/Added Lines %
leader/leader.go 32 40 80.0%
Totals Coverage Status
Change from base Build 195168809: -0.005%
Covered Lines: 341
Relevant Lines: 426

💛 - Coveralls

leader/leader.go Outdated
@@ -167,6 +167,7 @@ func Become(ctx context.Context, lockName string, opts ...Option) error {
log.Info("Leader lock configmap owner reference must be a pod.", "OwnerReference", existingOwners[0])
default:
leaderPod := &corev1.Pod{}
// leaderNode := &corev1.Node{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not merge code commented

Copy link
Contributor

@kasonglee kasonglee Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camilamacedo86 thanks, I will delete the comment at the new commit.

leader/leader.go Outdated
@@ -167,6 +167,7 @@ func Become(ctx context.Context, lockName string, opts ...Option) error {
log.Info("Leader lock configmap owner reference must be a pod.", "OwnerReference", existingOwners[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add the logic of the fragments to here as well for we are able to generate the changelog.

c/c @jmrodri

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's really no changelog fragment for the operator-lib. The releases are just a signed tag.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should have since we should have a way to know what changed in each version/release.

log.Info("Deleting the leader.")

//Mark the termainating status to the leaderPod and Delete the configmap lock
if err := deleteLeader(ctx, config.Client, leaderPod, existing); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we not use the same func across all implementation? Also, why is required to remove the configMap?

Copy link
Contributor

@kasonglee kasonglee Aug 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camilamacedo86

  1. The reason i added the function(delete Leader) is for the lint-test. golangci-lint limits Cyclomatic complexity in the function. so we have to divide the code into multiple functions.
  2. if i do not remove the configmap, leader election will not occur imediately when the node where the leader pod is running was 'not ready' status. Because the leader pod was just marked as 'terminating'.(the garbage collector do not delete the configmap until the leader pod is actually deleted)

@@ -209,4 +209,137 @@ var _ = Describe("Leader election", func() {
Expect(pod.TypeMeta.Kind).To(Equal("Pod"))
})
})

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS.: see that is not all changes covered by the tests. its % decreased.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@camilamacedo86 i will enhance the coverage in the new commit. Do i need to make a new pull-request?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasonglee I see you already added tests that is just fine. There is no need for a new commit.

@jmrodri
Copy link
Member

jmrodri commented Aug 24, 2020

/assign jmrodri

@jmrodri
Copy link
Member

jmrodri commented Aug 24, 2020

/hold until I can review

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 24, 2020
@joelanford
Copy link
Member

/test test-all

Copy link
Member

@jmrodri jmrodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two minor issue but I will not hold this PR up for it. I can fix those later. So far this looks fine to me.

log.Info("the status of the node where operator pod with leader lock was running has been 'notReady'")
log.Info("Deleting the leader.")

//Mark the termainating status to the leaderPod and Delete the configmap lock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: should have a space after // : // Mark
Nit: typo termainating should be terminating.

Comment on lines +282 to +310
func getNode(ctx context.Context, client crclient.Client, nodeName string, node *corev1.Node) error {
key := crclient.ObjectKey{Namespace: "", Name: nodeName}
err := client.Get(ctx, key, node)
if err != nil {
log.Error(err, "Failed to get Node", "Node.Name", nodeName)
return err
}
return nil
}

func isNotReadyNode(ctx context.Context, client crclient.Client, nodeName string) bool {
leaderNode := &corev1.Node{}
if err := getNode(ctx, client, nodeName, leaderNode); err != nil {
return false
}
for _, condition := range leaderNode.Status.Conditions {
if condition.Type == corev1.NodeReady && condition.Status != corev1.ConditionTrue {
return true
}
}
return false

}

func deleteLeader(ctx context.Context, client crclient.Client, leaderPod *corev1.Pod, existing *corev1.ConfigMap) error {
err := client.Delete(ctx, leaderPod)
if err != nil {
log.Error(err, "Leader pod could not be deleted.")
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all good. Pretty clean and easy to read.

@@ -209,4 +209,137 @@ var _ = Describe("Leader election", func() {
Expect(pod.TypeMeta.Kind).To(Equal("Pod"))
})
})

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kasonglee I see you already added tests that is just fine. There is no need for a new commit.

@jmrodri
Copy link
Member

jmrodri commented Oct 10, 2020

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2020
@jmrodri
Copy link
Member

jmrodri commented Oct 10, 2020

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 10, 2020
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmrodri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2020
@openshift-merge-robot openshift-merge-robot merged commit d8d1759 into operator-framework:main Oct 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants