Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Liveness probe #309

Merged
merged 13 commits into from
Jul 25, 2019
Merged

Liveness probe #309

merged 13 commits into from
Jul 25, 2019

Conversation

kkmsft
Copy link
Contributor

@kkmsft kkmsft commented Jul 25, 2019

Reason for Change:

  • Add liveliness at /healthz endpoint for MIC and NMI at 8080 by default.
    - Add readiness at /ready endpoint for MIC and NMI at 8080 by default.
    - Both liveness and readiness probe return back 200 okay as soon as they are activated from the current run.
  • The liveliness probe provides "Active" or "Not Active" status as per the state of the component. In case of MIC if the given instance is the elected leader then "Active" string is presented in the /healthz endpoint. For NMI it would show "Active" as soon as the iptables are put in place.
  • Fix the NMI iptables to run instantly instead of waiting for the first ticker instance.

Issue Fixed:

#288

Notes for Reviewers:

@kkmsft kkmsft changed the title Readiness liveness probe Readiness & liveness probe Jul 25, 2019
@kkmsft kkmsft requested a review from aramase July 25, 2019 02:57
@@ -103,7 +103,7 @@ spec:
name: iptableslock
containers:
- name: nmi
image: "mcr.microsoft.com/k8s/aad-pod-identity/nmi:1.5-rc1"
image: "testle5.azurecr.io/nmi:le"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to switch back to rc1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aramase

@@ -191,7 +203,7 @@ spec:
serviceAccountName: aad-pod-id-mic-service-account
containers:
- name: mic
image: "mcr.microsoft.com/k8s/aad-pod-identity/mic:1.5-rc1"
image: "testle5.azurecr.io/mic:le"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

var err error
var t *template.Template
if !old {
t, err = template.New("deployment-rbac.yaml").ParseFiles(path.Join("template", "deployment-rbac.yaml"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than having 2 different deployment files, we should just template the single file to include new specs/ignore based on a boolean flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good. Will take it up in a future PR.

AfterEach(func() {
fmt.Println("\nTearing down the test environment...")

if CurrentGinkgoTestDescription().Failed {
Copy link
Member

@aramase aramase Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should skip cleanup if there is a failure. Instead we should dump the logs here. If we skip clean up, it can cause issues in automation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree. This is stop gap before going to the full copy logs and events atleast until that PR is done.

@@ -573,7 +590,7 @@ var _ = Describe("Kubernetes cluster using aad-pod-identity", func() {
Expect(ok).To(Equal(true))

// setup mic and nmi with old releases
setupInfra("mcr.microsoft.com/k8s/aad-pod-identity", "1.4", "1.3")
setupInfraOld("mcr.microsoft.com/k8s/aad-pod-identity", "1.4", "1.3")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead of 2 functions we should just pass a boolean and use a single template to generate the final deployment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not want to make changes in setupinfra all through, instead kept the changes to internal function only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should generate the required template based on version, <= 1.4. When that change is made in future PR, this single instance and setupold function will go away.

port: 8080
initialDelaySeconds: 10
periodSeconds: 5
readinessProbe:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReadinessProbe is typically used to see if an endpoint is ready to receive traffic. Only when the readinessprobe passes the pod ip is added to list of endpoints in the service that is exposing the pod. We don't have a service with mic or nmi. Is there a reason we are doing readiness probe too?

Copy link
Member

@aramase aramase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments.

@kkmsft kkmsft changed the title Readiness & liveness probe Liveness probe Jul 25, 2019
cmd/mic/main.go Outdated
@@ -39,6 +41,9 @@ func main() {
flag.StringVar(&leaderElectionCfg.Name, "leader-election-name", "aad-pod-identity-mic", "leader election name")
flag.DurationVar(&leaderElectionCfg.Duration, "leader-election-duration", time.Second*15, "leader election duration")

//Probe port
flag.StringVar(&httpProbePort, "http-probe-port", "8080", "http health and ready probe port")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: http health and liveness probe port

@@ -91,6 +103,11 @@ func (s *Server) updateIPTableRules() {
ticker := time.NewTicker(time.Second * time.Duration(s.IPTableUpdateTimeIntervalInSeconds))
defer ticker.Stop()

// Run once before the waiting on ticket for the rules to take effect
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/ticket/ticker

@kkmsft
Copy link
Contributor Author

kkmsft commented Jul 25, 2019

Ran 13 of 13 Specs in 2388.736 seconds
SUCCESS! -- 13 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestAADPodIdentity (2388.74s)
PASS
ok github.com/Azure/aad-pod-identity/test/e2e 2388.765s

Copy link
Member

@aramase aramase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@kkmsft kkmsft merged commit f9d0f4e into Azure:master Jul 25, 2019
@kkmsft kkmsft deleted the readiness_liveness_probe branch July 25, 2019 21:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants