Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement webserver in cnf-app-mac-operator to handle lifecycle and probe requests #35

Merged
merged 2 commits into from
Jan 5, 2024

Conversation

ramperher
Copy link
Collaborator

@ramperher ramperher commented Jan 4, 2024

build-depends: rh-nfv-int/nfv-example-cnf-deploy#44

Lifecycle hooks can be implemented with a httpGet resource, we don't really need to use a shell command for that: https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#hook-handler-implementations

To make it available in cnf-app-mac-operator, I've added a webserver in the main application to handle these requests, and also for all lifecycle probes (readiness, liveness and startup).

With all this, example-cnf should continue working fine and, now, all these tests should be passing for cnf-app-mac-operator:

lifecycle-container-shutdown
lifecycle-container-startup
lifecycle-liveness-probe
lifecycle-readiness-probe
lifecycle-startup-probe

(Adding rh-nfv-int/nfv-example-cnf-deploy#44 as dependency just to avoid issues found in daily jobs with this case).

@ramperher ramperher force-pushed the cnf-app-mac-operator-webserver branch from de835c5 to 2a6f21c Compare January 4, 2024 17:14
@ramperher
Copy link
Collaborator Author

The installation of the cnf-app-mac-operator pod is working :) https://www.distributed-ci.io/jobs/503cbe5e-9703-41cb-bf86-4dd146fd26dd/jobStates?sort=date&task=ad954d05-b310-43ed-b9fb-092e8f4d67b6

And the endpoints are accessible:

$ oc get pods -n example-cnf -o wide
NAME                                                       READY   STATUS    RESTARTS   AGE   IP            NODE       NOMINATED NODE   READINESS GATES                                                           
cnf-app-mac-operator-controller-manager-78fb5b5cf6-p2m5f   1/1     Running   0          19s   10.130.2.87   worker-0   <none>           <none>                                                                    

$ ssh core@worker-0
$ curl 10.130.2.87:8095/healthz
ok

Let's wait for the results, but it's a good signal.

@ramperher
Copy link
Collaborator Author

The failure was probably due to the fact that the webserver was created in a synchronous way, so that the rest of the logic of cnf-app-mac-operator was not executed and then that's why the corresponding check was failing, now I've moved the webserver code to a goroutine to call it asynchronously.

@ramperher
Copy link
Collaborator Author

The job is passing, example-cnf is working, and now, cnf-app-mac-operator-controller-manager pod is passing the five lifecycle tests commented in the description of this PR. Take a look to https://www.distributed-ci.io/jobs/b3487a61-25d8-4e6f-9fbc-86ec9e8c776c/tests/e0230ba5-9d3a-48e8-afb7-cd9116263e5e and check the output of each test:

-  lifecycle-container-startup

{"CompliantObjectsOut":[{"ObjectType":"Container","ObjectFieldsKeys":["Reason For Compliance","Namespace","Pod Name","Container Name"],"ObjectFieldsValues":["Container has postStart defined","example-cnf","cnf-app-mac-operator-controller-manager-bdf956b79-tkrtp","manager"]}]

- lifecycle-container-shutdown

{"CompliantObjectsOut":[{"ObjectType":"Container","ObjectFieldsKeys":["Reason For Compliance","Namespace","Pod Name","Container Name"],"ObjectFieldsValues":["Container has preStop defined","example-cnf","cnf-app-mac-operator-controller-manager-bdf956b79-tkrtp","manager"]}],

- lifecycle-readiness-probe

{"CompliantObjectsOut":[{"ObjectType":"Container","ObjectFieldsKeys":["Reason For Compliance","Namespace","Pod Name","Container Name"],"ObjectFieldsValues":["Container has ReadinessProbe defined","example-cnf","cnf-app-mac-operator-controller-manager-bdf956b79-tkrtp","manager"]}

- lifecycle-liveness-probe

{"CompliantObjectsOut":[{"ObjectType":"Container","ObjectFieldsKeys":["Reason For Compliance","Namespace","Pod Name","Container Name"],"ObjectFieldsValues":["Container has LivenessProbe defined","example-cnf","cnf-app-mac-operator-controller-manager-bdf956b79-tkrtp","manager"]}

- lifecycle-startup-probe

{"CompliantObjectsOut":[{"ObjectType":"Container","ObjectFieldsKeys":["Reason For Compliance","Namespace","Pod Name","Container Name"],"ObjectFieldsValues":["Container has StartupProbe defined","example-cnf","cnf-app-mac-operator-controller-manager-bdf956b79-tkrtp","manager"]}]

Change is ready for review

Copy link
Collaborator

@betoredhat betoredhat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job!, LGTM

@ramperher ramperher merged commit 26ec9b6 into main Jan 5, 2024
1 check passed
@tonyskapunk
Copy link
Collaborator

A bit late, but here my feedback.

If the purpose of the change is to comply with CNF validation, i.e. to check boxes, then this approach is the right one.

If the purpose of the change is to follow applicable best practices, then it is a great skeleton to lay down the added value of those practices. This is, each type (probes and lifecycle) have a purpose, to do some validation or preparation or action within the pod. The current implementation always returns valid responses but does not implement any validation, then there's a few value added and could give a false impression of a good practice.

@ramperher
Copy link
Collaborator Author

A bit late, but here my feedback.

If the purpose of the change is to comply with CNF validation, i.e. to check boxes, then this approach is the right one.

If the purpose of the change is to follow applicable best practices, then it is a great skeleton to lay down the added value of those practices. This is, each type (probes and lifecycle) have a purpose, to do some validation or preparation or action within the pod. The current implementation always returns valid responses but does not implement any validation, then there's a few value added and could give a false impression of a good practice.

Hi @tonyskapunk ! I get your point and I agree that there's a great room of improvement. For the moment, the idea is to just comply with CNF validation, to really understand what we need to take into account when dealing with partner cases. For the second case, we would really need to understand right now how example-cnf components work and interact between them, and we're not yet in that point of understanding the solution. When reaching that point, of course we can improve this by setting the correct validation, but I'd say to do it as an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants