Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tests for reconciliation events #255

Closed
wants to merge 1 commit into from
Closed

Conversation

phillebaba
Copy link
Member

@phillebaba phillebaba commented Jan 24, 2021

This adds tests that are meant to standardize the expected behavior for events generated by KC. In the end the test should act as a guide for any future changes to how events are generated.

Steps:

  1. Create Kustomization which will deploy a Pod
  2. Set Pod to an unhealthy state and wait for Kustomization to receive status condition for failed health check.
  3. Trigger a reconcile and check the events to make sure that only one event has been created for the failed health check.
  4. Set Pod to a healthy state and wait for Kustomization to receive a healthy status condition
  5. Trigger a reconcile and check the events to make sure that only one event has been created for the health check.

TBD:

  1. Verify external event recorder event message matches status
  2. Update revision and make sure only one event is created for health check.

@phillebaba phillebaba force-pushed the fix/event-tests branch 2 times, most recently from f4ad401 to c23f84e Compare January 24, 2021 23:04
@phillebaba phillebaba changed the title Add tests for Kustomize events WIP: Add tests for Kustomize events Jan 24, 2021
@phillebaba phillebaba changed the title WIP: Add tests for Kustomize events Add tests for Kustomize events Jan 24, 2021
@phillebaba phillebaba marked this pull request as draft January 24, 2021 23:05
@@ -237,7 +237,7 @@ func (r *KustomizationReconciler) Reconcile(ctx context.Context, req ctrl.Reques
if reconcileErr != nil {
// record the reconciliation error
r.recordReadiness(ctx, reconciledKustomization)
return ctrl.Result{RequeueAfter: kustomization.Spec.Interval.Duration}, reconcileErr
return ctrl.Result{RequeueAfter: kustomization.Spec.Interval.Duration}, nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we return an error here the resource will be immediately reconciled. So what is the point of setting RequeueAfter? Reconciling immediately will mean that the Kustomization will not reach a unhealthy condition until the backoff is large enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm something is very wrong if it doesn’t schedule at the RetryInteval. Please undo this change as it breaks the retry behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to find docs about the Reconcile behavior to be sure, but it feels like the requeue after is ignored if an error is returned. When a health check timeout error occurs the resources will reconcile again after 1 ms. As I understand it that is not what we want to happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to retry the reconciliation based on what GetRetryInterval gives us. Maybe something changed in controller-runtime 0.8, is your PR based on the latest main branch?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep i rebased from main yesterday. So it might be that or the in memory cluster not behaving like a "normal" cluster. I am seeing the same behavior in a real cluster where the condition will go from progressing to progressing as it is immediately rescheduled. Also the way we are patching status updates means that we will always override conditions in that status. The only way to determine if the health check timed out is to check the events.

Copy link
Member

@stefanprodan stefanprodan Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to introduce a new condition called healthCheck that records the last check result. But I would not mix this into this PR. First let’s figure out why the retry interval doesn’t work.

Copy link
Member Author

@phillebaba phillebaba Jan 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will investigate why it is reconciled immediately. So all we need to do is change the Type value in the status condition. The reason the conditions are replaced is because the merge key for conditions is the Type value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ready condition is reset by progressing and that’s ok, we need to create a dedicated condition for health checks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking into controller-runtime, if you return an error from a reconcile call the result will be ignored. This should be the right code that does the error handling.
https://github.com/kubernetes-sigs/controller-runtime/blob/9e78e653228851684b6b9cdabe5aae8559fe3722/pkg/internal/controller/controller.go#L297

I am not sure if it has always been like this or if it is a change introduced in the latest version. Either way we need to fix KC so that it actually waits the interval time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in #256, please rebase.

@stefanprodan stefanprodan changed the title Add tests for Kustomize events Add tests for reconciliation events Jan 25, 2021
Signed-off-by: Philip Laine <philip.laine@gmail.com>
@phillebaba phillebaba marked this pull request as ready for review February 2, 2021 22:18
@phillebaba
Copy link
Member Author

phillebaba commented Feb 2, 2021

Its better to try to merge this PR which adds the test to avoid regression and then I can verify that #191 is not present in a later PR. There is still the issue that the health passed even is sent immediatly when the kustomization is reconciled, even if the health check does not pass.

@stefanprodan
Copy link
Member

There is still the issue that the health passed even is sent immediatly when the kustomization is reconciled, even if the health check does not pass.

I don't see how is that possible, on a health check error we return early:
https://github.com/fluxcd/kustomize-controller/blob/main/controllers/kustomization_controller.go#L695

@stefanprodan
Copy link
Member

I've added various tests for the events in the last months, closing this.

@stefanprodan stefanprodan deleted the fix/event-tests branch January 5, 2022 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants