-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add tests for reconciliation events #255
Conversation
f4ad401
to
c23f84e
Compare
@@ -237,7 +237,7 @@ func (r *KustomizationReconciler) Reconcile(ctx context.Context, req ctrl.Reques | |||
if reconcileErr != nil { | |||
// record the reconciliation error | |||
r.recordReadiness(ctx, reconciledKustomization) | |||
return ctrl.Result{RequeueAfter: kustomization.Spec.Interval.Duration}, reconcileErr | |||
return ctrl.Result{RequeueAfter: kustomization.Spec.Interval.Duration}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we return an error here the resource will be immediately reconciled. So what is the point of setting RequeueAfter? Reconciling immediately will mean that the Kustomization will not reach a unhealthy condition until the backoff is large enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm something is very wrong if it doesn’t schedule at the RetryInteval. Please undo this change as it breaks the retry behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to find docs about the Reconcile behavior to be sure, but it feels like the requeue after is ignored if an error is returned. When a health check timeout error occurs the resources will reconcile again after 1 ms. As I understand it that is not what we want to happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to retry the reconciliation based on what GetRetryInterval gives us. Maybe something changed in controller-runtime 0.8, is your PR based on the latest main branch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep i rebased from main yesterday. So it might be that or the in memory cluster not behaving like a "normal" cluster. I am seeing the same behavior in a real cluster where the condition will go from progressing to progressing as it is immediately rescheduled. Also the way we are patching status updates means that we will always override conditions in that status. The only way to determine if the health check timed out is to check the events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to introduce a new condition called healthCheck that records the last check result. But I would not mix this into this PR. First let’s figure out why the retry interval doesn’t work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will investigate why it is reconciled immediately. So all we need to do is change the Type value in the status condition. The reason the conditions are replaced is because the merge key for conditions is the Type value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ready condition is reset by progressing and that’s ok, we need to create a dedicated condition for health checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking into controller-runtime, if you return an error from a reconcile call the result will be ignored. This should be the right code that does the error handling.
https://github.com/kubernetes-sigs/controller-runtime/blob/9e78e653228851684b6b9cdabe5aae8559fe3722/pkg/internal/controller/controller.go#L297
I am not sure if it has always been like this or if it is a change introduced in the latest version. Either way we need to fix KC so that it actually waits the interval time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #256, please rebase.
c23f84e
to
7670ef3
Compare
7670ef3
to
ce622e2
Compare
ce622e2
to
89b1318
Compare
Signed-off-by: Philip Laine <philip.laine@gmail.com>
89b1318
to
e55f0e5
Compare
Its better to try to merge this PR which adds the test to avoid regression and then I can verify that #191 is not present in a later PR. There is still the issue that the health passed even is sent immediatly when the kustomization is reconciled, even if the health check does not pass. |
I don't see how is that possible, on a health check error we return early: |
I've added various tests for the events in the last months, closing this. |
This adds tests that are meant to standardize the expected behavior for events generated by KC. In the end the test should act as a guide for any future changes to how events are generated.
Steps:
TBD: