-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update certificate observed generation always #5207
Update certificate observed generation always #5207
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@taragu: 0 warnings.
In response to this:
/lint
Part of #5076
Proposed Changes
- This PR updates certificate reconciler to always update the
ObservedGeneration
. Currently, the reconciler will not update theObservedGeneration
if an error is encountered inreconcileCMCertificate(...)
. This could cause problems similar to Service's RoutesReady is 'Unknown' when Route's Ready is 'False' #4561Release Note
NONE
/cc @jonjohnsonjr
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Hi @taragu. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
func knCertWithGeneration(name, namespace string, gen int) *v1alpha1.Certificate { | ||
return knCertWithStatusAndGeneration(name, namespace, &v1alpha1.CertificateStatus{}, gen) | ||
} | ||
|
||
func knCertWithStatus(name, namespace string, status *v1alpha1.CertificateStatus) *v1alpha1.Certificate { | ||
return &v1alpha1.Certificate{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should use the new helper, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vagababov I was thinking if I should adding another input param to knCert(...)
and knCertWithStatus(...)
instead of creating the new helpers knCertWithGeneration(...)
and knCertWithStatusAndGeneration(...)
. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah have one all inclusive and specialize others. That keeps the actual logic in one place.
One worry that @mattmoor had about bumping generation like this unconditionally is that we might prematurely mark a resource as ready if we encounter an error during reconciliation (the 5th point in #5076). Just wanted to reiterate that concern -- if a resource is ready, then its spec changes, and we encounter an error trying to reconcile that spec, do we still say its ready? (We shouldn't 😄) |
@jonjohnsonjr that's correct. I've checked to see if the certificate observed generation is used for indicating if the resource is ready like this, and I haven't seen it used in this way. @vagababov @dgerd could you please keep me honest here and confirm if this is true? |
bd977ca
to
f91fe11
Compare
/assign @jonjohnsonjr |
}, { | ||
Name: "observed generation is still updated when error is encountered", | ||
Objects: []runtime.Object{ | ||
knCertWithGeneration("knCert", "foo", generation+1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if the status of the initial knCert has condition Ready=True instead of nil? That's the case I was worried about in this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jonjohnsonjr I see. Would marking the knCert as not ready solve this problem?
cmConfig := config.FromContext(ctx).CertManager | ||
cmCert := resources.MakeCertManagerCertificate(cmConfig, knCert) | ||
cmCert, err := c.reconcileCMCertificate(ctx, knCert, cmCert) | ||
if err != nil { | ||
knCert.Status.MarkNotReady(notReconciledReason, notReconciledMessage) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to include err
in the message, WDYT?
I'm also worried that this isn't robust to future changes -- if we add another path to reconcile
that can return an error, we'll need to duplicate this MarkNotReady
call. I'd move this up into Reconcile
and check it immediately after we call reconcile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we want to include error message into Status because sometimes error message could be very verbose. Having it in the status may cause the Object exceeds k8s size limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point... it would still be nice to have something, maybe we could truncate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It sounds good to me to put the truncated error message into the status message. We probably also want to add some description before the truncated error message just in case that the truncated error message is not descriptive enough. (e.g. have status message like: fmt.Sprintf("Cert-Manager certificate reconciliation failed: %s", truncatedErrorMessage)
).
Also do we want to consider as a general pattern to follow for MarkFailed
or MarkNotReady
when they are caused by errors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much should we truncate the err message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the deep diving @jonjohnsonjr
- Generally I think we should reconcile an object when its status was changed because the status change of the object could be caused by the status change of its child resources, which means we need to enqueue the object to do "something".
- If we need some fields in the status for the informative purpose, we could exclude these fields when comparing the new and old objects.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think we should reconcile an object when its status was changed because the status change of the object could be caused by the status change of its child resources, which means we need to enqueue the object to do "something".
But the Foo reconciler is the thing that is updating the FooStatus, so we don't need to trigger the Foo reconciler on FooStatus updates. The Foo reconciler gets triggered by children updates via children informers, which is what caused the Foo reconcile to update the FooStatus in the first place :) When child statuses change, we should definitely re-reconcile the parent, but when our own status changes, we are triggering ourselves, which should always be a no-op unless we've made a huge mistake.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, so I think it's possible to make this work, but let's drop err
from the message for now until we've landed some changes to make it feasible. Sorry for the back and forth :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. Should the enqueuing logic remain the same for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the status update should trigger one extra reconcile loop (as it does already, today), but since the error message will remain constant, we won't get stuck in a reconcile loop.
@@ -42,6 +42,8 @@ import ( | |||
const ( | |||
noCMConditionReason = "NoCertManagerCertCondition" | |||
noCMConditionMessage = "The ready condition of Cert Manager Certifiate does not exist." | |||
notReconciledReason = "CertificateNotConfigured" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't love CertificateNotConfigured
-- maybe ReconcileFailed
-- cc @mattmoor may have opinions here.
@vagababov @ZhiminXiang @jonjohnsonjr I think this PR is ready for another round of review. Would you please take a look? |
/ok-to-test |
The following is the coverage report on pkg/.
|
/lgtm |
/lgtm |
/assign @dprotaso |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: taragu, vagababov The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-knative-serving-integration-tests |
1 similar comment
/test pull-knative-serving-integration-tests |
The following jobs failed due to test flakiness:
Job pull-knative-serving-integration-tests expended all 3 retries without success. |
/test pull-knative-serving-integration-tests |
/lint
Part of #5076
Proposed Changes
ObservedGeneration
. Currently, the reconciler will not update theObservedGeneration
if an error is encountered inreconcileCMCertificate(...)
. This could cause problems similar to Service's RoutesReady is 'Unknown' when Route's Ready is 'False' #4561Release Note
/cc @jonjohnsonjr