Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update certificate observed generation always #5207

Conversation

taragu
Copy link
Contributor

@taragu taragu commented Aug 19, 2019

/lint

Part of #5076

Proposed Changes

Release Note

NONE

/cc @jonjohnsonjr

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Aug 19, 2019
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 19, 2019
Copy link
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@taragu: 0 warnings.

In response to this:

/lint

Part of #5076

Proposed Changes

Release Note

NONE

/cc @jonjohnsonjr

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot
Copy link
Contributor

Hi @taragu. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. area/API API objects and controllers labels Aug 19, 2019
func knCertWithGeneration(name, namespace string, gen int) *v1alpha1.Certificate {
return knCertWithStatusAndGeneration(name, namespace, &v1alpha1.CertificateStatus{}, gen)
}

func knCertWithStatus(name, namespace string, status *v1alpha1.CertificateStatus) *v1alpha1.Certificate {
return &v1alpha1.Certificate{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should use the new helper, then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vagababov I was thinking if I should adding another input param to knCert(...) and knCertWithStatus(...) instead of creating the new helpers knCertWithGeneration(...) and knCertWithStatusAndGeneration(...). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah have one all inclusive and specialize others. That keeps the actual logic in one place.

@jonjohnsonjr
Copy link
Contributor

One worry that @mattmoor had about bumping generation like this unconditionally is that we might prematurely mark a resource as ready if we encounter an error during reconciliation (the 5th point in #5076). Just wanted to reiterate that concern -- if a resource is ready, then its spec changes, and we encounter an error trying to reconcile that spec, do we still say its ready? (We shouldn't 😄)

@taragu
Copy link
Contributor Author

taragu commented Aug 19, 2019

@jonjohnsonjr that's correct. I've checked to see if the certificate observed generation is used for indicating if the resource is ready like this, and I haven't seen it used in this way. @vagababov @dgerd could you please keep me honest here and confirm if this is true?

@taragu taragu force-pushed the certificate-reconcile-observedgeneration branch from bd977ca to f91fe11 Compare August 19, 2019 22:43
@markusthoemmes
Copy link
Contributor

/assign @jonjohnsonjr

}, {
Name: "observed generation is still updated when error is encountered",
Objects: []runtime.Object{
knCertWithGeneration("knCert", "foo", generation+1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the status of the initial knCert has condition Ready=True instead of nil? That's the case I was worried about in this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jonjohnsonjr I see. Would marking the knCert as not ready solve this problem?

cmConfig := config.FromContext(ctx).CertManager
cmCert := resources.MakeCertManagerCertificate(cmConfig, knCert)
cmCert, err := c.reconcileCMCertificate(ctx, knCert, cmCert)
if err != nil {
knCert.Status.MarkNotReady(notReconciledReason, notReconciledMessage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to include err in the message, WDYT?

I'm also worried that this isn't robust to future changes -- if we add another path to reconcile that can return an error, we'll need to duplicate this MarkNotReady call. I'd move this up into Reconcile and check it immediately after we call reconcile.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if we want to include error message into Status because sometimes error message could be very verbose. Having it in the status may cause the Object exceeds k8s size limit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point... it would still be nice to have something, maybe we could truncate it?

Copy link

@ZhiminXiang ZhiminXiang Aug 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds good to me to put the truncated error message into the status message. We probably also want to add some description before the truncated error message just in case that the truncated error message is not descriptive enough. (e.g. have status message like: fmt.Sprintf("Cert-Manager certificate reconciliation failed: %s", truncatedErrorMessage)).

Also do we want to consider as a general pattern to follow for MarkFailed or MarkNotReady when they are caused by errors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much should we truncate the err message?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the deep diving @jonjohnsonjr

  1. Generally I think we should reconcile an object when its status was changed because the status change of the object could be caused by the status change of its child resources, which means we need to enqueue the object to do "something".
  2. If we need some fields in the status for the informative purpose, we could exclude these fields when comparing the new and old objects.
    Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally I think we should reconcile an object when its status was changed because the status change of the object could be caused by the status change of its child resources, which means we need to enqueue the object to do "something".

But the Foo reconciler is the thing that is updating the FooStatus, so we don't need to trigger the Foo reconciler on FooStatus updates. The Foo reconciler gets triggered by children updates via children informers, which is what caused the Foo reconcile to update the FooStatus in the first place :) When child statuses change, we should definitely re-reconcile the parent, but when our own status changes, we are triggering ourselves, which should always be a no-op unless we've made a huge mistake.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, so I think it's possible to make this work, but let's drop err from the message for now until we've landed some changes to make it feasible. Sorry for the back and forth :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Should the enqueuing logic remain the same for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the status update should trigger one extra reconcile loop (as it does already, today), but since the error message will remain constant, we won't get stuck in a reconcile loop.

@@ -42,6 +42,8 @@ import (
const (
noCMConditionReason = "NoCertManagerCertCondition"
noCMConditionMessage = "The ready condition of Cert Manager Certifiate does not exist."
notReconciledReason = "CertificateNotConfigured"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love CertificateNotConfigured -- maybe ReconcileFailed -- cc @mattmoor may have opinions here.

@taragu
Copy link
Contributor Author

taragu commented Aug 26, 2019

@vagababov @ZhiminXiang @jonjohnsonjr I think this PR is ready for another round of review. Would you please take a look?

@taragu
Copy link
Contributor Author

taragu commented Aug 26, 2019

/ok-to-test

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 26, 2019
@knative-metrics-robot
Copy link

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/certificate/certificate.go 76.7% 88.0% 11.3

@ZhiminXiang
Copy link

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Aug 26, 2019
@jonjohnsonjr
Copy link
Contributor

/lgtm

@taragu
Copy link
Contributor Author

taragu commented Aug 26, 2019

/assign @dprotaso

Copy link
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@vagababov
Copy link
Contributor

/approve

@knative-prow-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: taragu, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 27, 2019
@taragu
Copy link
Contributor Author

taragu commented Aug 27, 2019

/test pull-knative-serving-integration-tests

1 similar comment
@taragu
Copy link
Contributor Author

taragu commented Aug 28, 2019

/test pull-knative-serving-integration-tests

@knative-test-reporter-robot

The following jobs failed due to test flakiness:

Test name Triggers Retries
pull-knative-serving-integration-tests pull-knative-serving-integration-tests
pull-knative-serving-integration-tests
pull-knative-serving-integration-tests
pull-knative-serving-integration-tests
pull-knative-serving-integration-tests
pull-knative-serving-integration-tests
3/3

Job pull-knative-serving-integration-tests expended all 3 retries without success.

@taragu
Copy link
Contributor Author

taragu commented Aug 28, 2019

/test pull-knative-serving-integration-tests

@knative-prow-robot knative-prow-robot merged commit e9c007b into knative:master Aug 28, 2019
@taragu taragu deleted the certificate-reconcile-observedgeneration branch November 12, 2019 16:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants