Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DependsOn readiness check should only rely on Ready condition #946

Open
seaneagan opened this issue Apr 18, 2024 · 10 comments · May be fixed by #947
Open

DependsOn readiness check should only rely on Ready condition #946

seaneagan opened this issue Apr 18, 2024 · 10 comments · May be fixed by #947

Comments

@seaneagan
Copy link
Contributor

It appears .status.observedGeneration does not reliably get updated after successful helm release upgrades. I can work on filing a separate bug for this, need to find time to come up with a minimal re-production, seems possible an issue could have been introduced in 48cad68

However, when checking dependencies it seems preferable to rely solely on the Ready condition, including both its Status and ObservedGeneration and ignore .Status.ObservedGeneration. Which then in turn has the side effect of avoiding the above mentioned bug.

if dHr.Generation != dHr.Status.ObservedGeneration || !conditions.IsTrue(dHr, meta.ReadyCondition) {

@seaneagan seaneagan linked a pull request Apr 18, 2024 that will close this issue
@souleb
Copy link
Member

souleb commented Apr 19, 2024

can you give more information on the issue?

The code is good for me. We expected a dependency to be ready only if the current version have been successfully reconciled.

@seaneagan
Copy link
Contributor Author

The Ready condition's ObservedGeneration is reliably getting updated, but not .Status.ObservedGeneration, which is resulting in "dependency not ready" even though it is ready. I think these two fields are mostly duplicative, but probably too late to remove .Status.ObservedGeneration at this point anyway? But I wonder if they should simply mirror each other?

@stefanprodan
Copy link
Member

@seaneagan I don't see how .Status.ObservedGeneration is not updated if the release upgrade succeeds given:

if errors.Is(retErr, reconcile.TerminalError(nil)) || (retErr == nil && (result.IsZero() || !result.Requeue)) {
patchOpts = append(patchOpts, patch.WithStatusObservedGeneration{})
}

Can you please add a unit test to prove this can actually happen?

Removing .Status.ObservedGeneration is not an option, this field is used by kstatus to determine readiness and almost all standard Kubernetes APIs have it. If .Status.ObservedGeneration is not kept up to date, then Flux kustomize-controller would be stuck waiting for HelmReleases which no one reported so far, so I'm really puzzled by your report.

@seaneagan
Copy link
Contributor Author

is it possible that an error which is not a TerminalError can happen while the Ready condition still gets set to True? What is the intention of when .status.observedGeneration should get updated? Would it make sense to simply always update it since in the most basic sense the generation has been observed? Or perhaps it can look at the conditions ObservedGenerations to determine whether it has been observed "enough" to ensure consistency vs looking at the transient result of the individual reconciliation?

separately, would it be ok to consider my proposed change independently of the above, as it feels more correct to rely on the Ready condition ObservedGeneration as it is intimately tied to the Ready condition status which is also being checked for dependencies?

@seaneagan
Copy link
Contributor Author

trying to translate the intention of that if condition into status conditions, perhaps if Ready==true or Stalled==true at that generation, then that generation has been observed?

@seaneagan
Copy link
Contributor Author

ok, so it looks like i was hitting #925 which was still allowing the Ready condition to be set to true, even though Reconciling was also true. Should Ready take Reconciling into account?

func summarize(req *Request) {

@seaneagan
Copy link
Contributor Author

@stefanprodan
Copy link
Member

I think we need a test for when drift correction is in loop that should used the kstatus verifier that catches any miss configuration https://github.com/fluxcd/kustomize-controller/blob/e9f5628eccbfbc722a7637ecbf7f66580e2e4416/internal/controller/kustomization_wait_test.go#L135

The checks and logic we use for the Ready/Stalled/Reconciling is here https://github.com/fluxcd/pkg/blob/60815565aaaa9bc92edc296a107288089764a384/runtime/conditions/check/fail.go

@stefanprodan
Copy link
Member

I think this will fix it #885

@stefanprodan
Copy link
Member

@seaneagan with #885 merged I think this will solve your issue. Ready will reflect now that drift detection is running without reaching correction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants