fix: stage reconciler failures if cache is stale #1475

Merged: 1 commit, Feb 12, 2024

gdsoumya
Contributor

@gdsoumya gdsoumya commented Feb 12, 2024

Sometimes the controller's cache for Stages is stale and doesn't reflect the updated result of a promotion. This leads to a situation where CurrentFreight is temporarily nil, so the reconciler sets the phase to NotApplicable. In reality, CurrentFreight has already been updated in the live state, but because the controller has now patched the phase to NotApplicable, all future reconciliations skip the verification step entirely, since the phase no longer matches the required value.
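
To make the failure mode concrete, here is a minimal, self-contained sketch of the sequence described above. It is not the Kargo reconciler: the `Stage` struct, the `reconcile` and `simulate` functions, and the `Verifying` phase name are assumptions made for illustration (only `CurrentFreight` and `NotApplicable` come from the description). It also assumes the workaround amounts to leaving the phase untouched when the observed Freight is nil, which this page does not confirm.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for a Stage's phase.
type Phase string

const (
	PhaseVerifying     Phase = "Verifying"     // assumed name for the phase verification requires
	PhaseNotApplicable Phase = "NotApplicable" // named in the PR description
)

// Stage is a simplified, hypothetical model of the live Stage object.
type Stage struct {
	CurrentFreight *string
	Phase          Phase
	Verified       bool
}

// reconcile models a single reconciliation pass. observedFreight is what the
// (possibly stale) cache returned; live is the object that gets patched.
// patchPhaseOnNil toggles between the behavior described as buggy (true) and
// the assumed workaround (false).
func reconcile(live *Stage, observedFreight *string, patchPhaseOnNil bool) {
	if observedFreight == nil {
		// "Stage has no current Freight; no health checks or verification to perform"
		if patchPhaseOnNil {
			live.Phase = PhaseNotApplicable
		}
		return
	}
	// Verification only runs when the phase matches the required value.
	if live.Phase == PhaseVerifying {
		live.Verified = true
	}
}

// simulate runs a stale read followed by a fresh read against the same Stage.
func simulate(patchPhaseOnNil bool) Stage {
	freight := "b4e5302"
	live := &Stage{CurrentFreight: &freight, Phase: PhaseVerifying}

	reconcile(live, nil, patchPhaseOnNil)                 // stale cache: Freight looks nil
	reconcile(live, live.CurrentFreight, patchPhaseOnNil) // later pass: cache has caught up
	return *live
}

func main() {
	fmt.Printf("patch phase on stale nil: verified=%v\n", simulate(true).Verified)
	fmt.Printf("leave phase untouched:    verified=%v\n", simulate(false).Verified)
}
```

In this model, patching the phase on a stale read makes the second pass skip verification even though the Freight is visible by then, while leaving the phase alone lets a later pass pick it up.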

@gdsoumya gdsoumya requested a review from a team as a code owner February 12, 2024 19:43

netlify bot commented Feb 12, 2024

Deploy Preview for docs-kargo-akuity-io ready!

Name Link
🔨 Latest commit 0a9a1ff
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-akuity-io/deploys/65ca76c5e72dc20008ce5c02
😎 Deploy Preview https://deploy-preview-1475.kargo.akuity.io

Signed-off-by: Soumya Ghosh Dastidar <gdsoumya@gmail.com>
@gdsoumya gdsoumya force-pushed the fix/stage_reconciler branch from 5bb6f67 to 0a9a1ff Compare February 12, 2024 19:51
Member

@krancour krancour left a comment

Looks good as a workaround, but I will also investigate the underlying issues in controller runtime.

@gdsoumya gdsoumya added this pull request to the merge queue Feb 12, 2024
Merged via the queue into akuity:main with commit 6ad1a1c Feb 12, 2024
14 checks passed
@gdsoumya gdsoumya deleted the fix/stage_reconciler branch February 12, 2024 20:14
Contributor Author

gdsoumya commented Feb 12, 2024

@krancour Here is a log dump from when I encountered the situation:

time="2024-02-12T20:16:55Z" level=debug msg="promo added to priority queue"
time="2024-02-12T20:16:55Z" level=debug msg="begin promo"
time="2024-02-12T20:16:55Z" level=info msg="began promotion" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:55Z" level=debug msg="found associated Stage" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:55Z" level=debug msg="executing promotion mechanisms" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:55Z" level=debug msg="executing Git-based promotion mechanisms" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:55Z" level=debug msg="executing Kustomize promotion mechanism" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:55Z" level=debug msg="obtained credentials for git repo" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 repo="https://github.com/gdsoumya/kargo-demo" stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="done executing Kustomize promotion mechanism" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="done executing promotion mechanisms. aggregated status: Succeeded" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="done executing promotion mechanisms. aggregated status: Succeeded" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="promotion Succeeded" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:57Z" level=info msg="promotion Succeeded" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302 stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="conclude promo" namespace=kargo-proj promotion=test1.01hpffeaftt94x0h7dpma20gqj.b4e5302
time="2024-02-12T20:16:57Z" level=debug msg="reconciling Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="found Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="Stage has no current Freight; no health checks or verification to perform" namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="checking if auto-promotion is permitted..." namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="found no PromotionPolicy associated with the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="auto-promotion is not permitted for the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:16:57Z" level=debug msg="done reconciling Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="reconciling Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="found Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="Stage health deemed not applicable" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="checking if auto-promotion is permitted..." namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="found no PromotionPolicy associated with the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="auto-promotion is not permitted for the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:44Z" level=debug msg="done reconciling Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="enqueued Stage for reconciliation" analysisRun=test1.01hpfffv434dcv08dgr6fbs54x.b4e5302 namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="reconciling Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="found Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="Stage health deemed not applicable" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="marked Freight as verified in Stage" freight=b4e53022bda6ddf48c521357ef92e7cbe9696ba0 namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="checking if auto-promotion is permitted..." namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="found no PromotionPolicy associated with the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="auto-promotion is not permitted for the Stage" namespace=kargo-proj stage=test1
time="2024-02-12T20:17:45Z" level=debug msg="done reconciling Stage" namespace=kargo-proj stage=test1

The Stage was reconciled and promoted, but when it came time to run verification, the reconciler checked the Stage and saw its current Freight as empty: Stage has no current Freight; no health checks or verification to perform

Because I am running my change, a manual refresh or an eventual resync after the resync duration will trigger the verification. In the above case, it was a manual refresh from the UI.
