Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move Argo CD sync & wait logic to promotion #1613

Closed
jessesuen opened this issue Mar 13, 2024 · 3 comments · Fixed by #1753
Closed

Move Argo CD sync & wait logic to promotion #1613

jessesuen opened this issue Mar 13, 2024 · 3 comments · Fixed by #1753

Comments

@jessesuen
Copy link
Member

jessesuen commented Mar 13, 2024

Proposed Feature

Currently, the Argo CD sync happens in the promotion, but waiting for the sync operation to complete is currently happening in the Stage reconciler. I believe that the wait should instead be part of the promotion. Instead of the Stage reconciler doing the sync and wait, I propose this logic be moved to the Promotion reconciler.

Motivation

Earlier, we did not do this because Promotions were not long-lived. But now that they are, I feel it would make more sense for sync + wait to be part of the promotion. This will align better with our future desire for promotions to integrate with other tools aside from Argo CD, which might have asynchronous, long-lived deployment processes.

Moving it to Promotion will make it more visible when the sync failed, because the promotion would fail too.

Suggested Implementation

  • When a promotion job issues a sync, it does not complete the Promotion immediately like it does now. It leaves it in Running.
  • We have a predicate for Argo CD such that when an operation completes, we enqueue the promotion job instead of the stage. When the Promotion job sees that it is completed successfully, the promotion is Successful. If the operation fails, the promotion is considered failed.
  • We will need a way to terminate Promotions, which in the case of Argo CD, will terminate the operation. This can be a separate issue.

Open Issues

There is also the question of whether or not waiting for an Application to become Healthy should be part of promotion. I suggest we leave this in Stage reconciler, because this is more like a verification than it is a promotion.

@krancour
Copy link
Member

krancour commented Mar 13, 2024

I agree with this, but tentatively think that checking sync state should happen in both places. While an Application's sync state might be distinct from its health, this feels less true for a Kargo Stage. Detecting that an Application isn't synced to the current Freight indicates the Stage is unhealthy (even if the App is). I feel that's an important check to continue doing periodically.

wdyt?

@jessesuen
Copy link
Member Author

think that checking sync state should happen in both places

I didn't consider moving the sync state check into promotion. I think that would be fine to have that in promotion as well, because we could clearly indicate why the promotion job failed in the status message, and have a record of it.

@jessesuen jessesuen added this to the v0.6.0 milestone Mar 26, 2024
@hiddeco hiddeco self-assigned this Apr 2, 2024
@hiddeco
Copy link
Contributor

hiddeco commented Apr 2, 2024

Based on a lot of reading today, I believe this is an opportunity to also address #1451.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment