Invoke Argo Rollouts AnalysisRuns as part of Freight verification #1257

Closed
jessesuen opened this issue Dec 8, 2023 · 4 comments


jessesuen commented Dec 8, 2023

Proposed Feature

Today, verifying a Freight is equivalent to waiting for the Argo CD application (if specified) to reach a healthy state.

We want to extend this to do other things, including:

  • running user-defined tests/scripts (e.g. kicking off a Kubernetes Job that must return success)
  • querying metrics (Prometheus, Datadog, custom API endpoints, etc.) and ensuring KPIs are met

The way we should support the above is to reuse the Analysis feature of Argo Rollouts, which does pretty much exactly what we want.

Motivation

Today, CI commonly handles post-promotion testing. If Kargo is to replace all facets of promotion in a deployment pipeline, it must be able to run user-defined tests.

Suggested Implementation

The approach will be very similar to our Argo CD integration, which does the following:

  • After freight is promoted, we sync the Argo CD app
  • Then we wait until the app is healthy before marking the freight as verified in the stage

The Argo Rollouts Analysis would be done similarly:

  • After freight is promoted, and after the Argo CD app is healthy, we kick off an AnalysisRun
  • Then we wait until the AnalysisRun is successful before marking the freight as verified in the stage
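
The ordering described above can be sketched as a small decision function. This is only an illustration, not Kargo's actual reconciler: the `verificationState` type, the `nextStep` function, and the step names are hypothetical, and handling of failed or errored runs is deliberately left out.

```go
package main

import "fmt"

// verificationState is a hypothetical snapshot of what the Stage
// reconciler would know when it verifies promoted Freight.
type verificationState struct {
	AppHealthy    bool   // the Argo CD app has reached a healthy state
	HasAnalysis   bool   // the Stage spec contains an analysis stanza
	AnalysisPhase string // status.phase of the AnalysisRun; "" if no run exists yet
}

// Step names are illustrative only.
const (
	stepWaitForHealth   = "wait-for-health"
	stepStartAnalysis   = "start-analysis"
	stepWaitForAnalysis = "wait-for-analysis"
	stepMarkVerified    = "mark-verified"
)

// nextStep returns what the reconciler should do next: health first,
// then the AnalysisRun, then marking the Freight as verified.
func nextStep(s verificationState) string {
	if !s.AppHealthy {
		return stepWaitForHealth
	}
	if !s.HasAnalysis {
		// Same behavior as today: a healthy app is enough.
		return stepMarkVerified
	}
	switch s.AnalysisPhase {
	case "":
		return stepStartAnalysis
	case "Successful":
		return stepMarkVerified
	default:
		return stepWaitForAnalysis
	}
}

func main() {
	fmt.Println(nextStep(verificationState{AppHealthy: true, HasAnalysis: true}))
}
```

Keeping the decision in a pure function like this would make the ordering easy to unit test independently of any cluster.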

Kargo will need to add an analysis stanza to the stage spec, where it references one or more AnalysisTemplates:

      analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: guestbook-svc.default.svc.cluster.local
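
For illustration, the stanza above could map onto Go types along these lines. The type and field names (`AnalysisSpec`, `AnalysisTemplateReference`, `AnalysisArgument`, `parseAnalysis`) are assumptions made for this sketch and not a settled API; JSON is used instead of YAML only so the example needs nothing beyond the standard library.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AnalysisSpec is a hypothetical shape for the proposed analysis stanza.
type AnalysisSpec struct {
	Templates []AnalysisTemplateReference `json:"templates,omitempty"`
	Args      []AnalysisArgument          `json:"args,omitempty"`
}

// AnalysisTemplateReference names an AnalysisTemplate to instantiate.
type AnalysisTemplateReference struct {
	TemplateName string `json:"templateName"`
}

// AnalysisArgument is a name/value pair passed to the AnalysisRun.
type AnalysisArgument struct {
	Name  string `json:"name"`
	Value string `json:"value"`
}

// parseAnalysis decodes the JSON form of the stanza.
func parseAnalysis(data []byte) (AnalysisSpec, error) {
	var spec AnalysisSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}

func main() {
	// JSON equivalent of the YAML stanza shown above.
	data := []byte(`{
	  "templates": [{"templateName": "success-rate"}],
	  "args": [{"name": "service-name", "value": "guestbook-svc.default.svc.cluster.local"}]
	}`)
	spec, err := parseAnalysis(data)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(spec.Templates[0].TemplateName, spec.Args[0].Name)
}
```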

When the Stage reconciler verifies Freight after promotion, it would create an AnalysisRun from the referenced AnalysisTemplate(s). This is how Rollouts does it (we would have similar logic):
https://github.com/argoproj/argo-rollouts/blob/54d83d676b10d8246766bdbd5d407c0d39e3d9d0/utils/analysis/helpers.go#L291-L322

One difference from the above logic is that we should avoid an import dependency on Argo Rollouts. The logic will therefore need to be translated to operate on unstructured objects, so that we preserve all fields of an AnalysisTemplate when converting it to an AnalysisRun.
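
A minimal sketch of that conversion on untyped data follows. It uses plain `map[string]interface{}` values, which is the representation that `unstructured.Unstructured` from k8s.io/apimachinery wraps, so no Rollouts types are imported. The helper names `deepCopy` and `newAnalysisRun` are hypothetical, only a single template is handled, and the merging of multiple templates done by the Rollouts helper linked above is not reproduced.

```go
package main

import "fmt"

// deepCopy recursively copies JSON-like data so that the source
// AnalysisTemplate is never mutated.
func deepCopy(v interface{}) interface{} {
	switch t := v.(type) {
	case map[string]interface{}:
		out := make(map[string]interface{}, len(t))
		for k, val := range t {
			out[k] = deepCopy(val)
		}
		return out
	case []interface{}:
		out := make([]interface{}, len(t))
		for i, val := range t {
			out[i] = deepCopy(val)
		}
		return out
	default:
		return v
	}
}

// newAnalysisRun (hypothetical) copies the template's whole spec, so
// fields this code knows nothing about are preserved, and then
// overrides the values of any args supplied by the Stage.
func newAnalysisRun(tmpl map[string]interface{}, namePrefix string, args map[string]string) map[string]interface{} {
	spec, _ := deepCopy(tmpl["spec"]).(map[string]interface{})
	if spec == nil {
		spec = map[string]interface{}{}
	}
	if rawArgs, ok := spec["args"].([]interface{}); ok {
		for _, raw := range rawArgs {
			arg, ok := raw.(map[string]interface{})
			if !ok {
				continue
			}
			name, _ := arg["name"].(string)
			if value, found := args[name]; found {
				arg["value"] = value
			}
		}
	}
	return map[string]interface{}{
		"apiVersion": "argoproj.io/v1alpha1",
		"kind":       "AnalysisRun",
		"metadata":   map[string]interface{}{"generateName": namePrefix},
		"spec":       spec,
	}
}

func main() {
	tmpl := map[string]interface{}{
		"kind": "AnalysisTemplate",
		"spec": map[string]interface{}{
			"args":    []interface{}{map[string]interface{}{"name": "service-name"}},
			"metrics": []interface{}{map[string]interface{}{"name": "success-rate"}},
		},
	}
	run := newAnalysisRun(tmpl, "guestbook-", map[string]string{
		"service-name": "guestbook-svc.default.svc.cluster.local",
	})
	fmt.Println(run["kind"], run["spec"])
}
```

Copying the spec wholesale, rather than picking known fields, is what keeps the conversion independent of the Rollouts Go types.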

We would also need an informer on AnalysisRuns that waits until the AnalysisRun's status.phase indicates completion (just like we have an informer on Applications that waits until they are healthy).
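
As a rough sketch of the check such an informer's update handler might perform, the following reads status.phase from an untyped AnalysisRun and decides whether the run has finished. The function names `analysisPhase` and `isCompleted` are hypothetical; the phase strings are the AnalysisRun phases defined by Argo Rollouts (Pending, Running, Successful, Failed, Error, Inconclusive), and treating every phase other than Pending and Running as finished is an assumption of this sketch.

```go
package main

import "fmt"

// analysisPhase extracts status.phase from an untyped AnalysisRun.
// It returns "" when the status has not been populated yet.
func analysisPhase(obj map[string]interface{}) string {
	status, ok := obj["status"].(map[string]interface{})
	if !ok {
		return ""
	}
	phase, _ := status["phase"].(string)
	return phase
}

// isCompleted reports whether the phase is terminal, meaning the
// reconciler no longer needs to wait on the run.
func isCompleted(phase string) bool {
	switch phase {
	case "Successful", "Failed", "Error", "Inconclusive":
		return true
	}
	return false
}

func main() {
	run := map[string]interface{}{
		"kind":   "AnalysisRun",
		"status": map[string]interface{}{"phase": "Running"},
	}
	phase := analysisPhase(run)
	fmt.Println(phase, isCompleted(phase))
}
```

Only a completed run whose phase is Successful would lead to the Freight being marked as verified.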

geowalrus4gh commented Dec 11, 2023

We usually do this in the Rollout spec. Is there a reason not to do it there and to move it to the Kargo level instead?

@jessesuen

I can think of several:

  1. Rollouts operate on a single workload, and only during the update. You may need analysis to be performed at a conceptually higher level (e.g. after multiple microservices are deployed and become healthy, or after the whole Argo CD application becomes healthy).
  2. You may not be using Argo Rollouts.
  3. The promotion may not even be Kubernetes-based, and may instead promote config for something else (e.g. Terraform, uploading assets to a CDN).

@geowalrus4gh

The first point is super valid, at least in our case. Fingers crossed!

krancour added this to the v0.3.0 milestone Dec 23, 2023
@krancour

This was closed by #1259
