docs: update retry docs (akuity#3065)
Signed-off-by: Kent Rancourt <kent.rancourt@gmail.com>
krancour authored Dec 5, 2024
1 parent 58588b8 commit cda7fee
Showing 1 changed file with 41 additions and 17 deletions.
58 changes: 41 additions & 17 deletions docs/docs/35-references/10-promotion-steps.md
@@ -58,26 +58,46 @@ steps:
     input: ${{ outputs.alias.someOutput }}
 ```

-### Step Retry
-
-A step can be given a `retry` configuration to specify the number of `attempts`
-it should make to complete successfully. By default, steps will not be retried
-unless they imply polling behavior, like [`argocd-update`](#argocd-update) and
-[`git-wait-for-pr`](#git-wait-for-pr) which will poll indefinitely until a
-condition is met.
+### Step Retries
+
+When a step fails for any reason, it can be retried instead of immediately
+failing the entire `Promotion`. An _error threshold_ specifies the number of
+_consecutive_ failures required for retry attempts to be abandoned and the
+`Promotion` to fail.
+
+Independent of the error threshold, steps are also subject to a _timeout_. Any
+step that doesn't achieve its goal within that interval will cause the
+`Promotion` to fail. For steps that exhibit any kind of polling behavior, the
+timeout can cause a `Promotion` to fail with no _other_ failure having occurred.
+
+System-wide, the default error threshold is 1 and the default timeout is
+indefinite. Thus, default behavior is effectively no retries when a step fails
+for any reason and steps with any kind of polling behavior will poll
+indefinitely _as long as no other failure occurs._
+
+The implementations of individual steps can override these defaults. Users also
+may override these defaults through configuration. In the following example, the
+`git-wait-for-pr` step is configured not to fail the `Promotion` until three
+consecutive failed attempts to execute it. It is also configured to wait a
+maximum of 48 hours for the step to complete successfully (i.e. for the PR to be
+merged).

 ```yaml
 steps:
-- uses: step-name
+# ...
+- uses: git-wait-for-pr
   retry:
-    attempts: 3
+    errorThreshold: 3
+    timeout: 48h
+  config:
+    prNumber: ${{ outputs['open-pr'].prNumber }}
 ```

 :::info
 This feature was introduced in Kargo v1.1.0, and is still undergoing refinements
-and improvements to better distinguish between transient and non-transient errors,
-and to provide more control over retry behavior like backoff strategies or time
-limits.
+and improvements to better distinguish between transient and non-transient
+errors, and to provide more control over retry behavior like backoff strategies
+or time limits.
 :::

 ## Built-in Steps
@@ -1290,10 +1310,11 @@ with a wide variety of external services.

 :::note
 An HTTP response that is not conclusively determined to have succeeded or failed
-will result in the step reporting a result of `Running`. Kargo will retry such
-a step on its next attempt at reconciling the `Promotion` resource. This will
-continue until the step succeeds, fails, exhausts the configured maximum number
-of retries, or a configured timeout has elapsed.
+will result in the step reporting a result of `Running`. Kargo will
+[retry](#step-retries) such a step on its next attempt at reconciling the
+`Promotion` resource. This will continue until the step succeeds, fails,
+exhausts the configured maximum number of retries, or a configured timeout has
+elapsed.
 :::

 #### `http` Expressions
@@ -1379,13 +1400,16 @@ The step would succeed and produce the following outputs:

 Building on the basic example, this configuration defines explicit success and
 failure criteria. Any response meeting neither of these criteria will result in
-the step reporting a result of `Running` and being retried.
+the step reporting a result of `Running` and being retried. Note the use of
+[retry](#step-retries) configuration to set a timeout for the step.

 ```yaml
 steps:
 # ...
 - uses: http
   as: cat-facts
+  retry:
+    timeout: 10m
   config:
     method: GET
     url: https://www.catfacts.net/api/