[Docs] Unify strategy descriptions and add Telemetry sections #2060

Merged on Apr 22, 2024 (30 commits). The diff below shows changes from 23 commits.

Commits
26f75ea
Unify Timeout, Retry
peter-csala Apr 15, 2024
40bf86f
Merge branch 'App-vNext:main' into unify-strategy-descriptions
peter-csala Apr 15, 2024
547553e
Add runtime to the wordlist
peter-csala Apr 15, 2024
950a84c
Fix heading
peter-csala Apr 15, 2024
fa7b6ba
Apply suggestions from code review
peter-csala Apr 15, 2024
8f6cbc5
Apply suggestions
peter-csala Apr 15, 2024
4c46442
Unify Fallback
peter-csala Apr 16, 2024
ac1a2bb
Unify Hedging
peter-csala Apr 16, 2024
44b9937
Fix linting issue related to italic usage
peter-csala Apr 16, 2024
fe6f713
Unify Rate limiter
peter-csala Apr 17, 2024
5ffd9c8
Apply suggestions from code review
peter-csala Apr 17, 2024
1a2452a
Use note instead of important
peter-csala Apr 17, 2024
661b8d5
Unify Circuit Breaker
peter-csala Apr 17, 2024
67e8331
Fix table formatting
peter-csala Apr 17, 2024
1b2d1d5
Apply suggestions from code review
peter-csala Apr 17, 2024
a046c8f
Fix table format
peter-csala Apr 17, 2024
94ee246
Add telemetry section to timeout
peter-csala Apr 18, 2024
3b9cab6
Remove unused variable
peter-csala Apr 18, 2024
e64edad
Add telemetry section to retry
peter-csala Apr 18, 2024
ccae18c
Update docs/strategies/timeout.md
peter-csala Apr 18, 2024
6a9a59f
Add telemetry section to fallback
peter-csala Apr 18, 2024
1a40202
Update docs/strategies/retry.md
peter-csala Apr 18, 2024
d81f3e4
Fix note section for fallback telemetry
peter-csala Apr 18, 2024
552a0e2
Add telemetry to rate limiter
peter-csala Apr 19, 2024
d6e1334
Add telemetry section to hedging
peter-csala Apr 19, 2024
eed98a8
Remove extra whitespace
peter-csala Apr 19, 2024
20fe470
Add telemetry section to circuit breaker
peter-csala Apr 19, 2024
6f10f8d
Fix cb telemetry events' severity
peter-csala Apr 19, 2024
fc494c3
Apply suggestions from code review
peter-csala Apr 22, 2024
899a5c4
Apply suggested changes
peter-csala Apr 22, 2024
1 change: 1 addition & 0 deletions .github/wordlist.txt
@@ -51,6 +51,7 @@ rethrow
rethrows
retryable
reusability
runtime
saas
sdk
serializers
41 changes: 25 additions & 16 deletions docs/strategies/circuit-breaker.md
@@ -2,17 +2,20 @@

## About

- **Options**:
- **Option(s)**:
- [`CircuitBreakerStrategyOptions`](xref:Polly.CircuitBreaker.CircuitBreakerStrategyOptions)
- [`CircuitBreakerStrategyOptions<T>`](xref:Polly.CircuitBreaker.CircuitBreakerStrategyOptions`1)
- **Extensions**: `AddCircuitBreaker`
- **Extension(s)**:
- `AddCircuitBreaker`
- **Strategy Type**: Reactive
- **Exceptions**:
- **Exception(s)**:
- `BrokenCircuitException`: Thrown when a circuit is broken and the action could not be executed.
- `IsolatedCircuitException`: Thrown when a circuit is isolated (held open) by manual override.

---

The circuit breaker **reactive** resilience strategy shortcuts the execution if the underlying resource is detected as unhealthy. The detection process is done via sampling. If the sampled executions' failure-success ratio exceeds a predefined threshold then a circuit breaker will prevent any new executions by throwing a `BrokenCircuitException`. After a preset duration the circuit breaker performs a probe, because the assumption is that this period was enough for the resource to self-heal. Depending on the outcome of the probe, the circuit will either allow new executions or continue to block them.

> [!NOTE]
> Be aware that the Circuit Breaker strategy [rethrows all exceptions](https://github.com/App-vNext/Polly/wiki/Circuit-Breaker#exception-handling), including those that are handled. A Circuit Breaker's role is to monitor faults and break the circuit when a certain threshold is reached; it does not manage retries. Combine the Circuit Breaker with a Retry strategy if needed.
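
The combination suggested in the note above could look roughly like the following. This is a minimal sketch, assuming the Polly v8 (`Polly.Core`) NuGet package; the thresholds, the simulated failure, and the `attempts` and `sawBrokenCircuit` variables are illustrative choices, not values taken from this PR.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;
using Polly.Retry;

// Retry is added first, so it is the outer strategy and wraps the circuit breaker.
ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        // Only the simulated failure is retried; BrokenCircuitException is not,
        // so an open circuit is surfaced to the caller instead of being hammered.
        ShouldHandle = new PredicateBuilder().Handle<InvalidOperationException>(),
        MaxRetryAttempts = 2,
        Delay = TimeSpan.Zero,
    })
    .AddCircuitBreaker(new CircuitBreakerStrategyOptions
    {
        // Illustrative (assumed) thresholds, deliberately small for the demo.
        FailureRatio = 0.5,
        MinimumThroughput = 2,
        SamplingDuration = TimeSpan.FromSeconds(10),
        BreakDuration = TimeSpan.FromSeconds(30),
    })
    .Build();

int attempts = 0;
bool sawBrokenCircuit = false;

try
{
    await pipeline.ExecuteAsync(async ct =>
    {
        attempts++;
        await Task.Yield();
        throw new InvalidOperationException("Simulated dependency failure.");
    }, CancellationToken.None);
}
catch (BrokenCircuitException)
{
    // The breaker opened while the retry strategy was still re-executing the callback.
    sawBrokenCircuit = true;
}
catch (InvalidOperationException)
{
    // The circuit stayed closed and the retries were exhausted.
}

Console.WriteLine($"Attempts: {attempts}, broken circuit observed: {sawBrokenCircuit}");
```

Because the circuit breaker rethrows the handled exception, the outer retry strategy is the component that decides whether another attempt is made.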

@@ -91,23 +94,29 @@ new ResiliencePipelineBuilder<HttpResponseMessage>().AddCircuitBreaker(optionsSt

## Defaults

| Property | Default Value | Description |
| ----------------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------ |
| `ShouldHandle` | Predicate that handles all exceptions except `OperationCanceledException`. | Specifies which results and exceptions are managed by the circuit breaker strategy. |
| `FailureRatio` | 0.1 | The ratio of failures to successes that will cause the circuit to break/open. |
| `MinimumThroughput` | 100 | The minimum number of actions that must occur in the circuit within a specific time slice. |
| `SamplingDuration` | 30 seconds | The time period over which failure ratios are calculated. |
| `BreakDuration` | 5 seconds | The time period for which the circuit will remain broken/open before attempting to reset. |
| `BreakDurationGenerator` | `null` | Enables adaptive adjustment of break duration based on the current state of the circuit. |
| `OnClosed` | `null` | Event triggered when the circuit transitions to the `Closed` state. |
| `OnOpened` | `null` | Event triggered when the circuit transitions to the `Opened` state. |
| `OnHalfOpened` | `null` | Event triggered when the circuit transitions to the `HalfOpened` state. |
| `ManualControl` | `null` | Allows for manual control to isolate or close the circuit. |
| `StateProvider` | `null` | Enables the retrieval of the current state of the circuit. |
| Property | Default Value | Description |
|--------------------------|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ShouldHandle` | Any exceptions other than `OperationCanceledException`. | Defines a predicate to determine what results and/or exceptions are handled by the circuit breaker strategy. |
| `FailureRatio`           | 0.1                                                     | The failure-success ratio that will cause the circuit to break/open. `0.1` means that 10% of all sampled executions failed.                               |
| `MinimumThroughput` | 100 | The minimum number of executions that must occur within the specified sampling duration. |
| `SamplingDuration` | 30 seconds | The time period over which the failure-success ratio is calculated. |
| `BreakDuration` | 5 seconds | Defines a **static** time period for which the circuit will remain broken/open before attempting to reset. |
| `BreakDurationGenerator` | `null` | This delegate allows you to **dynamically** calculate the break duration by utilizing information that is only available at runtime (like failure count). |
| `ManualControl` | `null` | If provided then the circuit's state can be manually controlled via a `CircuitBreakerManualControl` object. |
| `StateProvider` | `null` | If provided then the circuit's current state can be retrieved via a `CircuitBreakerStateProvider` object. |
| `OnClosed` | `null` | If provided then it will be invoked after the circuit transitions to either the `Closed` or `Isolated` states. |
| `OnOpened` | `null` | If provided then it will be invoked after the circuit transitions to the `Opened` state. |
| `OnHalfOpened` | `null` | If provided then it will be invoked after the circuit transitions to the `HalfOpened` state. |

> [!NOTE]
> If both `BreakDuration` and `BreakDurationGenerator` are specified then `BreakDuration` will be ignored.
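
To illustrate the difference between the static and the dynamic break duration, here is a small sketch, assuming the Polly v8 (`Polly.Core`) package. The linear growth rule and the 60-second cap are arbitrary example choices.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.CircuitBreaker;

var options = new CircuitBreakerStrategyOptions
{
    FailureRatio = 0.5,
    MinimumThroughput = 8,
    SamplingDuration = TimeSpan.FromSeconds(10),

    // Static value: per the note above, it is ignored when a generator is also set.
    BreakDuration = TimeSpan.FromSeconds(5),

    // Dynamic value: grow the break with the observed failure count, capped at 60 seconds.
    BreakDurationGenerator = static args =>
        new ValueTask<TimeSpan>(TimeSpan.FromSeconds(Math.Min(5 * args.FailureCount, 60))),
};

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddCircuitBreaker(options)
    .Build();
```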

---

> [!IMPORTANT]
> If the `MinimumThroughput` is not reached during the `SamplingDuration` then the `FailureRatio` is ignored.
> In other words, the circuit will not break even if all of the executions failed when their quantity is below the minimum throughput.
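
The interplay between `MinimumThroughput` and `FailureRatio` can be sketched without any Polly dependency. `ShouldBreak` below is a hypothetical helper written for this illustration, not a Polly API; it mirrors the documented rule only and treats reaching the ratio as sufficient, which is an assumption about the boundary case.

```csharp
using System;

// Hypothetical helper: decides whether a sampling window would open the circuit.
static bool ShouldBreak(int failures, int total, double failureRatio = 0.1, int minimumThroughput = 100)
{
    // Below the minimum throughput the failure ratio is not evaluated at all.
    if (total < minimumThroughput)
    {
        return false;
    }

    return (double)failures / total >= failureRatio;
}

Console.WriteLine(ShouldBreak(failures: 99, total: 99));   // False: every call failed, but only 99 executions were sampled
Console.WriteLine(ShouldBreak(failures: 20, total: 100));  // True: 20% failed and the throughput was reached
Console.WriteLine(ShouldBreak(failures: 5, total: 100));   // False: 5% is below the 10% ratio
```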

## Diagrams

### State diagram
56 changes: 46 additions & 10 deletions docs/strategies/fallback.md
@@ -2,12 +2,19 @@

## About

- **Options**: [`FallbackStrategyOptions<T>`](xref:Polly.Fallback.FallbackStrategyOptions`1)
- **Extensions**: `AddFallback`
- **Strategy Type**: Reactive
- **Option(s)**:
- [`FallbackStrategyOptions<T>`](xref:Polly.Fallback.FallbackStrategyOptions`1)
- **Extension(s)**:
- `AddFallback`
- **Exception(s)**: -

---

The fallback **reactive** resilience strategy provides a substitute if the execution of the callback fails. Failure can be either an `Exception` or a result object indicating unsuccessful processing. Typically this strategy is used as a last resort: if all other strategies fail to overcome the transient failure, you can still provide a fallback value to the caller.

> [!NOTE]
> In this document the *fallback*, *substitute*, and *surrogate* terms are used interchangeably.
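
As a rough illustration of the description above, the sketch below handles both kinds of failure (an exception and an unsuccessful result) and substitutes a value. It assumes the Polly v8 (`Polly.Core`) package; the `int` result type, the "negative means failure" convention, and the `CallPrimary` function are made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Fallback;

ResiliencePipeline<int> pipeline = new ResiliencePipelineBuilder<int>()
    .AddFallback(new FallbackStrategyOptions<int>
    {
        // Failure is either an exception or a result that signals unsuccessful processing.
        ShouldHandle = new PredicateBuilder<int>()
            .Handle<InvalidOperationException>()
            .HandleResult(static value => value < 0),

        // The surrogate value handed to the caller instead of the failure.
        FallbackAction = static args => Outcome.FromResultAsValueTask(0),

        OnFallback = static args =>
        {
            Console.WriteLine("Primary call failed, providing the fallback value.");
            return default;
        },
    })
    .Build();

// Hypothetical primary operation that always fails.
static async ValueTask<int> CallPrimary(CancellationToken cancellationToken)
{
    await Task.Yield();
    throw new InvalidOperationException("Simulated failure.");
}

int result = await pipeline.ExecuteAsync(CallPrimary, CancellationToken.None);
Console.WriteLine($"Result: {result}");
```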

## Usage

<!-- snippet: fallback -->
@@ -59,11 +66,40 @@ new ResiliencePipelineBuilder<UserAvatar>().AddFallback(optionsOnFallback);

## Defaults

| Property | Default Value | Description |
| ---------------- | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------- |
| `ShouldHandle` | Predicate that handles all exceptions except `OperationCanceledException`. | Predicate that determines what results and exceptions are handled by the fallback strategy. |
| `FallbackAction` | `Null`, **Required** | Fallback action to be executed. |
| `OnFallback` | `null` | Event that is raised when fallback happens. |
| Property | Default Value | Description |
|------------------|---------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ShouldHandle` | Any exceptions other than `OperationCanceledException`. | Defines a predicate to determine what results and/or exceptions are handled by the fallback strategy. |
| `FallbackAction` | `null`, **Required**                                    | This delegate allows you to **dynamically** calculate the surrogate value by utilizing information that is only available at runtime (like the outcome). |
| `OnFallback` | `null` | If provided then it will be invoked before the strategy calculates the fallback value. |

## Telemetry

The fallback strategy reports the following telemetry events:

| Event Name | Event Severity | When? |
|--------------|----------------|----------------------------------------------------------|
| `OnFallback` | `Warning` | Just before the strategy calls the `OnFallback` delegate |

Here are some sample events:

```none
Resilience event occurred. EventName: 'OnFallback', Source: 'MyApplication/MyTestPipeline/MyFallbackStrategy', Operation Key: 'MyFallbackGuardedOperation', Result: '-1'

Resilience event occurred. EventName: 'OnFallback', Source: '(null)/(null)/Fallback', Operation Key: '', Result: 'Exception of type 'CustomException' was thrown.'
CustomException: Exception of type 'CustomException' was thrown.
at Program.<>c.<Main>b__0_3(ResilienceContext ctx)
...
at Polly.ResiliencePipeline.<>c__8`1.<<ExecuteAsync>b__8_0>d.MoveNext() in /_/src/Polly.Core/ResiliencePipeline.AsyncT.cs:line 95
```

> [!NOTE]
> The `OnFallback` telemetry event is reported **only if** the fallback strategy provides a surrogate value.
>
> If the callback either returns an acceptable result or throws an unhandled exception, no telemetry is emitted.
>
> The `Result` is **always populated** for the `OnFallback` telemetry event.

For further information please check out the [telemetry page](../advanced/telemetry.html).
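
A setup along the following lines should produce an event resembling the first sample above. This is a sketch under assumptions: it relies on the `Polly.Extensions` package for `ConfigureTelemetry` and on `Microsoft.Extensions.Logging.Console` for the console logger, and it assumes that the builder's `Name` and `InstanceName` together with the strategy's `Name` make up the `Source` value. The exact log formatting depends on the logger configuration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;
using Polly;
using Polly.Fallback;

using ILoggerFactory loggerFactory = LoggerFactory.Create(logging => logging.AddConsole());

var builder = new ResiliencePipelineBuilder<int>
{
    Name = "MyApplication",           // assumed to be the first segment of Source
    InstanceName = "MyTestPipeline",  // assumed to be the second segment of Source
};

ResiliencePipeline<int> pipeline = builder
    .AddFallback(new FallbackStrategyOptions<int>
    {
        Name = "MyFallbackStrategy",  // assumed to be the third segment of Source
        ShouldHandle = new PredicateBuilder<int>().HandleResult(static value => value < 0),
        FallbackAction = static args => Outcome.FromResultAsValueTask(0),
    })
    .ConfigureTelemetry(loggerFactory)
    .Build();

// The operation key is carried by the resilience context.
ResilienceContext context = ResilienceContextPool.Shared.Get("MyFallbackGuardedOperation");
int result;
try
{
    // The callback returns -1, which ShouldHandle treats as a failure,
    // so the strategy provides the surrogate value and reports OnFallback.
    result = await pipeline.ExecuteAsync(static ctx => new ValueTask<int>(-1), context);
}
finally
{
    ResilienceContextPool.Shared.Return(context);
}

Console.WriteLine($"Result: {result}");
```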

## Diagrams

@@ -298,7 +334,7 @@ return await fallback.ExecuteAsync(CallPrimary, CancellationToken.None);

### Nesting `ExecuteAsync` calls

Combining multiple strategies can be achieved in various ways. However, deeply nesting `ExecuteAsync` calls can lead to what's commonly referred to as _`Execute` Hell_.
Combining multiple strategies can be achieved in various ways. However, deeply nesting `ExecuteAsync` calls can lead to what's commonly referred to as *`Execute` Hell*.

> [!NOTE]
> While this isn't strictly tied to the Fallback mechanism, it's frequently observed when Fallback is the outermost layer.
@@ -323,7 +359,7 @@ return result;

**Reasoning**:

This is akin to JavaScript's [callback hell](http://callbackhell.com/) or _[the pyramid of doom](https://en.wikipedia.org/wiki/Pyramid_of_doom_(programming))_. It's easy to mistakenly reference the wrong `CancellationToken` parameter.
This is akin to JavaScript's [callback hell](http://callbackhell.com/) or *[the pyramid of doom](https://en.wikipedia.org/wiki/Pyramid_of_doom_(programming))*. It's easy to mistakenly reference the wrong `CancellationToken` parameter.

✅ DO

28 changes: 15 additions & 13 deletions docs/strategies/hedging.md
@@ -2,15 +2,17 @@

## About

- **Options**: [`HedgingStrategyOptions<T>`](xref:Polly.Hedging.HedgingStrategyOptions`1)
- **Extensions**: `AddHedging`
- **Strategy Type**: Reactive
- **Option(s)**:
- [`HedgingStrategyOptions<T>`](xref:Polly.Hedging.HedgingStrategyOptions`1)
- **Extension(s)**:
- `AddHedging`
- **Exception(s)**: -

---

The hedging strategy enables the re-execution of a user-defined callback if the previous execution takes too long. This approach gives you the option to either run the original callback again or specify a new callback for subsequent hedged attempts. Implementing a hedging strategy can boost the overall responsiveness of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If low latency is not a critical requirement, you may find the [retry strategy](retry.md) is more appropriate.
The hedging **reactive** strategy enables the re-execution of the callback if the previous execution takes too long. This approach gives you the option to either run the original callback again or specify a new callback for subsequent *hedged* attempts. Implementing a hedging strategy can boost the overall responsiveness of the system. However, it's essential to note that this improvement comes at the cost of increased resource utilization. If low latency is not a critical requirement, you may find the [retry strategy](retry.md) more appropriate.

This strategy also supports multiple [concurrency modes](#concurrency-modes) for added flexibility.
This strategy also supports multiple [concurrency modes](#concurrency-modes) so that you can tailor the behavior to your needs.

> [!NOTE]
> Please do not start any background work when executing actions using the hedging strategy. This strategy can spawn multiple parallel tasks, and as a result multiple background tasks can be started.
@@ -59,14 +61,14 @@ new ResiliencePipelineBuilder<HttpResponseMessage>().AddHedging(optionsDefaults)

## Defaults

| Property | Default Value | Description |
|---------------------|----------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| `ShouldHandle` | Predicate that handles all exceptions except `OperationCanceledException`. | Predicate that determines what results and exceptions are handled by the retry strategy. |
| `MaxHedgedAttempts` | 1 | The maximum number of hedged actions to use, in addition to the original action. |
| `Delay` | 2 seconds | The maximum waiting time before spawning a new hedged action. |
| `ActionGenerator` | Returns the original callback that was passed to the hedging strategy. | Generator that creates hedged actions. |
| `DelayGenerator` | `null` | Used for generating custom delays for hedging. |
| `OnHedging` | `null` | Event that is raised when a hedging is performed. |
| Property | Default Value | Description |
|---------------------|--------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `ShouldHandle` | Any exceptions other than `OperationCanceledException`. | Defines a predicate to determine what results and/or exceptions are handled by the hedging strategy. |
| `MaxHedgedAttempts` | 1 | The maximum number of hedged actions to use, in addition to the original action. |
| `Delay` | 2 seconds | The maximum waiting time before spawning a new hedged action. |
| `ActionGenerator` | It returns the original callback that was passed to this strategy. | This delegate allows you to **dynamically** calculate the hedged action by utilizing information that is only available at runtime (like the attempt number). |
| `DelayGenerator` | `null` | This optional delegate allows you to **dynamically** calculate the delay by utilizing information that is only available at runtime (like the attempt number). |
| `OnHedging` | `null` | If provided then it will be invoked before the strategy performs the hedged action. |
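
The dynamic options in the table could be wired up as in the sketch below. It assumes the Polly v8 (`Polly.Core`) package; the `string` result type, the growing delay rule, and the fast primary callback are illustrative only.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Hedging;

ResiliencePipeline<string> pipeline = new ResiliencePipelineBuilder<string>()
    .AddHedging(new HedgingStrategyOptions<string>
    {
        // Up to two hedged actions in addition to the original one.
        MaxHedgedAttempts = 2,

        // Wait a bit longer before each subsequent hedged attempt (example rule).
        DelayGenerator = static args =>
            new ValueTask<TimeSpan>(TimeSpan.FromMilliseconds(100 * (args.AttemptNumber + 1))),

        // Re-run the original callback; a different action could be returned here instead.
        ActionGenerator = static args => () => args.Callback(args.ActionContext),

        OnHedging = static args =>
        {
            Console.WriteLine($"Hedging, attempt number: {args.AttemptNumber}");
            return default;
        },
    })
    .Build();

// The primary callback finishes well within the first delay, so no hedging should be needed.
string result = await pipeline.ExecuteAsync(
    static async ct =>
    {
        await Task.Delay(TimeSpan.FromMilliseconds(10), ct);
        return "primary";
    },
    CancellationToken.None);

Console.WriteLine($"Result: {result}");
```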

You can use the following special values for `Delay` or in `DelayGenerator`:
