Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change default reconnection policy to Exponential #1896

Merged
merged 12 commits into from
Nov 5, 2021

Conversation

lolodi
Copy link
Collaborator

@lolodi lolodi commented Oct 25, 2021

Change default reconnection policy to Exponential
Update the Exponential policy to take the base as parameter to allow for different growth rates

Description:
To speed up reconnections, and also avoid reconnection storms, we are changing the default reconnection policy to ExponentialRetry and reducing the deltaBackOff from the previous default of 5s to 2.5s.
This allows more frequent retries early on, when a reconnection is more probable, and less frequent retries later on when it has already been impossible for some time.
Another plus for the ExponentialRetry is the additional randomness added to the intervals so that we avoid reconnection storms.
Here is a comparison of the two retry policies.
image
image

And here is a real world example:
image
The randomized exponential (orange) is within the the minimum delta (yellow) and the maximum exponential (gray).
The retry happens when the timeout (blue) is greater than the retry limit (orange).

Update the Exponential policy to take the base as parameter to allow for different growth rate.
@@ -334,7 +334,7 @@ public bool PreserveAsyncOrder
/// <summary>
/// The retry policy to be used for connection reconnects
/// </summary>
public IReconnectRetryPolicy ReconnectRetryPolicy { get { return reconnectRetryPolicy ??= new LinearRetry(ConnectTimeout); } set { reconnectRetryPolicy = value; } }
public IReconnectRetryPolicy ReconnectRetryPolicy { get { return reconnectRetryPolicy ??= new ExponentialRetry(1000, ConnectTimeout*2, 1.5); } set { reconnectRetryPolicy = value; } }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ExponentialRetry

What if we default to Exponential only for Azure connections? @NickCraver is that a good way to scope the impact?

/// <summary>
/// The default base of the exponential function. The higher the base the faster the growth of the exponential
/// </summary>
public const double DefaultExponentialBase = 1.1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're saying the default exponential base is 1.1, but it's actually 1.5 in practice because that's what we're passing in the default options. IMO there should be no difference here, the default in options should be the default here. Overall I worry this is hard to reason about - linear is straightforward but in this case we have 3 factors combined based on connect timeout but only kind of with a default in class and another value from options constructors coming in. If a user asks "what's the default policy?" we should be able to answer in a sentence or two max...I don't feel like we're there in this case.

Let's get closer at least by aligning on the default. Either it's 1.1 or it's 1.5, but in the case of existing tech we have 1.1 in a shipped version below and that's what it's always been. IMO, that's where we should stay in this release - if we want to observe and tweak in 3.x totally open to that, but we shouldn't change existing/expected behavior, IMO. Where are y'all at on this - thoughts?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we add a new class DefaultReconnectRetry, which uses ExponentialRetry, but applies the more aggressive 1.5 curve?

That way anyone who's using ExponentialRetry with its default 1.1 curve will be unaffected, but anyone who's not specifying a retry policy will get the benefit of the 1.5 exponential curve.

If that adds more complexity than it's worth, we could just switch the default to use ExponentialRetry with 1.1. That limits the benefit somewhat because 1.1 is pretty small, and doesn't provide much curve in the scale of number of retries needed for a typical reconnect. But it would be a step in the right direction, and make a switch to Exponential 1.5 in 3.x less abrupt.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on going with 1.1 for the moment, watching how it's handled and takes effect, with a plan for shifting to 1.5 in v3 (which isn't far off), good thing to pair with other larger changes where people are going to read notes much closer.

@philon-msft
Copy link
Collaborator

we can revert all changes in this file now


Refers to: src/StackExchange.Redis/ExponentialRetry.cs:2 in 033a09b. [](commit_id = 033a09b, deletion_comment = False)

@NickCraver
Copy link
Collaborator

This was a ways behind - merging main in so that we're good to go!

@NickCraver NickCraver merged commit 05dc252 into main Nov 5, 2021
@NickCraver NickCraver deleted the dev/lolodi/exponential-retry branch November 5, 2021 00:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants