-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change default reconnection policy to Exponential #1896
Conversation
Update the Exponential policy to take the base as parameter to allow for different growth rate.
@@ -334,7 +334,7 @@ public bool PreserveAsyncOrder | |||
/// <summary> | |||
/// The retry policy to be used for connection reconnects | |||
/// </summary> | |||
public IReconnectRetryPolicy ReconnectRetryPolicy { get { return reconnectRetryPolicy ??= new LinearRetry(ConnectTimeout); } set { reconnectRetryPolicy = value; } } | |||
public IReconnectRetryPolicy ReconnectRetryPolicy { get { return reconnectRetryPolicy ??= new ExponentialRetry(1000, ConnectTimeout*2, 1.5); } set { reconnectRetryPolicy = value; } } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we default to Exponential only for Azure connections? @NickCraver is that a good way to scope the impact?
Introduce UseLinearReconnectPolicy in config
…nts to constants in ExponentialRetry
/// <summary> | ||
/// The default base of the exponential function. The higher the base the faster the growth of the exponential | ||
/// </summary> | ||
public const double DefaultExponentialBase = 1.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're saying the default exponential base is 1.1, but it's actually 1.5 in practice because that's what we're passing in the default options. IMO there should be no difference here, the default in options should be the default here. Overall I worry this is hard to reason about - linear is straightforward but in this case we have 3 factors combined based on connect timeout but only kind of with a default in class and another value from options constructors coming in. If a user asks "what's the default policy?" we should be able to answer in a sentence or two max...I don't feel like we're there in this case.
Let's get closer at least by aligning on the default. Either it's 1.1 or it's 1.5, but in the case of existing tech we have 1.1 in a shipped version below and that's what it's always been. IMO, that's where we should stay in this release - if we want to observe and tweak in 3.x totally open to that, but we shouldn't change existing/expected behavior, IMO. Where are y'all at on this - thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if we add a new class DefaultReconnectRetry
, which uses ExponentialRetry
, but applies the more aggressive 1.5 curve?
That way anyone who's using ExponentialRetry
with its default 1.1 curve will be unaffected, but anyone who's not specifying a retry policy will get the benefit of the 1.5 exponential curve.
If that adds more complexity than it's worth, we could just switch the default to use ExponentialRetry
with 1.1. That limits the benefit somewhat because 1.1 is pretty small, and doesn't provide much curve in the scale of number of retries needed for a typical reconnect. But it would be a step in the right direction, and make a switch to Exponential 1.5 in 3.x less abrupt.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on going with 1.1 for the moment, watching how it's handled and takes effect, with a plan for shifting to 1.5 in v3 (which isn't far off), good thing to pair with other larger changes where people are going to read notes much closer.
This was a ways behind - merging main in so that we're good to go! |
Change default reconnection policy to Exponential
Update the Exponential policy to take the base as parameter to allow for different growth rates
Description:
To speed up reconnections, and also avoid reconnection storms, we are changing the default reconnection policy to ExponentialRetry and reducing the deltaBackOff from the previous default of 5s to 2.5s.
This allows more frequent retries early on, when a reconnection is more probable, and less frequent retries later on when it has already been impossible for some time.
Another plus for the ExponentialRetry is the additional randomness added to the intervals so that we avoid reconnection storms.
Here is a comparison of the two retry policies.
And here is a real world example:
The randomized exponential (orange) is within the the minimum delta (yellow) and the maximum exponential (gray).
The retry happens when the timeout (blue) is greater than the retry limit (orange).