SqlClient's transient error handling strategy going forward? #416

Plasma · 2020-02-09T06:01:20Z

Is your feature request related to a problem? Please describe.

Recently saw #307 being discussed but was closed in an effort to add some retry support to SqlClient.

This is very much needed, as I feel there is no out of the box retry support for SqlConnection / SqlClient today to combat transient connectivity issues.

Focusing specifically on connecting to Azure SQL, which has pretty terrible connectivity issues, where connections can break and commands can fail mid flight during maintenance operations. Just google "SQL Azure retries" and you will find an abundance of deprecated, copy/paste code, DIY strategies, etc.

As SqlClient / DbConnection / DbCommand / etc classes are either not inheritable or behind interfaces there are problems in implementing retry support today and handing off "resilient" connection instances to application and library code, where they don't need to make any code changes.

There seems to be some support in EF as per https://docs.microsoft.com/en-us/ef/ef6/fundamentals/connection-resiliency/retry-logic but otherwise you are on your own, as documented by https://docs.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific#sql-database-using-adonet

All examples of providing retry logic on Azure / MSDN for example are incomplete, not practical, or forget about library code. This unfortunately is due to SqlClient not being flexible enough to support this kind of logic.

Describe the solution you'd like

Simple, out of the box, "pit of success", connection and command execution resiliency support for SqlClient.

Automatically handle the generic and SQL Azure specific connectivity issues, with proper backoff support, etc, and optional events to notice when a connection reconnect was required to let the application make a decision and provide logging.

A perfect solution to me feels like simply passing in an IErrorHandlingStrategy interface (example name) to the constructor of SqlConnection, which is invoked when there are SQL errors during connection/command execution that require a decision to be made (retry? backoff?) based on the exception.

By default, we could have No Strategy (existing behavior), or Azure Strategy, which knows to handle its bunch of connectivity errors (of which there are many).

Then downstream code has no idea about the retry strategy. It needs no code changes. Applications dont need further changes either. It just works.

Describe alternatives you've considered

Manually wrapping our code in Polly retry blocks when hitting the database (via extension methods such that command.ExecuteAsync becomes command.ExecuteAndHandleErrorsAsync etc). This makes our code messier to write and also means downstream library code we use is not fixed as we cannot change what method they call.
Attempting to inherit SqlClient / SqlConnection / DbCommand / etc and implement retry logic by overriding appropriate methods. This is not possible because some classes are sealed, and some methods are not virtual.

Additional context

There is unfortunately no clear way to implement reliable resiliency logic today that flows between app code and library code seamlessly due to SqlClient and friends being sealed or not having appropriate methods override-able.

It would be fantastic to discuss what a reasonable approach could look like to implement retry support.

I appreciate it is very nuanced, for example, what if a command fails mid-transaction -- retrying via reconnect is not necessary a correct thing to do, but with perhaps appropriate hook points the application can make a decision.

The text was updated successfully, but these errors were encountered:

cheenamalhotra · 2020-02-11T20:12:55Z

Hi @Plasma

We discussed the ideas from proposed PR #307, and we have new design ideas that would be addressed by creating a pluggable add-on.

@scoriani would you like to share update regarding progress on the same?

scoriani · 2020-02-12T09:08:40Z

Thanks @cheenamalhotra!

Hey @Plasma, yes original PR was closed as we're working on a more robust and pluggable design for embedded retry logic. Basically, the ideas is to adopt a provider-based approach, where out-of-the-box there will be a "default" retry logic provider that can be associated to SqlConnection and SqlCommand operations, but anyone can implement his own provider with custom logic.
Default provider will come with some pre-defined strategies (e.g. fixed interval, exponential backoff, etc.) and error detection strategies, but these can be replaced/extended in custom providers.

Don't have a precise ETA yet, due to some internal re-prioritization happening, but when ready will create a new PR for review.
HTH

Plasma · 2020-02-12T10:22:07Z

That sounds great, thank you.

If I may, let me point out an edge case to consider, if an application gets a SqlConnection, and the app then initializes that connection (for example, to set the security/session context on the connection), if a DbCommand/SqlConnection disconnects and a retry is invoked, we would want a way to detect a connection has been re-established.

For example, perhaps a "On connection re-established" hook point also, so the necessary context can be (re)configured before the underlying command is retried (without the right context) - or something like that.

Thank you

scoriani · 2020-02-12T11:39:42Z

That is actually a great point that we're considering and, possibly, extend even further with the idea of a server-side session that can survive network disconnections, and client libraries that can automatically re-establish connectivity and "re-hydrate" existing context in a fully transparent mode. That said, "connection re-established" hook will give devs the ability to inject their own logic there.

roji · 2020-04-13T20:49:51Z

See the ongoing discussion about resilience in database drivers in general: dotnet/runtime#34817

Although it's very understandable to want a driver solution that "just works" without any application changes, it's important to realize that there are some real complications here. See especially dotnet/runtime#34817 (comment) and dotnet/runtime#34817 (comment). Specifically:

if you have issued 3 commands and the 4th fails due to error 1205 (result streaming) you have to retry all 4, if they're in the same transaction.

A driver-only solution would have to internally track all commands which have been executed against the transaction, and be able to retry them upon transient failure. Even then, what happens if the user has already read the results for the first three commands (and half of the fourth)? Does the driver read and discard all that information in the second pass, and somehow resume in the right place in the middle of the fourth resultset? What happens if the actual data in those resultsets has changed, since a new transaction was started and the database has been changed in the meantime?

These problems seem extremely complicated, and possibly insurmountable at the driver level, without any sort of application knowledge/handling.

rgarrison12345 · 2020-04-13T21:06:04Z

That is actually a great point that we're considering and, possibly, extend even further with the idea of a server-side session that can survive network disconnections, and client libraries that can automatically re-establish connectivity and "re-hydrate" existing context in a fully transparent mode. That said, "connection re-established" hook will give devs the ability to inject their own logic there.

Correct me if I'm wrong, but doesn't the DbConnection state change event already provide a way to inspect if the connection was previously open and is going to be closed (before and after states)? I would think the event handler would be where the developer would hook in and perform logic.

Is the idea to provide more contextual information to state change event or some other mechanism?

cheenamalhotra · 2021-06-07T18:55:09Z

Closing issue as Configurable Retry logic support is now released with v3.0.0-preview1.

Furthermore, seamless reconnection (connection resiliency) abilities during command execution are not supported by SqlClient as it also requires support from server, which is not available at the moment.

cheenamalhotra added the 💡 Enhancement New feature request label Feb 12, 2020

David-Engel mentioned this issue Apr 13, 2020

Database-agnostic way to detect transient database errors dotnet/runtime#34817

Closed

roji mentioned this issue Aug 15, 2020

Configurable Retry Logic - Preview 1 #693

Merged

cheenamalhotra closed this as completed Jun 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SqlClient's transient error handling strategy going forward? #416

SqlClient's transient error handling strategy going forward? #416

Plasma commented Feb 9, 2020

cheenamalhotra commented Feb 11, 2020

scoriani commented Feb 12, 2020

Plasma commented Feb 12, 2020

scoriani commented Feb 12, 2020

roji commented Apr 13, 2020

rgarrison12345 commented Apr 13, 2020

cheenamalhotra commented Jun 7, 2021

SqlClient's transient error handling strategy going forward? #416

SqlClient's transient error handling strategy going forward? #416

Comments

Plasma commented Feb 9, 2020

Is your feature request related to a problem? Please describe.

Describe the solution you'd like

Describe alternatives you've considered

Additional context

cheenamalhotra commented Feb 11, 2020

scoriani commented Feb 12, 2020

Plasma commented Feb 12, 2020

scoriani commented Feb 12, 2020

roji commented Apr 13, 2020

rgarrison12345 commented Apr 13, 2020

cheenamalhotra commented Jun 7, 2021