Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can the SqlServerTransientExceptionDetector be moved to System.Data.SqlClient? #39

Closed
divega opened this issue Oct 23, 2018 · 13 comments
Closed
Labels
💡 Enhancement New feature request

Comments

@divega
Copy link

divega commented Oct 23, 2018

From @avantida on October 18, 2018 8:17

The SqlServerTransientExceptionDetector class is really handy to detect if a Sql operation needs to be retried while building anything based on a SqlConnection.
It basically is the core component driving the SqlServerRetryingExecutionStrategy of .NET EF Core. (which used to be the AzureSqlExecutionStrategy in EF)

This logic is sort of replicated also in the Enterprise Library Transient Fault Handling Application Block and then again even in the Elastic database tools for Azure SQL Database

Moving this code 'upstream' would aid a lot to enable reliable operation with Azure Sql with any of the SQL compatible .NET Core Data Access tools out there.

Today we're not falling in the pit of success to run our code on Azure. (We're using Dapper @NickCraver )

Copied from original issue: dotnet/efcore#13665

@divega divega self-assigned this Oct 23, 2018
@divega
Copy link
Author

divega commented Oct 23, 2018

This issue was created on the EF Core issue tracking suggesting that at least part of the logic EF Core uses to detect transient errors from SQL Server should be generally useful for customers using SqlClient and therefore belongs more naturally in SqlClient, so moving it here for consideration by the SqlClient team.

I think this is a reasonable idea from the technical perspective: Instead of having multiple teams improving the list of transient errors and sometimes coordinating updates, we could have more of a centralized effort doing this, and a single, authoritative transient error detector for SQL Server in .NET that then multiple libraries could take advantage of to implement their retry logic.

For this to be practical thought, there has to be non-trivial coordination among all the teams currently maintaining one of these detectors, the Azure SQL Database team, and some team (presumably SqlClient, but perhaps some other team), would need to sign up to maintain a type that works for everyone else, is stable, is kept up to date when service changes, versions correctly, and works in all the .NET implementations where it is required.

@GSPP
Copy link

GSPP commented Oct 24, 2018

If this is put in System.Data.SqlClient then there will be a big compatibility burden in the future. It likely would be extremely difficult to ever change the detection again.

@divega
Copy link
Author

divega commented May 16, 2019

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@divega divega transferred this issue from dotnet/corefx May 16, 2019
@divega divega removed their assignment May 16, 2019
@David-Engel David-Engel added the 💡 Enhancement New feature request label May 20, 2019
@cheenamalhotra cheenamalhotra added this to Under Investigation in SqlClient Triage Board Jul 11, 2019
@Plasma
Copy link

Plasma commented Dec 17, 2019

Really supportive of having this in the base package (or an extension package) with a production-ready integration working. The guidance on Azure website and MSDN is a total mess.

  • There was originally reference to TOPAZ library and ReliableSqlConnection, that is now obsolete.

  • The Azure docs provide a range of error codes to be retried, https://docs.microsoft.com/en-us/azure/sql-database/troubleshoot-connectivity-issues-microsoft-azure-sql-database#transient-fault-error-messages with no clear production-ready strategy to handle them (just wrap connection.Open? What about command execution? What about pooled connections?).

  • As mentioned in this ticket, there are multiple implementations of the same logic, but are tied to the higher-level frameworks (Elastic pools, EF) instead of being based on the low-level client

  • When SqlConnection was updated to support retries, it was flagged as the replacement of ReliableSqlConnection, but it doesn't actually work well, so back to square one.

  • SqlConnection and other classes are sealed, making it difficult to override methods to support some kind of retry, so the reliable connection can be provided to 3rd party code that expects SqlConnection etc

Any suggestions about where and how the retry logic should be placed for production-ready code would be appreciated, but going forward, this being built-in to the package itself would be ideal.

@scoriani
Copy link

scoriani commented Dec 17, 2019

There's work going on for this topic, see this PR (#307)

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 17, 2019

When SqlConnection was updated to support retries, it was flagged as the replacement of ReliableSqlConnection, but it doesn't actually work well,

Can you expand on this? If there is already something in the library that's supposed to do the job but doesn't work i think it shoould be fixed if possible.

@scoriani
Copy link

Topaz (a.k.a. Transient Fault Handling Application Block) is not maintained anymore since quite some time and has been discontinued. Plus, the need for having retry logic capabilities embedded in client drivers is also helping customers with legacy apps that can't/don't want to change their codebase.

@ErikEJ
Copy link
Contributor

ErikEJ commented Dec 17, 2019

@Plasma
Copy link

Plasma commented Dec 17, 2019

When SqlConnection was updated to support retries, it was flagged as the replacement of ReliableSqlConnection, but it doesn't actually work well,

Can you expand on this? If there is already something in the library that's supposed to do the job but doesn't work i think it shoould be fixed if possible.

Hi @Wraith2 thanks for your interest.

  1. Here's some example guidance:

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-connectivity-issues#net-sqlconnection-parameters-for-connection-retry

Note the discussion about using the ConnectRetryCount/Interval system.

This doesn't seem to be useful in Azure, we still get these errors:

SqlException: Database 'xxx on server 'xxx' is not currently available. Please retry the connection later. If the problem persists, contact customer support, and provide them the session tracing ID of 'xxx'. 
SqlException: Cannot open database "xxx" requested by the login. The login failed.Login failed for user 'xxx'. A severe error occurred on the current command. The results, if any, should be discarded.
SqlException: A transport-level error has occurred when sending the request to the server. (provider: TCP Provider, error: 0 - An existing connection was forcibly closed by the remote host.) 

These all happen during failover/reconfiguration/behind-the-scenes events in Azure.

The "The login failed" error for example is due to the temporarily moment there is no primary replica during a failover, etc.

All of these errors are frequent for a customer of Azure. Its really demoralizing. This is more of a criticism to Azure SQL, but as SqlConnection is the only place we can mitigate it, its depressingly difficult to work with.

We've been using ReliableSqlConnection from TOPAZ library for a few years which has shielded us from most of these, but as it is obsolete (and means we can't use async queries) we want to move away from it.

  1. Second, I think this only handles connections -- and won't do anything for mid-command execution, which can still error. TOPAZ library handled mid-flight query retries, so SqlClient/SqlConnection should too.

@avantida
Copy link

I'd like to revive this topic again, as we have been experiencing some issues with a 3rd party library, that is depending on SqlClient, but again is not Azure aware.

Also, I see there is another case added to the SqlServerTransientExceptionDetector class here: dotnet/efcore@a094aba

So EF will work, but raw SqlClient users (or any package using it) will need to find out the hard way.

@roji
Copy link
Member

roji commented May 27, 2020

Note dotnet/runtime#34817, which would introduce transient error categorization into ADO.NET itself, and allow SqlClient to internally determine which list of codes represent transient errors. I think that's the right way forward, rather then introducing an SqlClient-specific mechanism to do the same.

@avantida
Copy link

@roji Thanks for the heads up, read the (rather long) discussion there. And I agree that to be the way forward, and unfortunately with Azure SQL it still is required. It would indeed be a lot better if Azure SQL could handle the resilience better server-side, but I'm assuming this has historical and deep architectural reasons, not easily remediated.

I was also not aware of the ConnectRetry* properties for the connectionstring, and we will also give that a try.

@cheenamalhotra
Copy link
Member

As pointed by @roji above, dotnet/runtime#34817 is going to supported with #649 and #648 to introduce APIs on SqlException to detect transient errors. Closing this discussion.

SqlClient Triage Board automation moved this from Under Investigation to Closed Nov 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💡 Enhancement New feature request
Projects
Development

No branches or pull requests

10 participants