Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement MySqlException.IsTransient #849

Closed
roji opened this issue Jul 12, 2020 · 13 comments
Closed

Implement MySqlException.IsTransient #849

roji opened this issue Jul 12, 2020 · 13 comments

Comments

@roji
Copy link

roji commented Jul 12, 2020

dotnet/runtime#34817 introduces DbException.IsTransient for .NET 5.0, database-agnostic transient error detection. Not sure which transient errors Sqlite surfaces (e.g. locking/transactions?), these could be surfaced via this property.

MySQL and PostgreSQL both expose SQLSTATE codes directly from the database. If the actual codes and their transience overlap to a considerable amount, we should consider bringing some logic into DbException itself, i.e. provide a default implementation of IsTransient which identifies some hard-coded values in SqlState and returns true.

Here's the current Npgsql implementation for IsTransient.

@bgrainger
Copy link
Member

Similar to npgsql, this could probably just be driven from the ErrorCode property. Error codes that could be considered transient would include:

  • ConnectionCountError
  • UnableToConnectToHost
  • OutOfResources
  • DiskFull

Because the default value of IsTransient is false, false negatives will occur. Depending on how important it is to avoid false positives, CommandTimeoutExpired could also be included; it may be transient if the database is busy servicing other clients, but not if the query is inherently slow.

@roji
Copy link
Author

roji commented Jul 12, 2020

Which specific errors are classified as transient is of course a provider decision. However, the way I think about it is to err on the side of false positives, rather than conservatively erring on the side of false negatives. A false positive would typically mean a needless retry (which doesn't seem critical), whereas a false negative would mean throwing where a retry may have result in success.

@bgrainger
Copy link
Member

@roji What are your thoughts on IsTransient and transaction deadlock? For example, MySqlErrorCode.LockWaitTimeout and LockDeadlock have an explicit message from the server that says to retry: "Lock wait timeout exceeded; try restarting transaction" or "Deadlock found when trying to get lock; try restarting transaction". However, simply retrying the last command is usually insufficient to (correctly) resolve this; the entire transaction needs to be retried, which may need more complicated logic than just retrying the statement that generated an IsTransient exception.

@roji
Copy link
Author

roji commented Jul 12, 2020

That's an important question, I addressed it in the proposal:

In at least some cases, if an operation transiently fails within a transaction, the entire transaction must be rolled back and retried (including all previous operations). This would be out of scope of this proposal: DbException would only inform the consumers that the exception is transient, but the actual handling (rolling back, reissuing previous commands) would be the consumer's responsibility.

In other words, IsTransient=true doesn't in any way imply that the command alone should be retried for a possibly successful outcome - only that the error itself is (possibly) transient. It's indeed the responsibility of the layer above the ADO.NET provider to track commands within the transaction, and I'd expect that layer to rollback and retry the entire transaction when getting a transient error. This is exactly how EF Core behaves (also employing transaction savepoints).

Let me know if that makes sense to you.

@bgrainger
Copy link
Member

bgrainger commented Jul 12, 2020

Thanks, that's really helpful. (And sorry I missed that you had already explicitly addressed this.)

@roji
Copy link
Author

roji commented Jul 13, 2020

Sure thing, and starting out from the Pomelo list sounds like a good idea.

It's interesting that a MySqlErrorCode is used - does that mean there are two error codes in MySql, i.e. the 5-character SqlState and also this? I wonder what the pros/cons of each are.

@bgrainger
Copy link
Member

A user points out that this code path logs an error then throws an exception that will be IsTransient:

Log.Error("Session{0} connecting failed", m_logArguments);
throw new MySqlException(MySqlErrorCode.UnableToConnectToHost, "Unable to connect to any of the specified MySQL hosts.");

Logging an error (that may be passed through to the client's logs) may be too serious in cases when we expect that user code will retry to handle this error.

I guess philosophically it's going to come down to who sets the definition of "Error": MySqlConnector, or the consuming application, and what counts as an error?

@roji
Copy link
Author

roji commented Jul 15, 2020

It's definitely an interesting discussion, but it seems somewhat orthogonal to error transience. Even for a totally non-transient error, the application may catch the exception above in the stack and handle it (e.g. by connecting to another database), so it could be claimed that the low-level driver shouldn't be logging. However, that would basically mean that no driver error can ever be logged, since who knows whether the calling application consider it an error or not...

Pragmatically speaking, if a low-level component (like a DB driver) logs (and that generally seems like a good idea), I think it should do so regardless of what's going up the stack, and cannot assume that transient errors would actually be retried; it can't know if a retry strategy is configured. Users always have the option to selectively disable logging of certain messages based on rules configured in their logging library... It's also possible for the driver to be configurable as to whether it logs for transient errors, though I'd personally wait and see if lots of people actually request that.

@bgrainger
Copy link
Member

I used Pomelo's list as the definition for IsTransient.

@roji
Copy link
Author

roji commented Jan 6, 2021

Great to see this getting implemented... At some point when it's released, the Pomelo provider could simply call IsTransient on your exception.

@bgrainger
Copy link
Member

Yes; I opened PomeloFoundation/Pomelo.EntityFrameworkCore.MySql#1285 to track that.

@bgrainger
Copy link
Member

Added in 1.3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants