Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExecuteScalar does not always raise an exception when selected as deadlock victim #458

Open
deadalusai opened this issue Mar 6, 2020 · 13 comments

Comments

@deadalusai
Copy link

deadalusai commented Mar 6, 2020

Describe the bug

We have encountered an issue in production where ExecuteScalar run in context of a RepeatableRead transaction intermittently fails to raise an exception when SQL Server selects it as a deadlock victim and rolls back the transaction. The transaction state is correctly set to Aborted, but the lack of an exception confuses application code and libraries like EF Core and who assume (fairly reasonably) that no exception means success.

Additionally, ExecuteScalar is returning the value that would be expected for a successful transaction even though the transaction has been rolled back.

The issue does not seem to occur when using ExecuteReader or ExecuteNonQuery.

I have a reduced repro case which consistently reproduces the error on SQL Server 12 and 13, tested across multiple machines. The issue reproduces with System.Data.SqlClient/dotnet core 2.1 and Microsoft.Data.SqlClient/dotnet core 3.1.

To reproduce

  1. Clone the repository https://github.com/deadalusai/SqlClient-Aborted-Transaction-Repro

  2. Create a new database using the ZombieDb.sql script

  3. Run the test program:

    For .NET Core 3.1 with Microsoft.Data.SqlClient:

    > dotnet run -p .\ZombieTester.NET31.csproj
    

    For .NET Core 2.1 with System.Data.SqlClient:

    > dotnet run -p .\ZombieTester.NET21.csproj
    
  4. Press any key to start testing for the error case. The test will run until it reproduces the issue or times out.

Expected behavior

A SqlException to be raised when a command is selected as a deadlock victim.

Further technical details

Microsoft.Data.SqlClient version: 1.1.1
System.Data.SqlClient version: 4.8.1
.NET target: .NET Core 3.1, .NET Core 2.1
SQL Server version: SQL Server 2017, SQL Server 2016
Operating system: Windows 10

@deadalusai
Copy link
Author

This error interacts poorly with EF Core which does not check the transaction state before executing the next command in the batch. Every command after the failed one is executed outside a transaction, thus committing only some of the changes in the batch with no possibility of rollback. There's no indication that the transaction has been aborted until transaction.Complete is called which raises an exception.

It would be nice if attempting to use an aborted/completed transaction on a command would raise an exception, but I guess this is a known limitation?

@cheenamalhotra
Copy link
Member

Hi @deadalusai

I was able to reproduce the issue with v1.1.1 but when I tried with latest 'master' branch and also with 2.0.0-preview1 version of driver, this issue is not reproducible. I think the PR that fixed async deadlock issues also covered this case (#349) ?

<PackageReference Include="Microsoft.Data.SqlClient" Version="2.0.0-preview1.20021.1" />

Could you give the new package a try and let us know?
I'm yet to confirm if this is the only case here.

@cheenamalhotra
Copy link
Member

However I do see normal exception captured as below:

Using Microsoft.Data.SqlClient
[100] Transaction state before first command: Active
[200] Transaction state before first command: Active
[100] Read SequenceId 58751
[100] Transaction state after first command: Active
[100] Transaction state after second command: Active
[100] Transaction state after commit: No transaction
[200] Failed with a normal exception
Error: Microsoft.Data.SqlClient.SqlException (0x80131904): Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
   at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)
   at Microsoft.Data.SqlClient.SqlDataReader.Read()
   at Microsoft.Data.SqlClient.SqlCommand.CompleteExecuteScalar(SqlDataReader ds, Boolean returnSqlValue)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteScalar()
   at ZombieTester.Command.Execute(String keyValue) in D:\GitHub\Issues\458\repro\Program.cs:line 208
ClientConnectionId:830ce54b-b3a6-4c15-8592-9e7f960b8895
Error Number:1205,State:51,Class:13
Trapped escaped exception: Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

@deadalusai
Copy link
Author

deadalusai commented Mar 7, 2020

Hi @cheenamalhotra, thanks for investigating.

I was able to reproduce the error using 2.0.0-preview1.20021.1:
(I've modified the program to print out the SqlClient assembly version)

Using Microsoft.Data.SqlClient 2.0.20021.1
[100] Transaction state before first command: Active
[200] Transaction state before first command: Active
[200] Read SequenceId 3768
[100] Read SequenceId 3767
[100] Transaction state after first command: Active
[200] Transaction state after first command: Aborted
[100] Transaction state after second command: Active
[200] Transaction state after second command: Aborted
[100] Transaction state after commit: No transaction
[200] Failed with problem exception
Error: System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.
   at Microsoft.Data.SqlClient.SqlTransaction.ZombieCheck()
   at Microsoft.Data.SqlClient.SqlTransaction.Commit()
   at ZombieTester.Command.Execute(String keyValue) in C:\Dev\SqlClient-Aborted-Transaction-Repro\Program.cs:line 194
Trapped escaped exception: This SqlTransaction has completed; it is no longer usable.
Results:
        [TableA] KEY_100
        [TableA] KEY_200
        [TableB] KEY_100

Halting (detected failure mode)

Edit: I reproduced the error against SQL Server 12.0.4100.1 - I'll attempt a repro on my other machine with a more recent version of SQL Server as well some time today.

@deadalusai
Copy link
Author

Reproduced using 2.0.0-preview1.20021.1 on a second machine against SQL Server 13.0.4001

@deadalusai
Copy link
Author

Hi @cheenamalhotra - any update on this issue?

An observation we made is that the command which ends up being chosen as deadlock victim and causing the transaction to be aborted also successfully returns a scalar result. See thread 200:

[100] Transaction state before first command: Active
[200] Transaction state before first command: Active
[200] Read SequenceId 3768
[100] Read SequenceId 3767
[100] Transaction state after first command: Active
[200] Transaction state after first command: Aborted

The SequenceId it returns is correct, though it never ends up committed to the table. This could point to why ExecuteScalar is not checking for an error?

@cheenamalhotra
Copy link
Member

After running multiple times I do see 3 mixed cases..

  1. No deadlock, post commit transaction state: No transaction
  2. Failed with a normal exception (0x80131904) as posted above
  3. Problem Case: Transaction state after second command: Aborted
    Failed with problem exception
    Error: System.InvalidOperationException: This SqlTransaction has completed; it is no longer usable.

On further testing, this is not only in .NET Core, but also reproducible with .NET Framework and with both drivers System.Data.SqlClient and Microsoft.Data.SqlClient.

However, I would also investigate more about executing multiple commands in "RepeatableRead" Isolation Level, and if this case is absolutely valid.

@deadalusai
Copy link
Author

Hi @cheenamalhotra - any update on this issue?

@cheenamalhotra
Copy link
Member

Hi @deadalusai

Thanks for your patience. We'll get back to investigating this soon!

@deadalusai
Copy link
Author

Hi @cheenamalhotra - any update on this issue?

@kentonbmax
Copy link

Any update or work around on this?

@deadalusai
Copy link
Author

deadalusai commented May 18, 2021

We've had some success with prepending something like the following before every insert/update/delete SQL operation:

IF (@@TRANCOUNT = 0) THROW 51000, 'Not running in a transaction.', 1;

For SqlClient you just prepend it to the command text.
For EF Core 3.0 you can use interceptors: https://docs.microsoft.com/en-us/ef/core/what-is-new/ef-core-3.x/#interception-of-database-operations

@vonzshik
Copy link

vonzshik commented Aug 3, 2021

Hello! Tested @deadalusai's example with SqlClient 3.0, the problem is still reproducible. Did a small research, and it looks like SqlClient reports the error depending on when exactly that error is read - either on SqlDataReader.Read (then TdsParser.TryRun is called with RunBehavior.ReturnImmediately) or on SqlDataReader.Close (then TdsParser.TryRun is called with RunBehavior.Clean, which prevents TdsParser.TryRun from throwing that error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants