Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock with SQL Server when inserting a large number of related rows #21899

Open
mrpmorris opened this issue Aug 1, 2020 · 63 comments
Open

Comments

@mrpmorris
Copy link

mrpmorris commented Aug 1, 2020

Description of problem

I've created a server-side system that is updated very heavily and runs on SQL Server on Azure. I am seeing deadlocks all over the place when inserting completely unrelated data even in the most simple application. Even when READ_COMMITTED_SNAPSHOT is enabled.

If this cannot be fixed we will have to switch to another persistence framework, which will cost us a lot of time and money and possibly cause our project to overrun (it is imperative this does not happen as there is a hard deadline to enter the market).

I am currently seeing this on locally run Azure functions updating a local SQL Server 2019 developer edition database.

Steps to reproduce

Create the following .NET Core Console app and run it

//Program.cs
	class Program
	{
		const int Tasks = 5;
		static async Task Main(string[] args)
		{
			var trigger = new ManualResetEvent(false);
			var readySignals = new List<ManualResetEvent>();
			var processingTasks = new List<Task>();
			foreach(int index in Enumerable.Range(1, Tasks))
			{
				var readySignal = new ManualResetEvent(false);
				readySignals.Add(readySignal);
				var task = CreateDataAsync(trigger, readySignal);
				processingTasks.Add(task);
				
			}
			WaitHandle.WaitAll(readySignals.ToArray());
			trigger.Set();
			await Task.WhenAll(processingTasks.ToArray());
			Console.WriteLine("Finished");
		}

		private static async Task CreateDataAsync(ManualResetEvent trigger, ManualResetEvent signalReady)
		{
			await Task.Yield();
			using (var context = new AppDbContext())
			{
				var incomingFile = new IncomingFile();
				for(int i = 1; i <= 1000; i++)
				{
					new IncomingFileEvent(incomingFile);
				}
				context.IncomingFile.Add(incomingFile);
				signalReady.Set();
				trigger.WaitOne();
				await context.SaveChangesAsync().ConfigureAwait(false);
			}
		}
	}
	public class AppDbContext : DbContext
	{
		public DbSet<IncomingFile> IncomingFile { get; set; }

		private static readonly DbContextOptions<AppDbContext> Options = CreateOptions();

		public AppDbContext() : base(Options)
		{
		}

		public static DbContextOptions<AppDbContext> CreateOptions()
		{
			var builder = new DbContextOptionsBuilder<AppDbContext>();
			builder.UseSqlServer("Server=DESKTOP-G05BF1U;Database=EFCoreConcurrencyTest;Trusted_Connection=True;");
			return builder.Options;
		}

	}
	public abstract class EntityBase
	{
		[Key]
		[DatabaseGenerated(DatabaseGeneratedOption.None)]
		public Guid Id { get; set; } = Guid.NewGuid();
	}
	public class IncomingFile : EntityBase
	{
		public virtual ICollection<IncomingFileEvent> Events { get; private set; } = new List<IncomingFileEvent>();
	}
	public class IncomingFileEvent : EntityBase
	{
		public Guid IncomingFileId { get; private set; }
		public virtual IncomingFile IncomingFile { get; private set; }

		[Obsolete("Serialization only")]
		public IncomingFileEvent() { }

		public IncomingFileEvent(IncomingFile incomingFile)
		{
			if (incomingFile is null)
				throw new ArgumentNullException(nameof(incomingFile));

			IncomingFile = incomingFile;
			IncomingFileId = incomingFile.Id;
			IncomingFile.Events.Add(this);
		}
	}
CREATE TABLE [dbo].[IncomingFile]
(
	[Id] UNIQUEIDENTIFIER NOT NULL, 
	[ConcurrencyVersion] RowVersion NOT NULL,
	CONSTRAINT [PK_IncomingFile] PRIMARY KEY CLUSTERED([Id])
)
GO

CREATE TABLE [dbo].[IncomingFileEvent]
(
	[Id] UNIQUEIDENTIFIER NOT NULL, 
	[ConcurrencyVersion] RowVersion NOT NULL,
	[IncomingFileId] UNIQUEIDENTIFIER NOT NULL,
	CONSTRAINT [PK_IncomingFileEvent] PRIMARY KEY CLUSTERED([Id]),
	CONSTRAINT [FK_IncomingFileEvent_IncomingFileId]
		FOREIGN KEY ([IncomingFileId])
		REFERENCES [dbo].[IncomingFile] ([Id])
)
GO

Having turned on READ_COMMITTED_SNAPSHOT I also tried every IsolationType available (except Chaos) and none solved the problem.

		public override async Task<int> SaveChangesAsync(bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default)
		{
			var trans = await Database.BeginTransactionAsync(System.Data.IsolationLevel.Snapshot).ConfigureAwait(false);
			int result = await base.SaveChangesAsync(acceptAllChangesOnSuccess, cancellationToken).ConfigureAwait(false);
			await trans.CommitAsync().ConfigureAwait(false);
			return result;
		}

Exception

System.InvalidOperationException
HResult=0x80131509
Message=An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding 'EnableRetryOnFailure()' to the 'UseSqlServer' call.
Source=Microsoft.EntityFrameworkCore.SqlServer
StackTrace:
at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.d__72.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult()
at Microsoft.EntityFrameworkCore.DbContext.d__54.MoveNext()
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult()
at EFCoreConcurrencyTest.Program.d__2.MoveNext() in C:\Users\x\source\repos\EFCoreConcurrencyTest\EFCoreConcurrencyTest\Program.cs:line 45
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
at EFCoreConcurrencyTest.Program.

d__1.MoveNext() in C:\Users\x\source\repos\EFCoreConcurrencyTest\EFCoreConcurrencyTest\Program.cs:line 28
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
at EFCoreConcurrencyTest.Program.(String[] args)

This exception was originally thrown at this call stack:
Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReaderAsync.AnonymousMethod__164_0(System.Threading.Tasks.Task<Microsoft.Data.SqlClient.SqlDataReader>)
System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke()
System.Threading.Tasks.Task..cctor.AnonymousMethod__274_0(object)
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, object)
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, object)
System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task, System.Threading.Thread)
System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task)
System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task)
...
[Call Stack Truncated]

Inner Exception 1:
DbUpdateException: An error occurred while updating the entries. See the inner exception for details.

Inner Exception 2:
SqlException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Further technical details

EF Core version: 3.1.6
Database provider: Microsoft.EntityFrameworkCore.SqlServer
Target framework: .NET Core 3.1
Operating system: Windows 10
IDE: Visual Studio 2019 16.6.5

DB settings

USE [master]
GO

/****** Object:  Database [EFCoreConcurrencyTest]    Script Date: 02/08/2020 15:44:34 ******/
CREATE DATABASE [EFCoreConcurrencyTest]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'EFCoreConcurrencyTest', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\EFCoreConcurrencyTest.mdf' , SIZE = 73728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON 
( NAME = N'EFCoreConcurrencyTest_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\EFCoreConcurrencyTest_log.ldf' , SIZE = 139264KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
 WITH CATALOG_COLLATION = DATABASE_DEFAULT
GO

IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [EFCoreConcurrencyTest].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_NULL_DEFAULT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_NULLS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_PADDING OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_WARNINGS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ARITHABORT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_CLOSE OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_SHRINK OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_UPDATE_STATISTICS ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CURSOR_DEFAULT  GLOBAL 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CONCAT_NULL_YIELDS_NULL OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET NUMERIC_ROUNDABORT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET QUOTED_IDENTIFIER OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET RECURSIVE_TRIGGERS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  DISABLE_BROKER 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET TRUSTWORTHY OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ALLOW_SNAPSHOT_ISOLATION ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET PARAMETERIZATION SIMPLE 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET READ_COMMITTED_SNAPSHOT ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET HONOR_BROKER_PRIORITY OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET RECOVERY FULL 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  MULTI_USER 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET PAGE_VERIFY CHECKSUM  
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DB_CHAINING OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF ) 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET TARGET_RECOVERY_TIME = 60 SECONDS 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DELAYED_DURABILITY = DISABLED 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET QUERY_STORE = OFF
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  READ_WRITE 
GO
@ajcvickers
Copy link
Contributor

ajcvickers commented Aug 3, 2020

@mrpmorris What version of Microsoft.Data.SqlClient are you using? Some deadlock issues were fixed in the last few months, so make sure you are using 1.1.3 or 2.0.0. (This may require adding a specific package reference to your project.)

@mrpmorris
Copy link
Author

mrpmorris commented Aug 3, 2020

Thanks for your quick reply!

If I type SqlConnection and press F12 I see it is microsoft.data.sqlclient\1.0.19269.1

But I do not reference this package myself so it must be referenced indirectly by Microsoft.EntityFrameworkCore.SqlServer 3.1.6, or something else perhaps?

Could you tell me how to update to V2? Is it as simple as adding the nuget package to my project? It doesn't seem to be.

@ajcvickers
Copy link
Contributor

@mrpmorris

Is it as simple as adding the nuget package to my project?

It should be. For example:

  <ItemGroup>
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="3.1.6" />
    <PackageReference Include="Microsoft.Data.SqlClient" Version="2.0.0" />
  </ItemGroup>

@mrpmorris
Copy link
Author

@ajcvickers I tried adding 2.0.0 and also 1.1.3 but neither solved the problem.

@roji
Copy link
Member

roji commented Aug 4, 2020

I've managed to repro this based on your code (see self-contained repro below).

I'm not an expert in SQL Server or it's locking/deadlocking behavior, but there indeed seem to be cases where INSERTs alone can trigger a deadlock scenario (e.g. see this question). I don't think EF Core is doing anything wrong here, i.e. you'd be able to see the same errors happening if executing these inserts concurrently without EF Core.

As a workaround, you can enable retry on failure in your model's OnModelConfiguring - this will automatically retry applying the SaveChanges when a deadlock error occurs (I've confirmed this makes the code sample pass):

builder.UseSqlServer("Server=DESKTOP-G05BF1U;Database=EFCoreConcurrencyTest;Trusted_Connection=True;",
    o => o.EnableRetryOnFailure());

There may be other better solutions for avoiding the deadlock situation - I'd recommend searching for SQL Server solutions on this without any connection to EF Core.

Complete repro sample
class Program
{
    const int Tasks = 5;

    static async Task Main(string[] args)
    {
        await using var ctx = new AppDbContext();
        await ctx.Database.EnsureDeletedAsync();
        await ctx.Database.EnsureCreatedAsync();

        var trigger = new ManualResetEvent(false);
        var readySignals = new List<ManualResetEvent>();
        var processingTasks = new List<Task>();
        foreach(int index in Enumerable.Range(1, Tasks))
        {
            var readySignal = new ManualResetEvent(false);
            readySignals.Add(readySignal);
            var task = CreateDataAsync(trigger, readySignal);
            processingTasks.Add(task);

        }
        WaitHandle.WaitAll(readySignals.ToArray());
        trigger.Set();
        await Task.WhenAll(processingTasks.ToArray());
        Console.WriteLine("Finished");
    }

    private static async Task CreateDataAsync(ManualResetEvent trigger, ManualResetEvent signalReady)
    {
        await Task.Yield();
        using (var context = new AppDbContext())
        {
            var incomingFile = new IncomingFile();
            for(int i = 1; i <= 1000; i++)
            {
                new IncomingFileEvent(incomingFile);
            }
            context.IncomingFile.Add(incomingFile);
            signalReady.Set();
            trigger.WaitOne();
            await context.SaveChangesAsync().ConfigureAwait(false);
        }
    }
}

public class AppDbContext : DbContext
{
    public DbSet<IncomingFile> IncomingFile { get; set; }

    static ILoggerFactory ContextLoggerFactory
        => LoggerFactory.Create(b => b.AddConsole().AddFilter("", LogLevel.Information));

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
            .EnableSensitiveDataLogging()
            .UseLoggerFactory(ContextLoggerFactory);
}

public abstract class EntityBase
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public Guid Id { get; set; } = Guid.NewGuid();
}

public class IncomingFile : EntityBase
{
    public virtual ICollection<IncomingFileEvent> Events { get; private set; } = new List<IncomingFileEvent>();
}

public class IncomingFileEvent : EntityBase
{
    public Guid IncomingFileId { get; private set; }
    public virtual IncomingFile IncomingFile { get; private set; }

    [Obsolete("Serialization only")]
    public IncomingFileEvent() { }

    public IncomingFileEvent(IncomingFile incomingFile)
    {
        if (incomingFile is null)
            throw new ArgumentNullException(nameof(incomingFile));

        IncomingFile = incomingFile;
        IncomingFileId = incomingFile.Id;
        IncomingFile.Events.Add(this);
    }
}

@mrpmorris
Copy link
Author

Hi @roji

I have EnableRetryOnFailure in my app already, it doesn't fix the problem, it just causes it to happen more before it fails.

I wrote some code to do the same thing using SqlCommand and I don't get a deadlock.

@roji
Copy link
Member

roji commented Aug 4, 2020

I wrote some code to do the same thing using SqlCommand and I don't get a deadlock.

At the end of the day, EF Core is just sending SqlCommands here - can you please turn on EF logging and compare what it sends to your own SqlCommand attempt? My "complete repro sample" above does this (it's very close to your original code).

If you really can't repro this with raw SqlCommand, and are convinced EF is doing something specific here which triggers the deadlock, I can also look at reproducing with SqlCommand.

@mrpmorris
Copy link
Author

EFCore creates temporary tables and merges them, this is something I didn't do in my SqlCommand code. Perhaps that is the problem?

Running 64 concurrent tasks each with 3000 child rows caused a deadlock for me in EFCore but not using SqlCommand.

Does this help?

public static async Task SaveToSqlServer(IncomingFile incomingFile)
{
	using var connection = new SqlConnection(MSSqlConnectionString);
	await connection.OpenAsync().ConfigureAwait(false);
	var transaction = (SqlTransaction) await connection.BeginTransactionAsync(System.Data.IsolationLevel.ReadCommitted).ConfigureAwait(false);

	string fileCommandSql = "insert into IncomingFile (Id) values (@Id)";
	using var fileCommand = new SqlCommand(fileCommandSql, connection, transaction);
	fileCommand.Parameters.Add("@Id", System.Data.SqlDbType.UniqueIdentifier).Value = incomingFile.Id;
	await fileCommand.ExecuteNonQueryAsync().ConfigureAwait(false);

	string fileEventCommandSql = "insert into IncomingFileEvent (Id, IncomingFileId) values (@Id, @IncomingFileId)";
	using var fileEventCommand = new SqlCommand(fileEventCommandSql, connection, transaction);
	fileEventCommand.Parameters.Add("@Id", System.Data.SqlDbType.UniqueIdentifier);
	fileEventCommand.Parameters.Add("@IncomingFileId", System.Data.SqlDbType.UniqueIdentifier);
	foreach (IncomingFileEvent fileEvent in incomingFile.Events)
	{
		fileEventCommand.Parameters["@Id"].Value = fileEvent.Id;
		fileEventCommand.Parameters["@IncomingFileId"].Value = incomingFile.Id;
		await fileEventCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
	}
	await transaction.CommitAsync().ConfigureAwait(false);
}

@roji
Copy link
Member

roji commented Aug 4, 2020

EFCore creates temporary tables and merges them, this is something I didn't do in my SqlCommand code. Perhaps that is the problem?

What are you referring to here? Looking at the logs produced by EF for the code sample, there's aren't any temporary tables or merges... I'd recommend turning on logging (again, look at my full code sample above) and comparing that to whatever you're doing with raw SqlCommand.

Note that as this is a concurrency issue, timing is extremely important here, and the SqlCommand sample should be as close as possible to the EF Core sample. For example, your original EF code above has await Task.Yield(), but I'm not seeing that in your SqlCommand (maybe because it's partial).

@AndriySvyryd
Copy link
Member

You can ensure EF doesn't create temporary tables by setting MaxBatchSize to 1

@mrpmorris
Copy link
Author

mrpmorris commented Aug 4, 2020

@roji In my real app I see EFCore is inserting into @inserted4 etc and then doing a MERGE into the target table. I am not seeing this in the repo app now I have enabled logging though. Not sure why this is, perhaps something to do with the size of the rows (two TEXT columns)?

SET NOCOUNT ON;
DECLARE @inserted0 TABLE ([Id] uniqueidentifier, [_Position] [int]);
MERGE [IncomingFileTransaction] USING (
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6, @p7, @p8, @p9, @p10, @p11, @p12, @p13, 0),
(@p14, @p15, @p16, @p17, @p18, @p19, @p20, @p21, @p22, @p23, @p24, @p25, @p26, @p27, 1),
(@p28, @p29, @p30, @p31, @p32, @p33, @p34, @p35, @p36, @p37, @p38, @p39, @p40, @p41, 2),
(@p42, @p43, @p44, @p45, @p46, @p47, @p48, @p49, @p50, @p51, @p52, @p53, @p54, @p55, 3),
etc

@AndriySvyryd That prevents the bug, but it takes over twice as long to insert. I can set the max batch size to 64 and it is close to the SqlCommand speed and doesn't error - but how can I know which setting is safe to use across my whole DB?

@AndriySvyryd
Copy link
Member

@mrpmorris You'll just have to fine-tune it for your scenario. In EF Core 5.0 we'll set it to 42 by default as that is close to the sweet spot in typical scenarios.

@mrpmorris
Copy link
Author

@AndriySvyryd How is that tuning done? Simply changing it and running lots of scenarios?

@mrpmorris
Copy link
Author

mrpmorris commented Aug 6, 2020

I cannot possibly emulate every scenario of a production system. With so many different people uploading so many data files there are just too many combinations.

This concerns me. It means I have to test as much as I can, release the app into production, and then keep watching it and releasing a new version every time I want to tweak the value.

It's a magic number.

@roji Could you please explain why this ticket is considered closed-external?

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 6, 2020

Maybe consider using SqlBulkCopy ?

@mrpmorris
Copy link
Author

Hi @ErikEJ Is that a suggestion for me or the EFCore team? I'm using EFCore SaveChangesAsync to persist domain objects, the SqlCommand code was just to compare behaviour.

@AndriySvyryd
Copy link
Member

@mrpmorris There isn't much else we can do here. Perhaps with #18990 the performance would be good enough that we don't need to create a temporary table. But until then you'll have to pick a conservative batch size.

@roji
Copy link
Member

roji commented Aug 6, 2020

@mrpmorris I'm not an expert here so I may well be missing something - would love to be corrected if so - but here's my reasoning.

The code sample in #21899 (comment) (which is almost identical to your own original code sample) reliably reproduces a deadlock, but inspecting the actual SQL shows only plain old INSERT statements - no temporary tables or merging. Since it's possible to trigger deadlocks with such basic SQL on SQL Server, I think it makes sense to consider this as non-EF-related, hence the closed-external label.

Now, in other scenarios EF Core indeed makes use of temporary tables and merging, and the fact that it does so might increase the likelihood of deadlocks - or it may not. If I didn't have a deadlock repro with plain INSERTs, it may have made sense to consider not using merging, but at the moment it seems like the deadlock may occur in any case.

I hope the above makes sense. At the end of the day EF is only sending SQL statements to SQL Server; at the moment I can't see anything inherently wrong in what EF is sending, and the problem you're encountering seems like it's reproducible without using EF at all. But if you see this differently, I'd be happy to know more.

(BTW note that the issue itself isn't closed yet, even though I put closed-external on it - that means we're still discussing it and haven't yet reached a final conclusion).

@mrpmorris
Copy link
Author

Can you suggest where I can report an issue for SQL Server? I doubt I'll find it on GitHub :)

@roji
Copy link
Member

roji commented Aug 6, 2020

I'd be surprised if this is considered a bug in SQL Server, rather than expected behavior... I'd recommend first producing a repro which purely uses SqlCommands (based on the SQL EF produces for the repro above), and post that as a question on stackoverflow. Some people will probably be able to help you out (this question may be relevant as I wrote above).

Otherwise this is the official page for getting help for SQL Server.

@mrpmorris
Copy link
Author

This can't be expected behaviour for a world class db. It didn't happen in Firebird or Postgres.

@roji
Copy link
Member

roji commented Aug 6, 2020

@mrpmorris you may be right and we may be missing something here - all I know is that I saw the simple INSERTs from your EF repro triggering the deadlock, so the best way forward is to reproduce that without EF Core. If you can't manage to do that for some reason, I can try to help.

@mrpmorris
Copy link
Author

To make sure I wasn't muddying the waters at all I have just created a new console app inserting using SqlCommand. It uses the same code to ensure tasks start to insert data at the same time.

64 concurrent tasks inserting 1 parent row and 3000 child rows is taking a long (very long) time, but it hasn't errored yet.

using System;
using System.Collections.Generic;
using System.Data.SqlClient;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace SqlServerDeadlockExample
{
	class Program
	{
		private const int NumberOfConcurrentTasks = 64;
		private const int NumberOfChildRows = 3_000;
		private const string MSSqlConnectionString = "Server=DESKTOP-G05BF1U;Database=EFCoreConcurrencyTest;Trusted_Connection=True;";

		static async Task Main(string[] args)
		{
			var trigger = new ManualResetEvent(false);
			var readySignals = new List<ManualResetEvent>();
			var processingTasks = new List<Task>();
			for(int index = 0; index < NumberOfConcurrentTasks; index++)
			{
				var readySignal = new ManualResetEvent(false);
				readySignals.Add(readySignal);
				var task = CreateDataAsync(trigger, readySignal);
				processingTasks.Add(task);
			}
			Console.WriteLine("Waiting for tasks to become ready");
			WaitHandle.WaitAll(readySignals.ToArray());
			Console.WriteLine("Saving data");
			var sw = Stopwatch.StartNew();
			trigger.Set();
			await Task.WhenAll(processingTasks.ToArray());
			sw.Stop();
			Console.WriteLine("Finished - " + sw.ElapsedMilliseconds);
		}

		private static int TaskNumber = 0;
		private static async Task CreateDataAsync(ManualResetEvent trigger, ManualResetEvent readySignal)
		{
			await Task.Yield();
			using var connection = new SqlConnection(MSSqlConnectionString);
			await connection.OpenAsync().ConfigureAwait(false);
			var transaction = (SqlTransaction)await connection.BeginTransactionAsync(System.Data.IsolationLevel.ReadCommitted).ConfigureAwait(false);
			Console.WriteLine("Task #" + Interlocked.Increment(ref TaskNumber) + " connection established and transaction started");

			readySignal.Set();
			trigger.WaitOne();
			Guid parentId = Guid.NewGuid();
			string fileCommandSql = "insert into IncomingFile (Id) values (@Id)";
			using var fileCommand = new SqlCommand(fileCommandSql, connection, transaction);
			fileCommand.Parameters.Add("@Id", System.Data.SqlDbType.UniqueIdentifier).Value = parentId;
			await fileCommand.ExecuteNonQueryAsync().ConfigureAwait(false);

			string fileEventCommandSql = "insert into IncomingFileEvent (Id, IncomingFileId) values (@Id, @IncomingFileId)";
			using var fileEventCommand = new SqlCommand(fileEventCommandSql, connection, transaction);
			fileEventCommand.Parameters.Add("@Id", System.Data.SqlDbType.UniqueIdentifier);
			fileEventCommand.Parameters.Add("@IncomingFileId", System.Data.SqlDbType.UniqueIdentifier);
			for(int index = 0; index < NumberOfChildRows; index++)
			{
				fileEventCommand.Parameters["@Id"].Value = Guid.NewGuid();
				fileEventCommand.Parameters["@IncomingFileId"].Value = parentId;
				await fileEventCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
			}
			await transaction.CommitAsync().ConfigureAwait(false);
		}
	}
}

@roji
Copy link
Member

roji commented Aug 6, 2020

@mrpmorris I'll take a look in the next few days.

@roji
Copy link
Member

roji commented Aug 8, 2020

@mrpmorris your code sample sends separate commands for each child row, whereas EF Core inserts all child rows in a single INSERT command with multiple values.

The below reproduces the same deadlock without using EF Core:

Repro code without EF
private static async Task CreateDataWithSqlCommand(ManualResetEvent trigger, ManualResetEvent readySignal)
{
    await Task.Yield();
    using var connection = new SqlConnection(AppDbContext.ConnectionString);
    await connection.OpenAsync().ConfigureAwait(false);
    var transaction = (SqlTransaction)await connection.BeginTransactionAsync(System.Data.IsolationLevel.ReadCommitted).ConfigureAwait(false);

    readySignal.Set();
    trigger.WaitOne();
    Guid parentId = Guid.NewGuid();
    string fileCommandSql = "insert into IncomingFile (Id) values (@Id)";
    using var fileCommand = new SqlCommand(fileCommandSql, connection, transaction);
    fileCommand.Parameters.Add("@Id", System.Data.SqlDbType.UniqueIdentifier).Value = parentId;
    await fileCommand.ExecuteNonQueryAsync().ConfigureAwait(false);

    using var fileEventCommand = new SqlCommand
    {
        Connection = connection,
        Transaction = transaction
    };
    var commandTextBulder = new StringBuilder("INSERT INTO [IncomingFileEvent] ([Id], [IncomingFileId]) VALUES ");
    for (var i = 1; i <= NumberOfChildRows * 2; i += 2)
    {
        commandTextBulder.Append($"(@p{i}, @p{i+1})");
        if (i < NumberOfChildRows * 2 - 1)
            commandTextBulder.Append(',');

        fileEventCommand.Parameters.AddWithValue($"@p{i}", Guid.NewGuid());
        fileEventCommand.Parameters.AddWithValue($"@p{i+1}", parentId);
    }

    fileEventCommand.CommandText = commandTextBulder.ToString();
    await fileEventCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
    await transaction.CommitAsync().ConfigureAwait(false);
}

I admit I'm also a bit surprised that this can cause SQL Server to deadlock - it would be good to understand exactly what's going on.

As @AndriySvyryd mentioned above, when we have a proper ADO.NET batching API (dotnet/runtime#28633), and it's implemented by SqlClient (dotnet/SqlClient#19), then the EF saving mechanism for SQL Server should be re-thought (#18990).

At least in theory, we may want to evaluate switching to (batched) command-per-row instead of a single command for all rows. The current mechanism probably also has the disadvantage of producing query plan cache pollution, when different numbers of rows are being inserted. But all this would need to be carefully analyzed and measured.

@roji roji changed the title Deadlock exceptions in EFCore SQL Server (URGENT) Deadlock with SQL Server when inserting a large number of related rows Aug 8, 2020
@mrpmorris
Copy link
Author

Thank you for the repro, @roji!

Thank you everyone, I really appreciate how much effort you have put into this on my behalf!

@ajcvickers
Copy link
Contributor

@ErikEJ Can you provide some more details about the required index?

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 14, 2020

Sure, as I briefly mentioned 6 days ago, there is no index on the foreign key, as also mentioned by Dan Guzman in his SO answer.

This will cause table scans, leading to deadlocks, as rows are added to the child table.

@roji
Copy link
Member

roji commented Aug 14, 2020

@ErikEJ I've added the following:

CREATE INDEX IX_FK ON IncomingFileEvent(IncomingFileId);

And I'm still getting the deadlock. Did I misunderstand? Below is the full repro without EF, can you tweak it to pass?

Full repro code
class Program
{
    const string ConnectionString =
        @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";

    const int Tasks = 5;
    const int NumberOfChildRows = 1_000;

    static async Task Main(string[] args)
    {
        // Setup
        await using (var conn = new SqlConnection(ConnectionString))
        {
            await conn.OpenAsync();
            await using var cmd = conn.CreateCommand();
            cmd.CommandText = @"
IF OBJECT_ID('dbo.IncomingFileEvent', 'U') IS NOT NULL
DROP TABLE IncomingFileEvent;
IF OBJECT_ID('dbo.IncomingFile', 'U') IS NOT NULL
DROP TABLE IncomingFile;

CREATE TABLE IncomingFile (
Id uniqueidentifier NOT NULL,
CONSTRAINT PK_IncomingFile PRIMARY KEY (Id)
);
CREATE TABLE IncomingFileEvent (
Id uniqueidentifier NOT NULL,
IncomingFileId uniqueidentifier NOT NULL,
CONSTRAINT PK_IncomingFileEvent PRIMARY KEY (Id),
CONSTRAINT FK_IncomingFileEvent_IncomingFile_IncomingFileId FOREIGN KEY (IncomingFileId) REFERENCES IncomingFile (Id) ON DELETE CASCADE
);
CREATE INDEX IX_FK ON IncomingFileEvent(IncomingFileId);";
            await cmd.ExecuteNonQueryAsync();
        }

        var trigger = new ManualResetEvent(false);
        var readySignals = new List<ManualResetEvent>();
        var processingTasks = new List<Task>();
        for (var i = 0; i < Tasks; i++)
        {
            var readySignal = new ManualResetEvent(false);
            readySignals.Add(readySignal);
            var task = CreateDataWithSqlCommand(trigger, readySignal);
            processingTasks.Add(task);
        }
        WaitHandle.WaitAll(readySignals.ToArray());
        Console.WriteLine("Starting inserts...");
        trigger.Set();
        await Task.WhenAll(processingTasks.ToArray());
        Console.WriteLine("Finished");
    }

    private static async Task CreateDataWithSqlCommand(ManualResetEvent trigger, ManualResetEvent readySignal)
    {
        await Task.Yield();
        using var connection = new SqlConnection(ConnectionString);
        await connection.OpenAsync().ConfigureAwait(false);
        var transaction = (SqlTransaction)await connection.BeginTransactionAsync(IsolationLevel.ReadCommitted).ConfigureAwait(false);

        readySignal.Set();
        trigger.WaitOne();
        var parentId = Guid.NewGuid();
        var fileCommandSql = "INSERT INTO IncomingFile (Id) VALUES (@Id)";
        using var fileCommand = new SqlCommand(fileCommandSql, connection, transaction);
        fileCommand.Parameters.Add("@Id", SqlDbType.UniqueIdentifier).Value = parentId;
        await fileCommand.ExecuteNonQueryAsync().ConfigureAwait(false);

        using var fileEventCommand = new SqlCommand
        {
            Connection = connection,
            Transaction = transaction
        };
        var commandTextBulder = new StringBuilder("INSERT INTO [IncomingFileEvent] ([Id], [IncomingFileId]) VALUES ");
        for (var i = 1; i <= NumberOfChildRows * 2; i += 2)
        {
            commandTextBulder.Append($"(@p{i}, @p{i+1})");
            if (i < NumberOfChildRows * 2 - 1)
                commandTextBulder.Append(',');

            fileEventCommand.Parameters.AddWithValue($"@p{i}", Guid.NewGuid());
            fileEventCommand.Parameters.AddWithValue($"@p{i+1}", parentId);
        }

        // commandTextBulder.Append(" OPTION (LOOP JOIN)");
        fileEventCommand.CommandText = commandTextBulder.ToString();
        await fileEventCommand.ExecuteNonQueryAsync().ConfigureAwait(false);
        await transaction.CommitAsync().ConfigureAwait(false);
    }
}

@ajcvickers
Copy link
Contributor

@ErikEJ And if the schema was created from the EF model (i.e. Migrations) then we would have added that index, right? So this will only be an issue if the schema is created not from EF and the index isn't added?

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 14, 2020

@ajcvickers correct
@roji maybe generate proper INSERT statements also?

@roji
Copy link
Member

roji commented Aug 14, 2020

@ErikEJ how do you mean? Are you referring to generating separate INSERT statements per row instead of one statement for multiple rows?

@ErikEJ
Copy link
Contributor

ErikEJ commented Aug 14, 2020

@roji Yes, I mean a full old school INSERT statement per row

@roji
Copy link
Member

roji commented Aug 14, 2020

I think that's covered by option 2 under #21899 (comment) - this seems to degrade perf considerably (on SQL Server). But I don't see any connection to foreign indexes so far.

@mrpmorris
Copy link
Author

@ErikEJ I added an index to IncomingFileEvent.IncomingFileId and it still happened.

I also made the PKs non clustered, no change.

I also have the child table a composite PK and no change.

I'm in touch with MS support at the moment. I'll provide an update once we know what is happening.

@roji
Copy link
Member

roji commented Aug 14, 2020

Thanks @mrpmorris! Any further info on this would be useful indeed.

@mrpmorris
Copy link
Author

mrpmorris commented Aug 30, 2020

Reply from product group was the following

The deadlocks thrown by the code are due to locks being acquired during PK lookup operation to ensure referential integrity. As the code inserts thousands of records in a single batch instead of using Nested Loop join + Seek operator the optimizer decides to use Merge Join and Index Scan. The merge join plan is much cheaper from the cost perspective.

The solutions the customer may consider are:

  1. Eliminating the PK-FK relationship and ensure the integrity is maintained at the application level. Entire call is wrapped into a transaction and the GUIDs (PK for “IncomingFile” row as well as for “IncomingFileEvent”) are generated at the app level so it should do the trick.
  2. Keep the transaction as short as possible. It means they can consider splitting current transaction into 2-transactions (i.e., one inserting the data into “IncomingFile” and the next one inserting into “IncomingFileEvent”). This will ensure the locks on the “IncomingFile” are quickly released
  3. As mentioned in the thread the query hint OPTION (LOOP JOIN) can be used alternatively to enforce singleton lookups.
  4. In-Memory tables (SchemaAndData) as lock and latch free structures are excellent candidates for high transactional processing. With In-Memory tables they can continue using PK-FK relationship, no locks will be acquired but the table schema must be changed to eliminate RowVersion column.

My thoughts

  1. If we remove the R from RDBMS then we may as well go with MongoDB which is proving to be at least 4 times faster than SQL Server at inserting data directly (although no EFCore support yet).
  2. We cannot remove the A from ACID.
  3. This is what we are currently doing. Despite seeing warnings that it is slower I cannot actually see any degradation in performance. Maybe it is only slower when there is lots of data? I don't know.
  4. We cannot abandon the RowVersion column. It's an important part of our solution to ensure users do not overwrite each other's changes.

@roji
Copy link
Member

roji commented Aug 30, 2020

Thanks for posting these details... We'll take another look at this, especially once the better batching API is implemented in SqlClient.

@mrpmorris
Copy link
Author

@roji Are you referring to SqlBulkCopy?

If so, I tried it and it also suffers from the deadlock problem unless you disable referential integrity checks (which then runs the risk of invalid data).

@roji
Copy link
Member

roji commented Aug 31, 2020

@mrpmorris no, I'm referring to dotnet/runtime#28633, and to its planned implementation in SqlClient (dotnet/SqlClient#19). This hopefully would allow us to use one INSERT statement per row (but many of those batched in the same command), rather than one INSERT statement per multiple rows, without degrading performance. This may even improve performance as it would reduce query plan cache pollution - the current mechanism sends a different INSERT statement each time, since the number of rows is variable.

@mrpmorris
Copy link
Author

Thanks for the link @roji - It looks like this will make things faster, and I look forward to it, but I don't think it will fix the deadlock problem.

@roji
Copy link
Member

roji commented Sep 1, 2020

@mrpmorris I think we've already seen above (see #21899 (comment)) that when one-statement-per-row is used, the deadlock disappears - it only seems to happen when a single INSERT statement is used with many row VALUES, or am I mistaken?

@mrpmorris
Copy link
Author

@roji You are correct, but I can't help but be frightened :)

I really hope it works, good luck!

If I am reading this correctly, it should be in the November release. Is that correct?

@roji
Copy link
Member

roji commented Sep 1, 2020

I can't help but be frightened :)

With good reason :)

I really hope it works, good luck!

Me too. In any case we wouldn't be doing any changes in the EF Core SQL Server provider without carefully measuring and making sure it makes sense to switch.

If I am reading this correctly, it should be in the November release. Is that correct?

You're referring to the new batching API? No, that's definitely not going into 5.0 - but it's one of the things I want to push early for 6.0.

@mrpmorris
Copy link
Author

@roji @ajcvickers @ErikEJ

I've just finished my final support call with MS regarding this issue. I understand the cause of the problem, and hopefully I will use the correct terminology to describe it.

When SQL inserts into the child table it checks the referential integrity using one of two methods.

  1. If the referenced table does not have much data then it does a table scan and loads the keys into memory so it can perform checks quickly, this means that it uses an index scan which locks the index. This is what causes the deadlock.
  2. When the volume of data is high in the referenced table then it will not be cost efficient to store all the IDs in memory, so it does a loop join which will check the parent row exists once for every insert. Because no lock is required on the table/index there are no deadlocks.

The engineer said that batching will not solve this problem as it is related to how SQL Server checks referential integrity when inserting child rows rather than how the rows are being inserted.

So, that's the technical bit over, I'll offer my layman opinions :)

(EFCore)
This is going to be something that bites you in EFCore. Perhaps you'll need some way of allow the user to specify query options easily? Or maybe something in the docs that describes the problem and shows how to add an OPTION to queries?

(SQL Server)
It seems to me that optimising inserts for a low volume table isn't automatically a clever thing to do. SQL Server seems to be guessing the intended use of the table by how many rows of data it already has. It then assumes that because the table is low-volume it will not be inserted into very often (i.e. a practically static lookup table for country codes).

However, this logic obviously fails when the table is low volume due to being new, but will be high-volume as soon as it goes into production. All large tables are empty at some point.

If you have any channels open to you to influence SQL Server, I would like to make a suggestion. I would have mentioned it on the call, but it has only just come to me as I am writing this out.

The suggestion is to only use the MERGE JOIN (scan) option if the table is small enough AND the index hasn't been modified for X amount of time. This way SQL Server can be cautious and use LOOP JOIN when it suspects that the table is undergoing a period data changes, and use MERGE JOIN when no updates have taken place for a while. It's essentially a small window of time for disabling the use of MERGE JOIN on any table that has been updated recently. I think that would solve the problem not only for EFCore but for all users of the product.

@roji
Copy link
Member

roji commented Sep 9, 2020

@mrpmorris thanks again for continuing to follow up on this and for providing all the valuable info. Your explanation makes a lot of sense.

The engineer said that batching will not solve this problem as it is related to how SQL Server checks referential integrity when inserting child rows rather than how the rows are being inserted.

I'm still a bit confused by this - we've seen above that doing one-statement-per-row makes the deadlock go away. It could when all rows are in the same INSERT statement, the lock is held for much longer, while when doing one-statement-per-row it gets constantly released and retaken, avoiding the deadlock (or something along those lines...).

This is going to be something that bites you in EFCore. Perhaps you'll need some way of allow the user to specify query options easily? Or maybe something in the docs that describes the problem and shows how to add an OPTION to queries?

Yeah, we do have #6717 in general for tracking OPTIONS/hints - that is issue is more focused on queries, but also discusses general table hints. We can definitely work on docs/guidance if we see people hitting this more.

The suggestion is to only use the MERGE JOIN (scan) option if the table is small enough AND the index hasn't been modified for X amount of time.

I'm personally a bit skeptical of heuristic mechanisms such as this... This sounds like it would make the manifestation of the issue even more rare, which could also be viewed as a bad thing (as it's even harder to detect before production, repro reliably...).

@mrpmorris
Copy link
Author

mrpmorris commented Sep 9, 2020

@roji I don't know enough to comment myself on the batch insert which is why I asked the engineer, who said it wouldn't make a difference. I can double-check this if you wish?

As for my suggestion, SQL server is already making this assessment which is why it selects a MERGE JOIN (based on volume only). My suggestion would prevent a false positive in the case of high-volume tables that are new (and thus empty or small). E.g. only use MERGE JOIN if the table is empty or if it is small + hasn't been updated for a suitable amount of time.

@roji
Copy link
Member

roji commented Sep 9, 2020

I don't know enough to comment myself on the batch insert which is why I asked the engineer, who said it wouldn't make a difference. I can double-check this if you wish?

I definitely think there's a point here to be explored/understood, but we can also postpone the investigation and come back to it once the new batching API is actually implemented... Of course, if you have the spare time and are interested :)

@mrpmorris
Copy link
Author

mrpmorris commented Sep 9, 2020

I definitely think there's a point here to be explored/understood, but we can also postpone the investigation and come back to it once the new batching API is actually implemented... Of course, if you have the spare time and are interested :)

I am interested, and will make time :)

PS: You have all been extremely helpful. Thank you very much!

@roji
Copy link
Member

roji commented Sep 9, 2020

Thank you for being so involved in this, it's rare (and very valuable) to get this kind of very in-depth analysis!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants