Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canceling SQL Server query with while loop hangs forever #44

Open
madelson opened this issue Jan 28, 2018 · 38 comments · Fixed by #956
Open

Canceling SQL Server query with while loop hangs forever #44

madelson opened this issue Jan 28, 2018 · 38 comments · Fixed by #956
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.

Comments

@madelson
Copy link
Contributor

See StackOverflow post (https://stackoverflow.com/questions/48461567/canceling-query-with-while-loop-hangs-forever?noredirect=1#comment83955305_48461567) here for a full description of the issue.

Essentially, the issue is that for a certain query I am finding that calling CancellationTokenSource.Cancel() hangs indefinitely instead of canceling the query. The same query cancels instantly in SQL Server Management Studio. Here is code the reproduces the issue:

void Main()
{   
    var cancellationTokenSource = new CancellationTokenSource();
    var blocked = RunSqlAsync(cancellationTokenSource.Token);
    Console.WriteLine(blocked.Wait(TimeSpan.FromSeconds(1))); // false (blocked in SQL as expected)
    cancellationTokenSource.Cancel(); // hangs forever?!
    Console.WriteLine("Finished calling Cancel()");
    blocked.Wait();
}

public async Task RunSqlAsync(CancellationToken cancellationToken)
{
    var connectionString = new SqlConnectionStringBuilder { DataSource = @".\sqlexpress", IntegratedSecurity = true, Pooling = false }.ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync().ConfigureAwait(false);

        using (var command = connection.CreateCommand())
        {
            command.CommandText = @"
                WHILE 1 = 1
                BEGIN
                    DECLARE @x INT = 1
                END
            ";
            command.CommandTimeout = 0;
            Console.WriteLine("Running query");
            await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
        }
    }
}
@EgorBo
Copy link
Member

EgorBo commented Jan 29, 2018

@madelson
Copy link
Contributor Author

@EgorBo interesting. Any thoughts as to why this particular query triggers the condition every time whereas most queries work fine?

@saurabh500
Copy link
Contributor

In this case the query is executed on the server, but it will never complete as it is an infinite loop.
In the client, after sending a query for execution, the client waits for an acknowledgement from the server by locking the TdsParserStateObject and then waiting for a network packet.
How cancellation works is, the client is responsible for making sure that it completes the existing requests and then sends a cancellation request to the server. Since the existing request is never going to complete, the cancellation will not go through either.
The hang happens in case of queries like above which are never expected to complete, or if you try to cancel an operations while streaming endless data from SqlServer (like XEvents) and try to close the reader before cancelling.

This is essentially a race condition between Reader.Close and Reader.Cancel

Looking at the code, this seems like a by design bug because of how the TDS protocol is behaving. @madelson Is this infinitely running query intentional? What is the real use case for this ?

@madelson
Copy link
Contributor Author

madelson commented Feb 3, 2018

@saurabh500 thanks for responding and adding more details. A few thoughts:

this seems like a by design bug because of how the TDS protocol is behaving

I don't think this is correct because the same query can be canceled using the synchronous cancel pattern and forgoing async IO. I don't see why it would be "designed" for cancellation to work using one cancellation approach but not the other. Here's a modified version of the above code which cancels exactly as expected:

void Main()
{   
    var cancellationTokenSource = new CancellationTokenSource();
    var blocked = RunSqlAsync(cancellationTokenSource.Token);
    Console.WriteLine(blocked.Wait(TimeSpan.FromSeconds(5))); // false (blocked in SQL as expected)
    cancellationTokenSource.Cancel(); // hangs forever?!
    Console.WriteLine("Finished calling Cancel()");
    blocked.Wait();
}

public async Task RunSqlAsync(CancellationToken cancellationToken)
{
    var connectionString = new SqlConnectionStringBuilder { DataSource = @".\sqlexpress", IntegratedSecurity = true, Pooling = false }.ConnectionString;
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync().ConfigureAwait(false);

        using (var command = connection.CreateCommand())
        {
            command.CommandText = @"
                WHILE 1 = 1
                BEGIN
                    DECLARE @x INT = 1
                END
            ";
            command.CommandTimeout = 0;
            Console.WriteLine("Running query");
            // await command.ExecuteNonQueryAsync(cancellationToken).ConfigureAwait(false);
			var commandTask = Task.Run(() => command.ExecuteNonQuery());
			using (cancellationToken.Register(() => {
				while (!commandTask.IsCompleted)
				{
					command.Cancel();
					Thread.Sleep(1);
				}
			}))
			{
				await commandTask.ConfigureAwait(false);
			}
                }
	}
}

Is this infinitely running query intentional? What is the real use case for this ?

This is a toy example. The real use-case where I discovered this involved a query that does work in a loop. The loop will eventually terminate but it can take a long time to do under edge-case conditions. Therefore, the ability to cancel when the work isn't needed is important.

Another very plausible use-case for this could be building an SSMS-like application or even a service like StackExchange's Data Explorer where part of the functionality is issuing user-authored queries against a database. For such applications reliable abort behavior is very important.

The hang happens in case of queries like above which are never expected to complete, or if you try to cancel an operations while streaming endless data from SqlServer (like XEvents) and try to close the reader before cancelling

This doesn't feel intuitive to me. Queries that are taking forever (and might never complete) are exactly the kind of queries people want to cancel.

@madelson
Copy link
Contributor Author

@EgorBo @saurabh500 can we label this as a bug?

@saurabh500
Copy link
Contributor

@madelson Thanks for the explanation. We will be looking into this issue. The code samples and comparisons are appreciated.
We considered all issues as bugs and mark the feature requests with enhancement.

@madelson
Copy link
Contributor Author

@saurabh500 thanks. I thought I'd seen some issues with the label "bug" so I wanted to make sure that this wasn't being considered "by design" per some of the earlier discussion.

@madelson
Copy link
Contributor Author

Any update on this? It looks like many issues are indeed labeled as bugs (https://github.com/dotnet/corefx/labels/bug). Many in that list feel similar to this in nature.

@divega
Copy link

divega commented May 16, 2019

As recently announced in the .NET Blog, focus on new SqlClient features an improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDB, and general ADO.NET and .NET data access issues.

@divega divega transferred this issue from dotnet/corefx May 16, 2019
@madelson
Copy link
Contributor Author

Thanks for the update @divega . Now that there is renewed focus on SqlClient, can this be labeled as a bug? It's clearly buggy behavior and as more and more people shift to using async queries passing through the RequestAborted cancellation token from ASP.NET Core I anticipate that this will come up more frequently.

@David-Engel David-Engel added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label May 21, 2019
@David-Engel David-Engel added this to the 1.1.0 milestone May 21, 2019
@divega
Copy link

divega commented May 22, 2019

The explanation at #44 (comment) makes me wonder if async (i.e. CancellationToken-based) cancellation should have a similar implementation to SqlCommand.Cancel(), e.g. using Attention messages.

However I think it in #109 we were seeing async cancellation work correctly, so I am not sure I understand this issue.

cc @Wraith2, @roji, @saurabh500

@Wraith2
Copy link
Contributor

Wraith2 commented May 22, 2019

This isn't the same issue as #109 and @EgorBo is spot on at pinpointing the problem. At the moment a lock is taken in EndExecuteNonQuery and then in TdsParserStateObject.Cancel the lock is attempted and cancellation isn't allowed if it can't be acquired.

If I take out that acquired check meaning that it'll take the lock if it can but will still cancel even if it can't get it then the test works. The rest of the manual test suite also passes.

@Wraith2 Wraith2 mentioned this issue Sep 29, 2019
@cheenamalhotra cheenamalhotra modified the milestones: 1.1.0, 1.2.0 Nov 20, 2019
@Samirat
Copy link

Samirat commented Dec 16, 2019

It's not clear to me what the upshot of this is. It sounds like both CancelAsync and CancellationToken based cancellation are deadlocks waiting to happen? What's the recommended way to cancel an ongoing SQL query, preferably asynchronously?

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 16, 2019

The fix was blocked by #248 which is still waiting.

@madelson
Copy link
Contributor Author

madelson commented Mar 7, 2020

@Wraith2 the PR you linked is now merged. Does that fix this issue or just open the door for fixing it?

@keivansahebdelfard
Copy link

keivansahebdelfard commented Mar 2, 2021

declare @index int ;
set @index = 3;
declare @maxIndex int
declare @id nvarchar(10);
set @maxIndex = (select count(*) from aspnetusers);
while (@index <= @maxIndex)
begin 
set @id = (select Id from (select ROW_NUMBER() OVER(ORDER BY (SELECT 1)) as [row],Id from aspnetusers) as tbl where [row] = @index)
print 'id : '+ @id
set @index = @index+1;
end

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 8, 2021

Is this even allowed?

With waits yes it's allowed because they're serialized. Without waits no in both synchronous and asynchronous modes because you're only allowed to have one owner per parser and the reader takes ownership as it executes. Moreover there's plenty of checking at call time to make sure there's only one asynchronous operation of any type executing at any time. Multi threading in both scenarios is supported through cheap creation of multiple connections and linked objects.

The locking is to ensure that only one thread is mutating the parser and state object at any given time because the state transitions cause multiple variables to be changes you can't have partial interleaving. You can do this in a course way with locking which just prevents entry to the critical section but incurs waits or you can do it with interlock gates which prevents the wait but means you have to be really careful about ordering of gate checks. The locking is far easier to understand and work with.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Mar 8, 2021

It's not allowed with waits too. You have to read and close first DataReader and then execute again.
Technically if that was an issue - you could execute same command from 2 threads synchronously too - but that's just Bad Practice.

And let's say you locked the command, and allow 1 execute at a time. Without reading results whenever lock is released and query is re-executed, it's not going to work anyways.

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 8, 2021

It depends on exactly what RunSqlAsync does but I agree with your description. From my reading of the code in the past executing two sync commands from two threads concurrently will throw an exception when the state object is found to be owned by another reader but as you say it's a Bad Idea so lets not worry about that.

@roji
Copy link
Member

roji commented Mar 9, 2021

You can do this in a course way with locking which just prevents entry to the critical section but incurs waits or you can do it with interlock gates which prevents the wait but means you have to be really careful about ordering of gate checks. The locking is far easier to understand and work with.

I'm lacking most of the context here, but just wanted to mention that as long as no contention is anticipated, locks aren't expensive - taking a free lock is generally a very cheap operation. I wouldn't go into lock-free CompareExchange or similar for perf reasons unless contention is anticipated (which doesn't seem to be the case - this seems to be just a guard against invalid user behavior) and the perf benefits are demonstrated.

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 9, 2021

It isn't really a problem of how expensive the locks are because as you say they're pretty cheap most of the time. The problem is that the EndExecute process inside the library is taking a lock very early and holding it for a long time and because that same lock is required for cancellation it prevents cancel, it effectively serializes it to occur after the wait so you'll get a cancellation but it won't happen until the query is finished which isn't the intention.

the compareexchange isn't for perf it's just for making the state transition atomic and preventing both EndExecuteInternal (which is Close from the data reader confusingly) and Cancel can't both be executing in a way that they think they've succeeded. Instead of locking to serialize we use the successful change of state to identify the "winner" and let them execute.

@katwan
Copy link

katwan commented Mar 16, 2021

We are having a similar issue but with Reader.ExecuteReadAsync which hangs without respecting cancellationToken we pass to it when reading large rows. Any suggestions on the workaround before fixing is in place?

@Wraith2
Copy link
Contributor

Wraith2 commented Mar 16, 2021

There isn't a workaround because the cancellation isn't respected. The PR's that might fix the problem are in review.

@madelson
Copy link
Contributor Author

@katwan the only workaround I know of is to use the synchronous ExecuteReader API with SqlCommand.Cancel()

@madelson
Copy link
Contributor Author

madelson commented Sep 26, 2021

@cheenamalhotra @Wraith2 @smartguest what version of Microsoft.Data.SqlClient is this fixed in? I'm a bit confused because the related PRs are in a different repo.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 26, 2021

@cheenamalhotra
Copy link
Member

Reopening issue as the change was reverted in 4.0.0-preview3 (#1352)

@abatishchev
Copy link

But the change log for the 4.0 release says the issue was fixed. How come?

@JRahnama
Copy link
Contributor

@abatishchev this seems like an oversight. Thanks for finding it. We will fix the documentation.

@Greg-Smulko
Copy link

Any chance of having it re-triaged (or maybe even fixed ;) ), please?
It's a bit silly that we need to use sync methods of the SQL Client because async is broken...
@cheenamalhotra ?

@cheenamalhotra
Copy link
Member

Current alternative:

// Register cancellation token to cancel command after command timeout, e.g. 5 seconds like under:
CancellationTokenSource source = new();
source.CancelAfter(5000);
source.Token.Register(() =>
{
    Console.WriteLine("Cancelling operation.");
    sqlCommand.Cancel();
});

// Instead of calling:
await sqlcommand.ExecuteReaderAsync(source.Token); // which gets stuck due to this issue

// Call sync API wrapped in a task that will cancel with token as well:
await Task.Run(() => sqlCommand.ExecuteReader(), source.Token))

// or simply call:
sqlCommand.ExecuteReader(); // as command will be canceled with token canceling this execution.

@Greg-Smulko
Copy link

Current alternative:

Thank you for your response @cheenamalhotra!

But, with what you're proposing, doesn't it mean that we'd have a thread blocked on each sync ExecuteReader(), risking thread exhaustion and higher memory pressure?

@marcOcram
Copy link

I just stumbled upon this issue as well. I have a command with an infinite timeout doing WAITFOR (RECEIVE ...), TIMEOUT 1000 in a loop until it receives something. I am unable to cancel it.

Thread 35048:

Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync
[Managed to Native Transition]
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParser.TryRun(Microsoft.Data.SqlClient.RunBehavior runBehavior, Microsoft.Data.SqlClient.SqlCommand cmdHandler, Microsoft.Data.SqlClient.SqlDataReader dataStream, Microsoft.Data.SqlClient.BulkCopySimpleResultSet bulkCopyHandler, Microsoft.Data.SqlClient.TdsParserStateObject stateObj, out bool dataReady)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlDataReader.MetaData.get()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(Microsoft.Data.SqlClient.SqlDataReader ds, Microsoft.Data.SqlClient.RunBehavior runBehavior, string resetOptionsString, bool isInternal, bool forDescribeParameterEncryption, bool shouldCacheForAlwaysEncrypted)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(bool isInternal, bool forDescribeParameterEncryption)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteReader(System.IAsyncResult asyncResult, bool isInternal, string endMethod)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderInternal(System.IAsyncResult asyncResult)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.EndExecuteReaderAsync(System.IAsyncResult asyncResult)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.InternalExecuteReaderAsync.AnonymousMethod__201_1(System.IAsyncResult asyncResult)
System.Private.CoreLib.dll!System.Threading.Tasks.TaskFactory<System.__Canon>.FromAsyncCoreLogic(System.IAsyncResult iar, System.Func<System.IAsyncResult, System.__Canon> endFunction, System.Action<System.IAsyncResult> endAction, System.Threading.Tasks.Task<System.__Canon> promise, bool requiresSynchronization) Line 507
System.Private.CoreLib.dll!System.Threading.Tasks.TaskFactory<Microsoft.Data.SqlClient.SqlDataReader>.FromAsyncImpl.AnonymousMethod__0(System.IAsyncResult iar) Line 779
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.BeginExecuteReaderInternal.AnonymousMethod__184_3(System.Threading.Tasks.Task<object> task, object state)
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(System.Threading.Thread threadPoolThread, System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 264
System.Private.CoreLib.dll!System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task currentTaskSlot, System.Threading.Thread threadPoolThread) Line 2349
System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 913
System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() Line 102
...

Thread 34880:

Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParserStateObject.Cancel
Waiting on lock owned by Thread 35048, double-click or press enter to switch to thread]
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.TdsParserStateObject.Cancel(object caller)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlDataReader.Cancel(Microsoft.Data.SqlClient.SqlCommand command)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.Cancel()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.CancelIgnoreFailure()
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlCommand.CancelIgnoreFailureCallback(object state)
System.Private.CoreLib.dll!System.Threading.CancellationTokenSource.Invoke(System.Delegate d, object state, System.Threading.CancellationTokenSource source)
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.dll!System.Threading.CancellationTokenSource.ExecuteCallbackHandlers(bool throwOnFirstException)
...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging a pull request may close this issue.