-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
System.Data.SqlClient 4.7.0 tends to deadlock on async #262
Comments
Unless you're using mars can you try turning it off? It significantly increases internal complexity on the managed implementation which is what's used on Unix. |
@Wraith2 would disabling MARS be useful to find a possible bug and to develop a fix? Asking, because I am happy with version 4.6.1 now and reproducing the issue is hard and would take some time I think the application is using MARS, but it could be changed. AFAIR there were contructs like this:
|
Ideally i'd like to fix the issue. Mars is just a suggestion, if you need that functionality then clearly you can't just give it a try because it'll require code rework. Without a stable repro it'll be hard to investigate so if you're happy with another version it's probably worth staying with it. |
This sounds like a regression to me introduced in 4.7.0, could you also try testing with We'd go down in history to find the buggy PR but also understand impacted areas. |
@Wraith2, @cheenamalhotra
|
This gave us headaches the whole last night... Microsoft.Data.SqlClient.SqlError: Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__164_0(Task`1 result)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken)
at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken) At some time Kestrel doesn't even want to handle any connections anymore and the database together with the app is locked until the whole application is restarted. After disabling MARS everything is working fine again! EDIT: We also hadn't any issue when changing the connection string to target a Server 2017 (Developer) instead of Server 2016 (Standard). Maybe that is SQL Server Version related. |
Hi @alex-zyl I was able to reproduce the issue using your repro provided here: https://github.com/alex-zyl/EFCore3OnLinuxIssue on Azure Database. But in order to find the PR that caused this between 4.6.1 and 4.7.0, I'd need a repro using EF Core 2.2.6 and System.Data.SqlClient 4.7.0 if possible, as issue was introduced somewhere in between. Would you be able to provide one? |
@cheenamalhotra |
@cheenamalhotra |
Hi @alex-zyl It looks like problem starts from "4.7.0-preview.19073.11" version, before which everything seems to be fine. The PR culprit for this issue is likely to be dotnet/corefx#34184. I'd like a second opinion on this one, to confirm investigation, so please test it for me at your end too. The NuGet package with reverted changes is available here: Microsoft.Data.SqlClient.1.2.0-dev-debug.nupkg.zip Use steps from PR https://github.com/alex-zyl/EFCore3OnLinuxIssue/pull/1 to verify fix with this NuGet package. Thanks! |
That PR was reviewed by a couple of people very experienced with async programming and notes that it removes use of async void which is definately a bug. I don't think reverting it is the best course of action, for proving it's the error it's fine but long term no. Is it easy to reproduce using the example projct required? i'm a bit short on time at the moment but if it is i can take a look into it and see if i have any ideas. |
Hi Wraith, Yes it's easy to reproduce issue, with 'master' branch, once reproduced, you can build your own version of Microsoft.Data.SqlClient nuget package with assembly file version specified to same as in use with EF Core, and use it with PR 1 changes. I don't intend to revert the entire PR but I would prefer to look for and fix the bug in there, plz feel free to provide any ideas/suggestions that I can start with tomorrow. |
@cheenamalhotra |
@stephentoub @benaadams @saurabh500 Would like to grab your attention to identify issue in PR dotnet/corefx#34184 that causes deadlocks in Async Operations. We have repros available and reverting the methods |
There is a fix for one of the paths in #295; however that might need to be rolled out to the other changed methods in that PR dotnet/corefx#34184 ? |
Also as an aside the Streams in SqlClient don't look to override the newer async methods on Stream which these methods use (like virtual ValueTask<int> ReadAsync (Memory<byte> buffer, System.Threading.CancellationToken cancellationToken = null);
virtual ValueTask WriteAsync (ReadOnlyMemory<byte> buffer, System.Threading.CancellationToken cancellationToken = null); Also Not sure how important these are in the scheme of things though and what they are used for. |
The ssl stream passthrough is on my list of perf things to look into when I get chance. |
Tried that, doesn't seem to fix the issue.
These could be related, as the change PR dotnet/corefx#34184 brings is also changing async calls to replace I have a doubt: Is it just not possible to use I'm wondering if |
I can't recreate it with the current build and EFCore3Demo when running via visual studio (not via docker) Output being as follows
Using Is there another configuration I need to do? |
Note this is via Windows; maybe Linux is different? |
Ah, the fix in #295 would only effect .NET 5 with statemachine reuse, so unlikely to be this |
Hi @benaadams , Thanks for taking a look! The code modified in PR belongs to Managed SNI applicable only to Unix machines. So the issue won't reproduce on Windows client. |
@benaadams to get managed running on windows you can either sets the environment variable if it's a debug build or use private reflection to set the property. Lines 23 to 30 in 60fd1da
|
Only ValueTask for NetCore as it has the ValueTask and Memory/Span overloads in the BCL and further enhancements for the type, whereas netstandard/netfx doesn't. Also some of the other methods do more complicated things with the Task that ValueTask doesn't support Looking at the stacktraces they seem to be suck on
and looking at the method that would suggest they were stuck waiting for the lock? |
Fixes dotnet#262 by reverting dotnet/corefx#34184 until PR dotnet#335 is ready with complete fix.
Fixes #262 by reverting dotnet/corefx#34184 until PR #335 is ready with complete fix.
Microsoft.Data.SqlClient will contain the fix starting next preview release v2.0.0-preview1 planned mid-January, while for System.Data.SqlClient we've backported the fix and shall be hopefully available in upcoming releases. |
Since this is still crashing is there any chance of getting a memory dump of the deadlocked process? sysinternals procdump could get one. It's a long shot and difficult but someone might be able to identify the waits that are stuck. |
What would be the linux equivalent? On the Ubuntu build agent it locks quite predictable. I could give it a try |
https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump I think, I've not used it and i'm not sure the output would be compatible with windbg which is what i'd use to look at it. |
is there any workaround to fix issue? we have trouble at production. |
you can try one of the recent 2.0.0 builds which contain this PR #349 and see if it resolves the issue. |
where I can find artifacts? Not available in nuget. I think i should compile it myself. |
If this is an issue in 4.7.0; presumably going back to 4.6.1 should solve the issue as a workaround? |
I tried to downgrade to 4.6.1 System.Data.SqlClient from Microsoft.Data.SqlClient 1.1.0, seems deadlocks resolved. |
As of currently, we don't trigger nightly builds, you can build yourself using BUILDGUIDE.md until targeted release milestone completes and the release is published. |
Was experiencing random deadlock of various EF queries on a ASP.NET core 3.1 project when deployed to a docker container with mcr.microsoft.com/dotnet/core/aspnet:3.1 as base. Upgrading to Microsoft.Data.SqlClient 1.1.1 appears to have resolved the issue. |
I am experiencing this issue. I am using a .Net Standard 2.0 library referencing nuget packages: Microsoft.EntityFrameworkCore (3.1.24) Sounds like a fix was applied to SqlClient v2.0.0. cheenamalhotra are you able to confirm? SaveChanges works for me however SaveChangesAsync deadlocks. Some required functionality resides in a .Net Standard 2.0 library as it is needed by both a .Net Framework (4.7.2) application and a .Net Core 6.0 application. EntityFrameworkCore (3.1.24) was the last compatible version between all Net frameworks. After using SQL Profiler I can see the insert I am trying to perform leaves an open transaction on the server, inserts the record, but never returns. When I stop the web api application from running the transaction and insert rolls back. TIA, |
@4ndrewburrow you can just add an explicit reference to a newer version |
I am afraid I cannot provide a clean reproduction, but I want to share my experience, so that you could decide if this is an issue or not.
Turns out, that with the bump to 4.7.0 (from 4.6.1) our CI build deadlocks almost every run during unit tests. There is neither a problem on development environments nor on production.
The setup: a xunit test suite starts a .net core 2.1 service host as shared fixture to run some integration tests. part of the application stack is asp.net identity core. The whole environment is being started via
docker-compose
so that the dependent SQL Server is also available. A rather complete test database seeding is done during test initialization. Build agents are cheap Azure VMs of B2s size (2vCPUs, 4GB RAM) running Ubuntu 18.04 with docker.Since I was unable to dump/debug the locked process on the VM inside docker, I spent some time to reproduce the issue locally (i7-8850H, Archlinux, Jetbrains Rider). I finally made it by running
stress --cpu 12 --io 10 --vm 5 --hdd 1
, starting thesqlserver
process inside docker and limiting it viataskset -p 0x01 nnnn
to one CPU, and starting the integration test restricted to the same CPU directly viataskset 0x01 dotnet test
.Attaching a debugger on the locked process reveals the following stack trace (besides various others waiting on reset events, like the xunit runner, test diagnostics and Application insights):
the last frame on the stack resulting from our application code is this line:
Note that there is not a single
Task.Wait()
orTask.Result
in the whole code base, everything is "async all the way down". After suspecting the advanced parallelization features of xUnit as part of the problem, I disabled all test parallelization and lifted all restrictions on allowed threads, but without any effect. The SQL Server is only hit by one open connection, state "await command".Reverting the solution to System.Data.SqlClient 4.6.1 made all problems go away, there was not a single locking build since then, so I am pretty sure it's related.
Note: The solution is on a private GitHub Repo, but I'd be able to share an export or more details, when needed.
The text was updated successfully, but these errors were encountered: