-
Notifications
You must be signed in to change notification settings - Fork 293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Making open operation async first #2384
Conversation
@dotnet-policy-service agree |
c4163ef
to
5fd6235
Compare
Making the open operation async first simplifies the OpenAsync logic and allows to use the same logic for sync Open, through GetAwaiter
5fd6235
to
a896f6b
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #2384 +/- ##
==========================================
+ Coverage 72.58% 76.85% +4.27%
==========================================
Files 310 247 -63
Lines 61875 38876 -22999
==========================================
- Hits 44911 29880 -15031
+ Misses 16964 8996 -7968
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
This change is a little bit risky and I wanted to make it in a separated commit, to see if the previous one could pass at least most of the pipeline and to make this one easier to revert
I've not got much to technically say about your changes, but I'd had a few ideas kicking around and this is as good a place as any to drop them. I noticed that you've removed a few references to Thread.Sleep. This is definitely the right thing to do. I'm pretty much certain that these shouldn't be in most Task-based code, given the impact it has on the wider thread pool. I've not run any meaningful tests on this, but I'd speculate off-the-cuff that use of the thread-based operations such as Sleep within a Task-based model could feed into the performance issues which have been reported on the async code paths - anything async-based would be slower under load if a large chunk of the threads in the thread pool are asleep or blocked... Taking this further though: there are a few code paths which also block the thread, rather than the Task. Monitor.Enter is one of them, and thus so are SyncAsyncLock and _parserLock. Any Wait on _parserLock has the potential to block the thread, as does any lock() statement (although I don't think there are any of those in SqlInternalConnectionTds.) On three more specific points:
The last point is the most important one. The managed SNI parser has the largest async performance issues, and if there are any issues/incompatibilities, I think that's probably where you'll run into them. |
@edwardneal thanks for the tips. I'll take a look into it as soon as I get some time. There are other places I used GetAwaiter that I should change, too. Maybe if I use ConfigureAwait(false) it mitigates the risk of deadlocks? I will make another branch to try around on your suggestion, but I think of three ways to achieve this, actually:
Between these three options I could imagine, I think maybe the second approach would bring the best benefits in the long run, although it would make this PR harder to be reviewed. I really like, though, the idea of making the code 100% async, and falling in a sync-over-async scenario, as the async code removes the need of locks in many situations, but I'm not sure if the risk of deadlocks can really be mitigated. |
The same changes of netfx were replicated on netcore. Also, ConfigureAwait(false) was added to lower the risk of deadlock if getAwaiter. This approach is still needed to be checked
Sounds good! The pattern I've seen has been similar to: async ValueTask<int> ActCore(int param1, bool async)
{
if (async)
return await NestedFuncAsync(param1);
else
return ValueTask<int>.FromResult(NestedFunc(param1));
}
async ValueTask<int> ActAsync(int param1)
=> ActCore(param1, true);
int Act(int param1)
=> ActCore(param1, false).Result; This provides a common codebase, while also bypassing the issues which come from sync-over-async. |
@edwardneal I have an Idea: What if I create a static helper object that uses AsyncLocal to determine wether all the Open stack call must be called synchronously? I think this way the code will be way cleaner and easier to maintain |
Keep in mind that AsyncLocal has a certain perf cost, which I'm not sure makes sense here. A usual thing to do in order to unify sync and async paths is simply to pass an "async" flag parameter down the stack instead. |
I'd agree with roji here - I've not used AsyncLocal before, but it seems like overkill in most places when an I agree with the rest of your point. Making the core code return a Task or a ValueTask is fine, and in the case where it runs synchronously you can simply return its Result property or invoke the code with async = false. The only reason I've picked ValueTask is that it's a value type, so doesn't necessarily incur a heap allocation (which would be guaranteed if it returned a Task instance) in the synchronous case. To elaborate on thread-level synchronisation primitives though, please consider a situation where the thread pool is heavily loaded.
We can't guarantee that a continuation of a Task/ValueTask will run on the same thread as its predecessor. It's possible that a ContinueWith could do the same thing, although I've picked a network write because it seems more likely to force the task to yield. Incidentally, this is also part of the reason why we can't mix It's not an immediate stoppage because it's "only" reducing thread pool capacity, but it'll develop into such if it's given time. This sort of issue might also be having an impact on the existing issues around MARS and general async performance, although I'm just speculating. It's possible that other issues are constraining the library's connection speed enough for that not to happen in real life. If it can happen, it'll probably happen in a situation where something is opening SQL Server connections rapidly to a database server with higher network latency. One such scenario could be a web service connecting to a database which is going through a disaster recovery scenario. I think the largest lock to try to unpick for this is the SyncAsyncLock which sits in SqlInternalConnectionTds, but it's used in quite a few places and I'm struggling to understand it. @Wraith2 - I'd appreciate some help with this, if possible. My first impression is that SqlClient's likely to encounter this sort of trouble with thread blocking, am I missing something which mitigates this in SyncAsyncLock or more widely? I can see that SyncAsyncLock.Wait has a I get the impression from reading the code that SyncAsyncLock was written to use thread-level locking, then had some ad-hoc changes to make sure that the existing locking methodology would carry on working with Tasks. I think its essential purpose is to ensure that TdsParser has control over the underlying connection's TDS-level data stream, with the limited exception of client-to-server Attention messages and client-driven connection closures. Am I understanding that correctly? If so, do you think it'd be viable to have something like a simple SemaphoreSlim to handle all protection of the data stream and an associated CancellationTokenSource to handle the cases of client-driven connection closure and command cancellation? |
This seems the exact root cause of issue #422 |
I understand. I've started to increment this draft following the suggestion of the async flag, then. @edwardneal all your concerns are pretty valid. With OpenAsync starting to be asynchronous, we have a whole new world of scenarios to deal with. I was hoping this change could be more simpler, but this is really a bigger issue and I have a lot to learn about this code to make it safely. I'll take my time to look at the code but I'm not sure if this draft is a good idea anymore. It's very risky, |
For modern .NET you should use ValueTask where there is a reasonable assumption that an operation will complete synchronously the majority of the time, for heuristic purposes consider 90% a decent approximation point to start with. If the code being changed is the code to fetch an existing connection from the in-memory pool then i'd say it should be written using value task. If the code being changed is going to establish a new socket connection I would expect to use Task. This information is taken from long conversations by the runtime team in api reviews about the topic. I wouldn't expect to use AsyncLocal for opening a connection. My intention was to use a bool async parameter which will probably be quite viral like async itself is. I wasn't going to start any of this sort of rewrite until the codebases are entirely merged. The difficulty is too high to implement and to review on multiple codebases. I've got a lot of things i could change if we could just get to a single codebase. |
Closing this as this will be addressed in the SqlClientx |
Attempt to fix #979
The general idea is to make the open operation async first, so it simplifies the OpenAsync flow and allows the use of the same logic for the sync Open through GetAwaiter.
I'm still figuring out how to handle and to test this project, so I'm opening it as a draft hoping anyone can validate if this is a valid approach or if the idea is not feasible because it can introduce some breaking changes I haven't noticed.