SqlClient optimize SqlDataReader async method allocations #37254
Conversation
A quick question (I will attempt a fuller review in the next few days)... The number of string allocations seems to have increased - is it safe to assume that this is because you're running the benchmark for a constant time, and therefore we're seeing more iterations? If so, this is a good reason to prefer a constant-iteration benchmark rather than a constant-time one. In general, if you can code up the benchmark in BenchmarkDotNet, that would provide us with a very reliable snapshot of the before and after. It also reports the total amount of allocations per iteration. The one place where BDN is still a bit problematic is when benchmarking concurrency, but I don't think that's what you're trying to do at the moment. See the exchange between @stephentoub and myself on the https://github.com/dotnet/performance repo.
Yes - strings being the output, their increase is good, but the string allocation numbers gathered while running under the profiler aren't useful for much more than the general inference you've drawn. Task instances have also increased, but they're a side effect and sadly unavoidable. I've done a lot of other work using BDN, but fighting with it for several months trying to get it working with my dev project soured me on using it as the only tool for perf numbers. It's a great tool, but that doesn't make it the only thing that can produce useful information. Yes, I could try to convert the DataAccessPerformance solution into a BDN project, but why, when its numbers are representative and useful? What does it gain?
BDN generally provides a much higher level of confidence/reliability/precision in the results it produces, for many different reasons. For example, rather than just running the code for an arbitrary amount of time (e.g. 30 seconds), it runs for as many iterations as it needs until it reaches a reliable result. Its memory diagnoser provides a precise view of the memory used in each iteration (thus avoiding the issue we've seen above). It's pretty standard now for corefx/coreclr to publish before-and-after BDN results as justification for any perf change. I'm surprised that you had a difficult time with BDN - I've always found it very easy to use. I can help out by setting up a first BDN benchmark under https://github.com/dotnet/performance, which we could use to measure the impact of your PR - hopefully I'll have this done today. BTW, the idea wouldn't really be to duplicate DataAccessPerformance as a BDN project. DataAccessPerformance is more of an end-to-end, concurrency-heavy scenario designed to somewhat approximate TechEmpower. BDN would be useful more for micro-benchmarks which each measure a very specific aspect of SqlClient - e.g. opening and closing a pooled connection, or executing a single minimal query. This would also give us the ability to track the same benchmark results over time, as more and more optimizations are made.
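For concreteness, a minimal sketch of what such a BDN micro-benchmark might look like (the class name, benchmark names, and connection string are illustrative assumptions, not the actual dotnet/performance code):

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // reports allocations per iteration, not just timings
public class SqlClientMicroBenchmarks
{
    // Hypothetical connection string; point it at any reachable test server.
    private const string ConnectionString =
        "Data Source=localhost;Initial Catalog=tempdb;Integrated Security=true";

    private SqlConnection _connection;

    [GlobalSetup]
    public void Setup()
    {
        _connection = new SqlConnection(ConnectionString);
        _connection.Open();
    }

    [Benchmark]
    public void OpenClosePooledConnection()
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open(); // served from the pool after the first call
        }
    }

    [Benchmark]
    public async Task ExecuteMinimalQueryAsync()
    {
        using (var command = new SqlCommand("SELECT 1", _connection))
        using (var reader = await command.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync()) { }
        }
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<SqlClientMicroBenchmarks>();
}
```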
@tarikulsabbir I cannot assign the PR to you, because you are not part of the dotnet org - please join the org (and Microsoft) and flip your memberships to public. Thanks!
Force-pushed from 10d78b8 to 6582aea.
@Wraith2 This PR is being tracked and will be picked up for dotnet/SqlClient. Appreciate your patience for some more time while we come back to review and merge your contributions! 🙏
Profiling the DataAccessPerformance project, which emulates the TechEmpower fortunes benchmark, shows that common async operations like `ReadAsync` generate context objects and delegates each time they are called, which are then dropped for the GC to handle.
This PR changes the implementation of a common pattern used in similar async functions. It is currently implemented using `InvokeRetryable`, `ContinueRetryable`, and `CompleteRetryable`. It took some time to work out just what these functions were doing and how data flows between them. Having done so, I chose to rename them to Resumable instead of Retryable, because "retry" is commonly (though you could argue incorrectly) used in contexts where an exception has occurred and we are going to try again, which is not how these functions work: any exception surfaces immediately and nothing is retried.

When `ReadAsync` is called, the function attempts to satisfy the call synchronously if possible. If it is not possible, it generates an async closure and func and then passes those to the retry functions. This generation of closure and func currently happens per call (see the sketch below).
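To make the per-call cost concrete, here is a minimal, self-contained sketch of the old shape (all names are illustrative stand-ins, not the actual `SqlDataReader` internals):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Minimal stand-in for the old shape; names are illustrative, not the real internals.
public class OldStyleReader
{
    private static readonly Task<bool> s_trueTask = Task.FromResult(true);

    public Task<bool> ReadAsync(CancellationToken cancellationToken)
    {
        if (TryReadSync(out bool result))
        {
            // Fast path: completed synchronously, no extra allocations.
            return result ? s_trueTask : Task.FromResult(false);
        }

        // Slow path: the lambda captures 'this' and 'cancellationToken', so every
        // call allocates a fresh closure object plus a fresh Func<Task<bool>>
        // delegate, both of which become garbage once the operation completes.
        return InvokeResumable(() => ContinueRead(cancellationToken));
    }

    private static Task<bool> InvokeResumable(Func<Task<bool>> resume)
    {
        // Exceptions surface immediately; nothing is retried.
        return resume();
    }

    private Task<bool> ContinueRead(CancellationToken cancellationToken)
        => s_trueTask; // stand-in for the real continuation logic

    private bool TryReadSync(out bool result)
    {
        result = false;
        return false; // pretend data was not available synchronously
    }
}
```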
I have removed all uses of the context closures by wrapping the common pattern in an abstract `Resumable<T>` class which contains the common functionality. This class is then inherited, with context data added to it, for each of the changed async methods. Because each operation now has a dedicated class, static funcs can be created for the callback, removing the repeated allocation of the func. The objects themselves can be reused because the `SqlDataReader` only allows a single async operation to be active, so I have added cached objects for `IsDBNullAsync` and `ReadAsync`, which means that only the first call to each will allocate the cached object; after that they are reused. Less commonly used async functions like `HasNextAsync` and `ReadBytesAsync` are not cached, and `GetFieldValueAsync<T>` is not cached because the range of `T` is unbounded. A rough sketch of the new shape follows.
Profiles before:
After:
The benchmark improvements are small because of the prevalence of snapshot allocations, but worth having.
So, roughly a 1.5% throughput increase.
Manual and functional tests pass in native mode. DataAccessPerformance under pure load and under profilers shows no problems.
/cc area owners @afsanehr, @tarikulsabbir, @Gary-Zh, @David-Engel; people interested in perf @divega @roji @saurabh500