This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
dotnet/corefx (public archive)

SqlClient optimize SqlDataReader async method allocations #37254

Closed
Wraith2 wants to merge 2 commits

Conversation

@Wraith2 (Contributor) commented Apr 27, 2019

Profiling the DataAccessPerformance project, which emulates the TechEmpower Fortunes benchmark, shows that common async operations like ReadAsync generate context objects and delegates each time they are called, which are then dropped for the GC to handle.

This PR changes the implementation of a common pattern used in similar async functions, currently implemented with InvokeRetryable, ContinueRetryable, and CompleteRetryable. It took some time to work out just what these functions were doing and how data flows between them. Having done so, I chose to rename them to Resumable instead of Retryable: "retry" is commonly (though you could argue incorrectly) used in contexts where an exception has occurred and we are going to try again, which is not how these functions work; any exception is immediate and not retried.

When ReadAsync is called, the function attempts to satisfy the call synchronously if possible. If it is not possible, it generates an async closure and a Func and then passes those to the retry functions. This generation of a closure and function currently happens on every call.

I have removed all uses of the context closures by wrapping the common pattern in an abstract Resumable<T> class which contains the shared functionality. This class is then inherited, with context data added to it, for each of the changed async methods. Because each operation now has a dedicated class, static Funcs can be created for the callback, removing the repeated allocation of the func. The objects themselves may be reused because SqlDataReader only allows a single async operation to be active at a time, so I have added cached objects for IsDBNullAsync and ReadAsync; only the first call to each allocates the cached object, and after that it is reused. Less commonly used async functions like HasNextAsync and ReadBytesAsync are not cached, and GetFieldValueAsync<T> is not cached because the range of T is unbounded.
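The shape of the change can be sketched roughly as follows. This is an illustrative sketch only: the type and member names are hypothetical, not the actual corefx internals, and the real code also handles snapshot state, exception propagation, and resetting the context between uses.

```csharp
using System;
using System.Threading.Tasks;

// Shared plumbing for a resumable async operation (hypothetical names).
internal abstract class Resumable<T>
{
    public TaskCompletionSource<T> Completion;
    public abstract T Resume();
}

internal sealed class ReadAsyncContext : Resumable<bool>
{
    // One delegate per operation type, allocated once for the process,
    // instead of a fresh closure + Func on every ReadAsync call.
    public static readonly Action<Task, object> Continuation = (task, state) =>
    {
        var context = (ReadAsyncContext)state;
        context.Completion.SetResult(context.Resume());
    };

    public override bool Resume() => true; // placeholder for "finish the pending read"
}

internal sealed class Reader
{
    // Only one async operation may be active on a SqlDataReader at a time,
    // so a single cached context can be reused; only the first call allocates it.
    private ReadAsyncContext _cachedReadContext;

    public Task<bool> ReadAsync(Task pending)
    {
        ReadAsyncContext context = _cachedReadContext ?? new ReadAsyncContext();
        _cachedReadContext = context; // simplified; real code clears state on completion
        context.Completion = new TaskCompletionSource<bool>();
        // Static delegate + explicit state object: nothing is captured, so no
        // closure or delegate is allocated here in the steady state.
        pending.ContinueWith(ReadAsyncContext.Continuation, context);
        return context.Completion.Task;
    }
}
```

The point of the shape is that the only steady-state allocation left on the async path is the task the caller awaits.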

Profiles before: (profiler screenshot: readerasync-master)

After: (profiler screenshot: readerasync-branch)

Benchmark results are small because of the prevalence of snapshot allocations, but worth having.

| name | sync | threads | TPS | stdev | description |
|------|------|---------|-----|-------|-------------|
| ado-sqlclient+async+64 | async | 64 | 59190 | 1675 | asyncread master |
| ado-sqlclient+async+64 | async | 64 | 60073 | 1650 | asyncread branch |

So, about a 1.5% throughput increase.

Manual and functional tests pass in native mode. DataAccessPerformance under pure load and profilers has no problems.
/cc area owners @afsanehr, @tarikulsabbir, @Gary-Zh, @David-Engel; people interested in perf @divega @roji @saurabh500

@roji (Member) commented Apr 29, 2019

A quick question (I'll attempt a fuller review in the next few days)... The number of string allocations seems to have increased - is it safe to assume that this is because you're running the benchmark for a constant time, and therefore we're seeing increased iterations?

If so, this is a good reason to prefer a constant-iteration benchmark rather than a constant-time one. In general, if you can code up the benchmark in BenchmarkDotNet, that would provide us with a very reliable snapshot of the before and after. It also provides the total amount of allocations per iteration. The one place where BDN is still a bit problematic is benchmarking concurrency, but I don't think that's what you're trying to do at the moment. See the exchange between @stephentoub and me on the https://github.com/dotnet/performance repo.

@Wraith2 (Contributor, Author) commented Apr 29, 2019

> A quick question (will attempt a more full review in the next few days)... The number of string allocations seems to have increased - is it safe to assume that this is because you're running the benchmark in constant time, and therefore we're seeing increased iterations?

Yes. Since strings are the output, their increase is good, but the string allocation numbers while running under the profiler aren't useful for much more than the general inference you've drawn. Task instances have also increased, but they're a side effect and sadly unavoidable.

I've done a lot of other work using BDN, but fighting with it for several months trying to get it working with my dev project soured me on using it as the only tool for perf numbers. It's a great tool, but that doesn't make it the only thing that can produce useful information. Yes, I could try to convert the DataAccessPerformance solution into a BDN project, but why, when its numbers are representative and useful? What does it gain?

@roji (Member) commented Apr 30, 2019

> I've done a lot of other work using BDN but after fighting with it for several months trying to get it working with my dev project soured me on using it as the only tool for perf numbers. Great tool, but that doesn't make it the only thing that can produce useful information. Yes I could try to convert the DataAccessPerformance solution into a BDN project but why when it's numbers are representative and useful? what does it gain?

BDN generally provides a much higher level of confidence/reliability/precision in the results it provides - for many different reasons. For example, rather than just running the code for an arbitrary amount of time (e.g. 30 seconds), it runs for as many iterations as it needs until reaching a reliable result. Its memory diagnoser provides a precise view of the memory used in each iteration (thus avoiding the issue we've seen above). It's pretty standard now for corefx/coreclr to publish BDN results for before and after as justification for any perf change.

I'm surprised that you had a difficult time with BDN - I've always found it very easy to use. I can help out by setting up a first BDN benchmark under https://github.com/dotnet/performance, which we could use to measure the impact of your PR - hopefully I'll have this done today.

BTW the idea wouldn't really be to duplicate DataAccessPerformance to a BDN project. DataAccessPerformance is more of an end-to-end, concurrency-heavy scenario designed to somewhat approximate TechEmpower. BDN would be useful more for micro-benchmarks which each measure a very specific aspect of SqlClient - e.g. opening and closing a pooled connection, executing a single minimal query. This would also give us the ability to see the same benchmark results over time, as more and more optimizations are made.
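A micro-benchmark of that kind might look like the following sketch. The connection string, database, and `fortune` table are placeholder assumptions, and `[MemoryDiagnoser]` is what surfaces per-operation allocation numbers for a before/after comparison.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // reports allocated bytes per operation alongside timings
public class ReadAsyncBenchmarks
{
    private SqlConnection _connection;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder connection string; point at a local test database.
        _connection = new SqlConnection("Server=localhost;Integrated Security=true");
        _connection.Open();
    }

    [Benchmark]
    public async Task<int> ExecuteAndReadAsync()
    {
        // Minimal query loop exercising the ReadAsync fast path.
        using (var command = new SqlCommand("SELECT id, message FROM fortune", _connection))
        using (var reader = await command.ExecuteReaderAsync())
        {
            int rows = 0;
            while (await reader.ReadAsync()) { rows++; }
            return rows;
        }
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ReadAsyncBenchmarks>();
}
```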

@karelz (Member) commented May 4, 2019

@tarikulsabbir I cannot assign you the PR, because you are not part of dotnet org - please join the org (and Microsoft) and flip your memberships to public. Thanks!
Instructions: https://github.com/dotnet/core/blob/master/Documentation/microsoft-team.md

@AfsanehR-zz (Contributor) commented

@karelz updated area owners per #37475.

@karelz karelz added this to the 5.0 milestone Aug 3, 2019
@cheenamalhotra (Member) commented

@Wraith2 This PR is being tracked and will be picked up for dotnet/SqlClient. Appreciate your patience for some more time while we come back to review and merge your contributions! 🙏

@Wraith2 Wraith2 closed this Aug 7, 2019
@Wraith2 Wraith2 deleted the sqlperf-readerasync branch August 7, 2019 15:43

7 participants