This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
dotnet/corefx (public archive)

SqlClient optimize SqlDataReader async method allocations #37254

Closed
Wraith2 wants to merge 2 commits

Conversation

@Wraith2 (Contributor) commented Apr 27, 2019

Profiling the DataAccessPerformance project, which emulates the TechEmpower Fortunes benchmark, shows that common async operations like ReadAsync generate context objects and delegates each time they are called, which are then dropped for the GC to handle.

This PR changes the implementation of a common pattern used in similar async functions, currently implemented with InvokeRetryable, ContinueRetryable, and CompleteRetryable. It took some time to work out just what these functions were doing and how data flows between them. Having done so, I chose to rename them to Resumable instead of Retryable: "retry" is commonly (though you could argue incorrectly) used in contexts where an exception has occurred and we are going to try again, which is not how these functions work; any exception is immediate and not retried.

When ReadAsync is called, the function attempts to satisfy the call synchronously if possible. If it is not possible, it generates an async closure and a Func and then passes those to the retry functions. This generation of a closure and function currently happens on every call.

I have removed all uses of the context closures by wrapping the common pattern in an abstract Resumable<T> class which contains the shared functionality. This class is then inherited, with context data added to it, for each of the changed async methods. Because each operation now has a dedicated class, static Funcs can be created for the callback, removing the repeated allocation of the func. The objects themselves may be reused because SqlDataReader only allows a single async operation to be active at a time, so I have added cached objects for IsDBNullAsync and ReadAsync; only the first call to each allocates the cached object, and after that it is reused. Less commonly used async functions like HasNextAsync and ReadBytesAsync are not cached, and GetFieldValueAsync<T> is not cached because the range of T is unbounded.
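The shape of the change can be sketched roughly as follows. This is an illustrative sketch only: the type and member names are hypothetical, not the actual corefx internals, and the real code also handles snapshot state, exception propagation, and resetting the context between uses.

```csharp
using System;
using System.Threading.Tasks;

// Shared plumbing for a resumable async operation (hypothetical names).
internal abstract class Resumable<T>
{
    public TaskCompletionSource<T> Completion;
    public abstract T Resume();
}

internal sealed class ReadAsyncContext : Resumable<bool>
{
    // One delegate per operation type, allocated once for the process,
    // instead of a fresh closure + Func on every ReadAsync call.
    public static readonly Action<Task, object> Continuation = (task, state) =>
    {
        var context = (ReadAsyncContext)state;
        context.Completion.SetResult(context.Resume());
    };

    public override bool Resume() => true; // placeholder for "finish the pending read"
}

internal sealed class Reader
{
    // Only one async operation may be active on a SqlDataReader at a time,
    // so a single cached context can be reused; only the first call allocates it.
    private ReadAsyncContext _cachedReadContext;

    public Task<bool> ReadAsync(Task pending)
    {
        ReadAsyncContext context = _cachedReadContext ?? new ReadAsyncContext();
        _cachedReadContext = context; // simplified; real code clears state on completion
        context.Completion = new TaskCompletionSource<bool>();
        // Static delegate + explicit state object: nothing is captured, so no
        // closure or delegate is allocated here in the steady state.
        pending.ContinueWith(ReadAsyncContext.Continuation, context);
        return context.Completion.Task;
    }
}
```

The point of the shape is that the only steady-state allocation left on the async path is the task the caller awaits.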

Profiles before: (profiler screenshot: readerasync-master)

After: (profiler screenshot: readerasync-branch)

Benchmark results are small because of the prevalence of snapshot allocations, but worth having.

| name | sync | threads | TPS | stdev | description |
|------|------|---------|-----|-------|-------------|
| ado-sqlclient+async+64 | async | 64 | 59190 | 1675 | asyncread master |
| ado-sqlclient+async+64 | async | 64 | 60073 | 1650 | asyncread branch |

So, about a 1.5% throughput increase.

Manual and functional tests pass in native mode. DataAccessPerformance under pure load and profilers has no problems.
/cc area owners @afsanehr, @tarikulsabbir, @Gary-Zh, @David-Engel; people interested in perf @divega @roji @saurabh500

@roji (Member) commented Apr 29, 2019

A quick question (I'll attempt a fuller review in the next few days)... The number of string allocations seems to have increased - is it safe to assume that this is because you're running the benchmark for a constant time, and therefore we're seeing increased iterations?

If so, this is a good reason to prefer a constant-iteration benchmark rather than a constant-time one. In general, if you can code up the benchmark in BenchmarkDotNet, that would provide us with a very reliable snapshot of the before and after. It also provides the total amount of allocations per iteration. The one place where BDN is still a bit problematic is benchmarking concurrency, but I don't think that's what you're trying to do at the moment. See the exchange between @stephentoub and me on the https://github.com/dotnet/performance repo.

@Wraith2 (Contributor, Author) commented Apr 29, 2019

> A quick question (will attempt a more full review in the next few days)... The number of string allocations seems to have increased - is it safe to assume that this is because you're running the benchmark in constant time, and therefore we're seeing increased iterations?

Yes. Since strings are the output, their increase is good, but the string allocation numbers while running under the profiler aren't useful for much more than the general inference you've drawn. Task instances have also increased, but they're a side effect and sadly unavoidable.

I've done a lot of other work using BDN, but fighting with it for several months trying to get it working with my dev project soured me on using it as the only tool for perf numbers. It's a great tool, but that doesn't make it the only thing that can produce useful information. Yes, I could try to convert the DataAccessPerformance solution into a BDN project, but why, when its numbers are representative and useful? What does it gain?

@roji (Member) commented Apr 30, 2019

> I've done a lot of other work using BDN but after fighting with it for several months trying to get it working with my dev project soured me on using it as the only tool for perf numbers. Great tool, but that doesn't make it the only thing that can produce useful information. Yes I could try to convert the DataAccessPerformance solution into a BDN project but why when it's numbers are representative and useful? what does it gain?

BDN generally provides a much higher level of confidence/reliability/precision in the results it provides - for many different reasons. For example, rather than just running the code for an arbitrary amount of time (e.g. 30 seconds), it runs for as many iterations as it needs until reaching a reliable result. Its memory diagnoser provides a precise view of the memory used in each iteration (thus avoiding the issue we've seen above). It's pretty standard now for corefx/coreclr to publish BDN results for before and after as justification for any perf change.

I'm surprised that you had a difficult time with BDN - I've always found it very easy to use. I can help out by setting up a first BDN benchmark under https://github.com/dotnet/performance, which we could use to measure the impact of your PR - hopefully I'll have this done today.

BTW the idea wouldn't really be to duplicate DataAccessPerformance to a BDN project. DataAccessPerformance is more of an end-to-end, concurrency-heavy scenario designed to somewhat approximate TechEmpower. BDN would be useful more for micro-benchmarks which each measure a very specific aspect of SqlClient - e.g. opening and closing a pooled connection, executing a single minimal query. This would also give us the ability to see the same benchmark results over time, as more and more optimizations are made.
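A micro-benchmark of that kind might look like the following sketch. The connection string, database, and `fortune` table are placeholder assumptions, and `[MemoryDiagnoser]` is what surfaces per-operation allocation numbers for a before/after comparison.

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // reports allocated bytes per operation alongside timings
public class ReadAsyncBenchmarks
{
    private SqlConnection _connection;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder connection string; point at a local test database.
        _connection = new SqlConnection("Server=localhost;Integrated Security=true");
        _connection.Open();
    }

    [Benchmark]
    public async Task<int> ExecuteAndReadAsync()
    {
        // Minimal query loop exercising the ReadAsync fast path.
        using (var command = new SqlCommand("SELECT id, message FROM fortune", _connection))
        using (var reader = await command.ExecuteReaderAsync())
        {
            int rows = 0;
            while (await reader.ReadAsync()) { rows++; }
            return rows;
        }
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ReadAsyncBenchmarks>();
}
```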

@karelz (Member) commented May 4, 2019

@tarikulsabbir I cannot assign you the PR, because you are not part of dotnet org - please join the org (and Microsoft) and flip your memberships to public. Thanks!
Instructions: https://github.com/dotnet/core/blob/master/Documentation/microsoft-team.md

@AfsanehR-zz (Contributor) commented

@karelz updated area owners per #37475.

@karelz karelz added this to the 5.0 milestone Aug 3, 2019
@cheenamalhotra (Member) commented

@Wraith2 This PR is being tracked and will be picked up for dotnet/SqlClient. Appreciate your patience for some more time while we come back to review and merge your contributions! 🙏

@Wraith2 Wraith2 closed this Aug 7, 2019
@Wraith2 Wraith2 deleted the sqlperf-readerasync branch August 7, 2019 15:43

7 participants