Perf: Static delegates #1060

Wraith2 · 2021-05-03T21:16:14Z

Now we've got c#9 i've audited all lambda expressions in the library and changed some of them over to using static delegates which allows the compiler to cache the delegate on first use.

I also adjusted some other callsites to use state parameters to reduce the size of the closure slightly or allow onFailure, or onCancellation delegates to be static.

I tried to apply a consistent style to all the lambda usage layout spacing named params etc. It feels better to me overall but let me know if you want a different style.

jinek · 2021-05-05T15:49:09Z

I think you are confusing anonymous methods and expression trees both of which can be created using lambda expressions.

Anonymous methods are compiled by c# compiler once at compilation time, while expression trees can be compiled to a delegate at runtime.
I've created small example to demonstrate how anonymous methods are compiled:

You can also investigate this using windbg tool which shows the method table information for a given type-object.

Static anonymous function has been introduced to restrict the access to instance and local variables if it is not intended so by the design.
However, one of the main benefits of anonymous delegates and local functions is the possibility to capture the context of a method so you don't have to pass it via parameters or explicitely define fields of a class to contain that context.
And this is my point here: you are increasing the code base to remove the use of the feature which has been introduced to decrease the codebase:

Wraith2 · 2021-05-05T17:08:34Z

Captures contribute to the size and number of hidden classes generated by the compiler. Passing using existing state parameter placement decreases or obviates the need for captures at all. Using static lambdas allows the compiler to cache the delegate again avoiding per-call allocations.

I do know what I'm doing here. Try running a memory profile of an async test case with and without the PR and see what difference it makes.

jinek · 2021-05-05T21:11:44Z

Captures contribute to the size and number of hidden classes generated by the compiler. Passing using existing state parameter placement decreases or obviates the need for captures at all. Using static lambdas allows the compiler to cache the delegate again avoiding per-call allocations.

If you are talking about delegate object itself then does it make sense to do optimization in this code? Because probably even the average connection string is several times bigger than a delegate and captured state together.

I do know what I'm doing here. Try running a memory profile of an async test case with and without the PR and see what difference it makes.

Could you please share the result of your experiment?
(because from your PR the only metric seen at the moment has negative impact)

Wraith2 · 2021-05-05T22:07:18Z

If you are talking about delegate object itself then does it make sense to do optimization in this code? Because probably even the average connection string is several times bigger than a delegate and captured state together.

Each connection string is allocated once and re-used. The paths through these methods allocate on each call. If you call them once you'll see a very minor improvement, if you call them thousands or millions of times that adds up considerably. It's a question of scale.

Results from running the https://github.com/aspnet/DataAccessPerformance which is an implementation of the TE Fortunes benchmark.

Test	Type	Threads	Througput	StdDev	branch
ado-sqlclient+async+16	async	16	26411	1040	perf-staticdelegates
ado-sqlclient+async+16	async	16	26165	1101	main

So a 0.9% throughput increase from some very minor changes.

These changes barely impact the size of the code in comparison to the overall size of the codebase. In general the size of the code is far less relevant than the correctness and speed. If your main concern is code size at the driver level I think your priorities are wrong.

Wraith2 · 2021-05-05T23:31:48Z

@cmeyertons there are a number of changes in bulk copy here and they may improve the performance enough to outweigh the problem in #1048 if you want to give them a try.

jinek · 2021-05-07T14:09:10Z

Each connection string is allocated once and re-used. The paths through these methods allocate on each call. If you call them once you'll see a very minor improvement, if you call them thousands or millions of times that adds up considerably. It's a question of scale.

Sure, I completely agree scaling is important as well as performance overall. It's just, even if we run simple non-query request with new connection, it consumes about 25Kb, and empty delegate is 32B . Compiler does optimization and caches the delegate if possible whether it's static or not, it not necessary creates hidden classes if there is no need to capture the stack, right?

Results from running the https://github.com/aspnet/DataAccessPerformance which is an implementation of the TE Fortunes benchmark.
Test Type Threads Througput StdDev branch
ado-sqlclient+async+16 async 16 26411 1040 perf-staticdelegates
ado-sqlclient+async+16 async 16 26165 1101 main

So a 0.9% throughput increase from some very minor changes.

These changes barely impact the size of the code in comparison to the overall size of the codebase. In general the size of the code is far less relevant than the correctness and speed. If your main concern is code size at the driver level I think your priorities are wrong.

That interesting tool uses DateTime.UtcNow to measure time intervals, Task.Wait for synchronization etc.
But there are methodologies of measurements, this topic is not new. I'd suggest you to measure with something like https://benchmarkdotnet.org/ and check if your improvements do more than the measurement error.
Also I believe, you, while being an expirienced dev, could investigate some of the allocations which you can find in mem profiler tools, for example, I would check why it allocates many bytes, may be it does some redundant IO. Or, for example, what is going on in _SqlMetadata

Wraith2 · 2021-05-07T14:27:58Z

If you believe that there is work that can be done on the entries in that profile then please go ahead, I've already made some changes in #521 and dotnet/corefx#34393 but any additional improvements you can make are welcome.

I didn't write that tool. The ASP net core team did. I was pointed to it as a reliable perf measure by one of the maintainers of this library when I started working on it back when it was System.Data.SqlClient in corefx. I use a variety of tools including profilers, benchmarks, user provided and synthetic scenario tests to identify functional and performance issues. Have a look through my PR history and you'll find all sort of things.

This code demonstrably and measurably produces a small performance increase. It may be a small increase but scale is important as we've agreed. We can go all the way down to IL and ASM but I really don't see the need to do that. Do you have any specific problem with the code change as written other than it slightly increases code complexity and size?

jinek · 2021-05-07T18:32:30Z

Do you have any specific problem with the code change as written other than it slightly increases code complexity and size?

no

DavoudEshtehari

Just to double-check; The following methods have used lambda expression. Please look at them if they're worth being static:

Net Core:
- SqlBulkCopy.WriteToServerInternalRestAsync
- SqlCommand.GetParameterEncryptionDataReaderAsync
- SqlConnection.OnError

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlUtil.cs

Wraith2 · 2021-06-09T01:52:04Z

SqlBulkCopy.WriteToServerInternalRestAsync : Closes over 4 parameters including this. We can't avoid the allocation and a 4 tuple is less readable than the closure. I haven't profiled SqlBulkCopy scenarios so I don't have information that makes me think that changing this is beneficial at the moment.
GetParameterEncryptionDataReaderAsync: generally the same as WriteToServerInternalRestAsync . The result is cached so it isn't per-call. I haven't started profiling always encrypted scenarios yet so I don't know if this is hot, the cache means it should not be.
OnError: only 2 parameters so it could be tupled easily and fairly clearly but the allocation still can't be avoided. We're about to throw an exception which is going to allocation the exception, it's message, error collection, error collection contents, stacktrace. Attempting to save memory at this point only hurts code clarity it does nothing to benefit the user or us.

In general the more parameters are captured by a closure the less likely it is that it is worth converting. Tuples hurt code clarity and we can deal with some of that for improved performance of users but there is a limit somewhere and more than 3 parameters is where I currently feel readability suffers more than users benefit. This can in change (and may have done since I've been working on perf) with new information like profiling results and discussions.

The changes made to convert to static delegates in this PR were not driven by profile results so they should retain clarity as much as possible while improving performance. I think I would only reduce clarity further if I had a very strong argument that it would improve performance.

Let me know what you think and whether I've gone too far in reducing clarity. There may be cases where I've converted to tuples in the past that it might be better to use closures for clarity.

DavoudEshtehari · 2021-06-09T17:54:59Z

Generally, I agree with keeping code legible; I think except the number of parameters in a tuple, it's important to avoid using it in a medium or large block of code. If a block of code is less than 10 lines, it seems reasonable to use a tuple with more than three parameters. And, definitely, it's a trade off among performance, memory allocation, and readability which should be considered. I would say go ahead with your best judgement. Let's see if anyone has other idea.

JRahnama · 2021-06-10T19:53:08Z

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs

                            );
                        }
                    }, ctoken); // We do not need to propagate exception, etc, from reconnect task, we just need to wait for it to finish.
                    return tcs.Task;
                }
                else
                {
-                    AsyncHelper.WaitForCompletion(reconnectTask, BulkCopyTimeout, () => { throw SQL.CR_ReconnectTimeout(); }, rethrowExceptions: false);
+                    AsyncHelper.WaitForCompletion(reconnectTask, BulkCopyTimeout, static () => throw SQL.CR_ReconnectTimeout(), rethrowExceptions: false);


I am not quiet sure or probably did not understand what benefit we get from changing this particular line to static? Isn't the purpose of static expression to prevent unintentional capture of local variables or instance state by the lambda?

Yes, that is the purpose. There is also a useful side effect of the way that the compiler implements it because it's a static delegate means that the delegate can be cached. This sharplab example shows that you get a hidden static class generated and that the action delegate is created the first time it is needed and re-used every time it is called after that. So we get lazy single allocation for free with a nice syntax.

public void B() { AsyncHelper.WaitForCompletion(null, 0, static () => throw SQL.CR_ReconnectTimeout(), rethrowExceptions: false); }

is lowered to:

[Serializable] [CompilerGenerated] private sealed class <>c { public static readonly <>c <>9 = new <>c(); public static Action <>9__0_0; internal void <B>b__0_0() { throw SQL.CR_ReconnectTimeout(); } } public void B() { AsyncHelper.WaitForCompletion(null, 0, <>c.<>9__0_0 ?? (<>c.<>9__0_0 = new Action(<>c.<>9.<B>b__0_0)), false); }

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlUtil.cs

DavoudEshtehari · 2021-06-11T17:23:29Z

/azp run CI-Ub18-Enclave-NetStd

azure-pipelines · 2021-06-11T17:23:34Z

You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.

azure-pipelines · 2021-06-11T17:25:40Z

No pipelines are associated with this pull request.

JRahnama · 2021-06-11T17:27:33Z

@Wraith2 Can you trigger a build by making a small change in your PR and make the tests to run once more? I have tried from our side and somethings are missing...

Wraith2 · 2021-06-11T18:03:11Z

Fixed up a couple of anonymous typed lambdas.

JRahnama · 2021-06-11T20:02:28Z

@Wraith2 LGTM, I am waiting for the Enclave tests to finish and see if the failures are fixed after I cleaned up the databases. I am jumping to other PRs. Do you have any blocking PRs that needs to be done before other steps to come in?

Wraith2 · 2021-06-11T20:23:13Z

I'd like to get #934 #925 and #902 finished to unblock other work if possible.

…net#1060 from netcore to common file).

Wraith2 added 4 commits May 3, 2021 01:17

netcore static lambda rework

2b3864c

minor netcore api cleanup

5a9e997

netfx static lambda rework

e28eec4

netcore add strongbox to avoid boxing

4ffd7fd

DavoudEshtehari self-assigned this Jun 8, 2021

DavoudEshtehari reviewed Jun 9, 2021

View reviewed changes

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlBulkCopy.cs Show resolved Hide resolved

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlUtil.cs Show resolved Hide resolved

DavoudEshtehari approved these changes Jun 9, 2021

View reviewed changes

JRahnama reviewed Jun 10, 2021

View reviewed changes

fixup 2 anonymous typed lambdas to be explicit

c90d335

JRahnama approved these changes Jun 11, 2021

View reviewed changes

JRahnama added this to the 4.0.0-preview1 milestone Jun 11, 2021

JRahnama merged commit f17bba2 into dotnet:main Jun 11, 2021

Wraith2 deleted the perf-staticdelegates branch June 14, 2021 23:13

panoskj added a commit to panoskj/SqlClient that referenced this pull request Mar 18, 2022

Merging some methods of TdsParserStateObject (note: ports part of dot…

417a6da

…net#1060 from netcore to common file).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: Static delegates #1060

Perf: Static delegates #1060

Wraith2 commented May 3, 2021

jinek commented May 5, 2021 •

edited

Loading

Wraith2 commented May 5, 2021

jinek commented May 5, 2021

Wraith2 commented May 5, 2021 •

edited

Loading

Wraith2 commented May 5, 2021

jinek commented May 7, 2021

Wraith2 commented May 7, 2021 •

edited

Loading

jinek commented May 7, 2021

DavoudEshtehari left a comment

Wraith2 commented Jun 9, 2021 •

edited

Loading

DavoudEshtehari commented Jun 9, 2021

JRahnama Jun 10, 2021

Wraith2 Jun 10, 2021

DavoudEshtehari commented Jun 11, 2021 •

edited

Loading

azure-pipelines bot commented Jun 11, 2021

azure-pipelines bot commented Jun 11, 2021

JRahnama commented Jun 11, 2021

Wraith2 commented Jun 11, 2021

JRahnama commented Jun 11, 2021 •

edited

Loading

Wraith2 commented Jun 11, 2021

Perf: Static delegates #1060

Perf: Static delegates #1060

Conversation

Wraith2 commented May 3, 2021

jinek commented May 5, 2021 • edited Loading

Wraith2 commented May 5, 2021

jinek commented May 5, 2021

Wraith2 commented May 5, 2021 • edited Loading

Wraith2 commented May 5, 2021

jinek commented May 7, 2021

Wraith2 commented May 7, 2021 • edited Loading

jinek commented May 7, 2021

DavoudEshtehari left a comment

Choose a reason for hiding this comment

Wraith2 commented Jun 9, 2021 • edited Loading

DavoudEshtehari commented Jun 9, 2021

JRahnama Jun 10, 2021

Choose a reason for hiding this comment

Wraith2 Jun 10, 2021

Choose a reason for hiding this comment

DavoudEshtehari commented Jun 11, 2021 • edited Loading

azure-pipelines bot commented Jun 11, 2021

azure-pipelines bot commented Jun 11, 2021

JRahnama commented Jun 11, 2021

Wraith2 commented Jun 11, 2021

JRahnama commented Jun 11, 2021 • edited Loading

Wraith2 commented Jun 11, 2021

jinek commented May 5, 2021 •

edited

Loading

Wraith2 commented May 5, 2021 •

edited

Loading

Wraith2 commented May 7, 2021 •

edited

Loading

Wraith2 commented Jun 9, 2021 •

edited

Loading

DavoudEshtehari commented Jun 11, 2021 •

edited

Loading

JRahnama commented Jun 11, 2021 •

edited

Loading