
SqlBulkCopy - Leverage Generics to Eliminate Boxing #1048

Closed
wants to merge 7 commits

Conversation

@cmeyertons (Contributor) commented Apr 26, 2021:

@Wraith2 @cheenamalhotra - This is a re-creation of #358 - there were multiple merge conflicts that were difficult to resolve.

  • Leverage Generic Code Paths to eliminate boxing of value types as the value is being written to the database
  • Benchmark test that validates the boxing is eliminated (this only works in Release mode as the Debug JIT does not perform the necessary optimizations)
  • Extension Method for Generic Cast instead of a GenericConverter type -- this makes the code less verbose and easier to read.

Possible breaking changes:

  • IDataReader.GetFieldType is now invoked in SqlBulkCopy where it was not invoked previously. Custom data reader implementations that do not properly implement this method could break.
  • IDataReader.IsDBNull is now invoked in the IDataReader code path for every value (previously it was invoked only when streaming was enabled). This is necessary to select the proper typed getter (GetInt32, etc.) and to handle potentially nullable value types.

TODO:

  • Migrate this code into netfx (I want to wait for all tests to pass before I spend any effort doing this)
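As a rough illustration of the extension-method approach described above (a hedged sketch -- `CastTo` and its shape are hypothetical, not the PR's actual code):

```csharp
using System;

internal static class GenericCastExtensions
{
    // Hypothetical sketch: when TFrom and TTo are the same value type, the
    // Release-mode JIT specializes this method and can elide the type test
    // and any boxing; mismatched types fall back to the boxing ChangeType path.
    public static TTo CastTo<TTo, TFrom>(this TFrom value)
    {
        if (value is TTo sameType)
        {
            return sameType;
        }
        return (TTo)Convert.ChangeType(value, typeof(TTo));
    }
}
```

As the PR description notes, the elision only happens in Release builds, since the Debug JIT does not perform these optimizations.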

@cheenamalhotra (Member) commented:

Hi @cmeyertons

We appreciate your efforts.

> I have implemented a custom data reader implementation in the NoBoxingValueTypes test -- I would've preferred to leverage the FastMember library to reduce the amount of code required, but that package was not available in the currently configured NuGet sources.

FastMember is not an official Microsoft package, nor does it provide a unique capability whose absence keeps us blocked. Yes, it simplifies code, but that isn't necessary to do the same job. We cannot justify taking a dependency on this package just for code enhancements. Please continue to use the existing reflection design in the tests.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra based on #1046 -- I'm guessing that some of my test failures are expected at this point?

@Wraith2 (Contributor) commented Apr 26, 2021:

You'll still have to check them individually and decide whether they're code-related or infra, apart from TVPMain; that one's just flaky as hell.

@cheenamalhotra (Member) commented:

@cmeyertons

Yes, we have some random failures which should go away once we rerun the pipelines -- don't worry about those. We're trying to fix them but, as Wraith mentioned, they're flaky.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 alright, tests are looking much better -- in the process of re-patching the code onto this branch, I did find the bug that was causing tests to fail.

If I could get a preliminary approval on how the code looks now (the potential breaking changes, using the extension method over the GenericConverter class), I can get started on the netfx port (I really don't want to manage changes across both with a bunch of iterations).

@Wraith2 (Contributor) commented Apr 27, 2021:

> If I could get a preliminary approval on how the code looks now (the potential breaking changes, using the extension method over the GenericConverter class), I can get started on the netfx port

It's been 16 months that you've been trying to get this merged, so I'd really hope that someone can find the time to review it and give constructive feedback. You've been far more patient than anyone should need to be on this.

@cheenamalhotra (Member) commented Apr 27, 2021:

@cmeyertons

> IDataReader.IsDbNull is now invoked in the IDataReader code path for every value (previously it was only invoked if streaming was enabled). This is necessary to invoke the proper value method (GetInt32, etc.) and handle potential nullable value types.

Since you specifically call this out in your PR, I'm a little worried about the IDataReader.IsDBNull call being made on every value. I suspect this might hurt performance for large rows. I have previously heard reports of this call causing performance lag (e.g. #846 (comment), and also from internal teams using 200+ columns), so making it for every type may cause a significant performance drop, especially in existing customer applications that rely on the current behavior.

We of course need a test case to prove that, so I'll look into it more tomorrow. But if you can also test it and come up with a use case that exposes any significant performance drop with this PR, we may have to revisit the design a bit and make this behavior optional. If our test results do not suggest any performance degradation, we'd be happy to take it as is.

@Wraith2 (Contributor) commented Apr 27, 2021:

I've investigated the perf impact of IsDBNull in the past and found that the cost of seeking to the field was being attributed to it, which made IsDBNull seem heavier than it is and GetFieldValue calls seem lighter. To check whether a column value is null you have to seek to the column header, which will run the parser if you haven't already touched the field, and at that point IO can also occur. When you call GetFieldValue on that same column, the seek should already have happened from the previous call, so you usually just pick up the value in the SqlBuffer.

The fastest call is one that never happens, so it would be good to get some numbers. For this PR, can you demonstrate (with a BenchmarkDotNet bench if possible) that the original/current implementation is still slower than the PR branch with a reasonable column width of, say, 32 or 64? You'll need to give it a runtime of at least a minute, in my experience, to make sure the GC kicks in for the boxes being removed; without the GC, the overhead of the new approach won't be properly balanced against the waste of the current one and your PR will look artificially poor -- been there, made that mistake.
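Along the lines suggested here, a self-contained BenchmarkDotNet sketch of the object-vs-generic difference might look like this (a standalone illustration of the boxing cost, not the actual SqlBulkCopy path; names are hypothetical):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // surfaces the Gen 0 / allocation difference between the two paths
public class BoxingBench
{
    private readonly int[] _values = new int[1_000_000];

    [Benchmark(Baseline = true)]
    public long ViaObject()
    {
        long sum = 0;
        foreach (int v in _values) sum += Consume((object)v); // boxes every value
        return sum;
    }

    [Benchmark]
    public long ViaGeneric()
    {
        long sum = 0;
        foreach (int v in _values) sum += Consume(v); // JIT-specialized, no box
        return sum;
    }

    private static long Consume(object value) => (int)value;
    private static long Consume<T>(T value) => value is int i ? i : 0;

    public static void Main() => BenchmarkRunner.Run<BoxingBench>();
}
```

Per the comment above, run this in Release with a long enough iteration budget that the GC cost of the boxes actually shows up.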


// Only use IsDBNull when streaming is enabled and only for non-SqlDataReader sources
if ((_enableStreaming) && (_sqlDataReaderRowSource == null) && (rowSourceAsIDataReader.IsDBNull(sourceOrdinal)))
// previously, IsDBNull was only invoked in a streaming scenario with a non-SqlDataReader source.
@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 starting a conversation here to keep this tied together -- I will create a benchmark around this logic that inserts a million records in a loop until I hit about the one-minute mark.

I will do this in a couple of other branches in my repo and will post links to the code here -- I should be able to get this done today.

@cmeyertons (Contributor, Author) commented Apr 27, 2021:

If IsDBNull proves to be problematic, there could be an optimization that invokes it only for fields that are possibly null (e.g. if GetFieldType returns typeof(int) we don't need to check).

Edit: On second thought, that optimization poses some risk because DataTable (the return type of IDataReader.GetSchemaTable) does not accept nullable types -- we should still add the OR check for the ValueMethod below, however.

@mburbea commented Apr 29, 2021:

If you're inspecting the SchemaTable, there is an IsNullable field; if it's present and set to false, you should be able to skip the DBNull checks.
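A hedged sketch of that shortcut (the helper name is hypothetical; the standard GetSchemaTable layout exposes nullability as the AllowDBNull column, while some providers also surface IsNullable):

```csharp
using System.Data;

internal static class SchemaNullability
{
    // Hypothetical helper: returns false only when the schema affirmatively
    // says the column cannot be null; on any doubt, keep the IsDBNull check.
    public static bool MightBeNull(DataTable schemaTable, int ordinal)
    {
        if (schemaTable == null || !schemaTable.Columns.Contains("AllowDBNull"))
        {
            return true;
        }
        object allowDbNull = schemaTable.Rows[ordinal]["AllowDBNull"];
        return !(allowDbNull is bool allowed && !allowed);
    }
}
```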


Type t = ((IDataReader)_rowSource).GetFieldType(ordinal);

if (t == typeof(bool))
@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2

Thinking about this in the context of the IsDBNull check above -- I think we need to add an OR check for the corresponding nullable type (e.g. t == typeof(bool) || t == typeof(bool?)).

I think we have two routes we can go here.

  1. Add the OR check and invoke IsDBNull on each value. Pros: prevents nullable value types from being boxed. Cons: we have to invoke IsDBNull (perf impact TBD), and data reader implementations must correctly determine the result of IsDBNull.
  2. Leave this as is and don't invoke IsDBNull on each value. Nullable value types would get boxed.

I would prefer option 1 if possible.

I will measure the option 1 approach as discussed and we will see how it shakes out.
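Option 1 could be sketched roughly like this (a hypothetical helper around the widened type check, not the PR's actual dispatch code):

```csharp
using System;
using System.Data;

internal static class NullableAwareDispatch
{
    // Widen the type test to the nullable counterpart, then gate the typed
    // getter behind IsDBNull so a Nullable<bool> source never has to box.
    public static bool? ReadBool(IDataReader reader, int ordinal)
    {
        Type t = reader.GetFieldType(ordinal);
        if (t == typeof(bool) || t == typeof(bool?))
        {
            return reader.IsDBNull(ordinal) ? (bool?)null : reader.GetBoolean(ordinal);
        }
        throw new InvalidCastException($"Column {ordinal} is not a bool column.");
    }
}
```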

@Wraith2 (Contributor) commented Apr 27, 2021:

I don't believe that GetFieldType can return a Nullable<bool>; at least, it can't in the SqlClient implementation. We don't conflate SQL and language nulls (rightly, imo), so I don't think you need to change this. If a non-SqlClient data source provides a Nullable<bool>, then I think boxing is probably appropriate.

@Wraith2 (Contributor) commented:

A nullable int column should have a GetFieldType of int; nullability is not part of the type in SQL.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 after much further investigation, migrating the code paths from object value to T value appears to have a detrimental impact on performance -- I see about a 10% regression across the board.

You can view the benchmarking code in my repo in the benchmarks/current and benchmarks/new branches. I am bulk copying approximately 40 columns.

Some observations:

  • This does not appear to be solely related to IsDBNull -- I see the 10% uptick in various submethods that were genericized.
  • It might be due to passing decimal by value (128 bits instead of a 64-bit reference). I did attempt to change all of the signatures to ref, and that did not alleviate the problem.
  • Maybe leveraging the in modifier would help -- however, that change is probably not backwards compatible.
  • Some extra boxing is forced to occur during CoerceValue when coercion is necessary -- eliminating it would require a generic version of Convert.ChangeType, which is a framework change.
  • I did uncover a small performance improvement that I can submit in a subsequent PR.
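The Convert.ChangeType point can be seen in a sketch (a hypothetical wrapper, not the PR's CoerceValue): even wrapped in generics, ChangeType's object-based signature forces a box on the way in and an unbox on the way out.

```csharp
using System;

internal static class GenericCoercion
{
    public static TTo Coerce<TFrom, TTo>(TFrom value)
    {
        // Fast path: no coercion needed; the Release JIT can elide the box
        // when TFrom and TTo are the same value type.
        if (value is TTo alreadyTarget)
        {
            return alreadyTarget;
        }
        // Slow path: Convert.ChangeType takes and returns object, so the
        // value is boxed here regardless -- avoiding that would need a
        // generic ChangeType overload in the framework itself.
        return (TTo)Convert.ChangeType(value, typeof(TTo));
    }
}
```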

Ultimately, the boxing cost is much lower than I initially anticipated. While this was a good and fun personal exercise in getting into the lower-level details of the SqlClient library, I'm at a dead end on how to improve performance at this point.

CURRENT

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
  Job-ZNZZJB : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT

IterationCount=5  LaunchCount=1  WarmupCount=0
| Method   | Mean    | Error   | StdDev  | Gen 0       | Gen 1     | Gen 2 | Allocated |
|--------- |--------:|--------:|--------:|------------:|----------:|------:|----------:|
| BulkCopy | 17.91 s | 1.270 s | 0.330 s | 364000.0000 | 2000.0000 |     - |   2.13 GB |

NEW

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
  Job-UIKJLZ : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT

IterationCount=5  LaunchCount=1  WarmupCount=0
| Method   | Mean    | Error   | StdDev  | Gen 0       | Gen 1     | Gen 2 | Allocated |
|--------- |--------:|--------:|--------:|------------:|----------:|------:|----------:|
| BulkCopy | 20.01 s | 2.480 s | 0.384 s | 178000.0000 | 2000.0000 |     - |   1.04 GB |

@Wraith2 (Contributor) commented Apr 27, 2021:

The memory is a lot better, which was the initial goal. The speed drop is a bit troubling. [edit] You said no hotspot; I should learn to read.

The in modifier wouldn't work on netfx because it can't deal with the unknown modreq. It's also not clear that it would have a benefit here unless there's evidence that the copy itself is the problem, and I don't think we're running nearly fast enough for that to be the case. Is it worth looking at call counts to see if there's any method result that can be cached?

@cheenamalhotra (Member) commented Apr 28, 2021:

Hi @cmeyertons

I extended your project to include more info (+10 columns, +Async APIs, +SqlDataReader, +Threading Diagnoser), and I used the same data to keep things comparable. Below are the results for ~50 columns.
My source changes are here: fork: benchmarks/new

CURRENT - IDataReader

| Method        | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|-------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy      | 22.82 s | 12.746 s | 1.972 s |               2.0000 | 646000.0000 | 2000.0000 |   2.52 GB |
| BulkCopyAsync | 22.48 s |  1.601 s | 0.416 s |             250.0000 | 646000.0000 | 2000.0000 |   2.52 GB |

NEW - IDataReader

| Method          | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|---------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy ¹      | 25.86 s | 13.320 s | 2.061 s |               4.0000 | 294000.0000 | 2000.0000 | 1.15 GB ³ |
| BulkCopyAsync ¹ | 26.16 s |  1.946 s | 0.505 s |           645.0000 ² | 294000.0000 | 2000.0000 | 1.15 GB ³ |

¹ Performance degradation is approx. 13% (sync) & 19% (async).
² The async flow consumes significantly more threads here.
³ There's no doubt allocation has improved.


CURRENT - SqlDataReader

| Method        | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|-------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy      | 28.81 s |  2.297 s | 0.355 s |              10.0000 | 371000.0000 | 3000.0000 |   1.45 GB |
| BulkCopyAsync | 65.16 s | 19.408 s | 5.040 s |         3924955.0000 | 802000.0000 | 5000.0000 |   3.04 GB |

NEW - SqlDataReader

| Method          | Mean    | Error   | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|---------------- |--------:|--------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy ¹      | 34.56 s | 2.044 s | 0.531 s |               3.0000 | 371000.0000 | 3000.0000 |   1.45 GB |
| BulkCopyAsync ¹ | 70.56 s | 2.968 s | 0.459 s |         3925037.0000 | 804000.0000 | 7000.0000 |   3.04 GB |

¹ Performance degraded by approx. 20% (sync) and 8% (async), even though there's no change in allocation.
Was this intended, or can you improve allocations for SqlDataReader too?

@cmeyertons (Contributor, Author) commented Apr 28, 2021:

@cheenamalhotra @Wraith2

> Performance degraded by approx. 20% (sync) and 8% (async), even though there's no change in allocation.
> Was this intended or can you improve allocations for SqlDataReader too?

Looking at the SqlDataReader internals, it looks like it holds data in a SqlBuffer object, so values might already be boxed -- I was anticipating fewer allocations via invocations through GetSqlDecimal etc.

I've refactored some code paths to more closely align the number of method invocations with the original -- ConvertWriteValueAsync is back to something closer to the original ConvertValue (renamed to ConvertValueIfNeeded).

I lifted the null checking out of ConvertValueIfNeeded to eliminate the method dispatch for nulls (a small perf improvement that could also be made to the existing codebase).

Some of my suspicions at this point are:

  • The IfNeeded pattern increases the amount of local memory used (instead of overwriting the existing value, we pass back a new, second value) -- even if this is null most of the time, there could be extra cost there. I see jumps in both SqlBulkCopy.ConvertValueIfNeeded and SqlParameter.CoerceValueIfNeeded.
    • However, this could also be due to boxing being pushed down into type coercion.
  • I'm wondering if this pattern introduces more CPU cache misses, since we hit different code paths for each column in the dataset -- I have no evidence of this, just a hunch.
  • There are a couple of methods that I could additionally inline for the object vs. T case -- TdsParser.WriteValueWithWait and SqlBulkCopy.DoWriteValueAsync -- these are the only extra invocations I can see at this point. That would require some code duplication, however.

I will work on getting your benchmarks into my repo later today -- @cheenamalhotra thanks for enhancing them, much appreciated!

@cmeyertons (Contributor, Author) commented:

Hi team, closing this PR to get it off my list. I am no longer working with MSSQL, and it appears this experiment hampered performance rather than helped it (most likely due to CPU instruction caching, etc.).

I sincerely appreciate everyone's help throughout this attempted contribution and learned a ton!

@cmeyertons cmeyertons closed this Aug 8, 2024
Labels: ⏳ Waiting for Customer (Issues/PRs waiting for user response/action)
4 participants