
SqlBulkCopy - Leverage Generics to Eliminate Boxing #1048

Closed
wants to merge 7 commits

Conversation

@cmeyertons (Contributor) commented Apr 26, 2021:

@Wraith2 @cheenamalhotra - This is a re-creation of #358 - there were multiple merge conflicts that were difficult to resolve.

  • Leverage Generic Code Paths to eliminate boxing of value types as the value is being written to the database
  • Benchmark test that validates the boxing is eliminated (this only works in Release mode as the Debug JIT does not perform the necessary optimizations)
  • Extension Method for Generic Cast instead of a GenericConverter type -- this makes the code less verbose and easier to read.

Possible breaking changes:

  • IDataReader.GetFieldType is now invoked in SqlBulkCopy where it was not invoked previously. Custom data reader implementations that do not properly implement this method could break.
  • IDataReader.IsDBNull is now invoked in the IDataReader code path for every value (previously it was invoked only when streaming was enabled). This is necessary to select the proper typed getter (GetInt32, etc.) and to handle potentially nullable value types.

TODO:

  • Migrate this code into netfx (I want to wait for all tests to pass before I spend any effort doing this)
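As a rough illustration of the extension-method approach described above (a hedged sketch -- `CastTo` and its shape are hypothetical, not the PR's actual code):

```csharp
using System;

internal static class GenericCastExtensions
{
    // Hypothetical sketch: when TFrom and TTo are the same value type, the
    // Release-mode JIT specializes this method and can elide the type test
    // and any boxing; mismatched types fall back to the boxing ChangeType path.
    public static TTo CastTo<TTo, TFrom>(this TFrom value)
    {
        if (value is TTo sameType)
        {
            return sameType;
        }
        return (TTo)Convert.ChangeType(value, typeof(TTo));
    }
}
```

As the PR description notes, the elision only happens in Release builds, since the Debug JIT does not perform these optimizations.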

@cheenamalhotra (Member) commented:

Hi @cmeyertons

We appreciate your efforts.

> I have implemented a custom data reader implementation in the NoBoxingValueTypes test -- I would've preferred to leverage the FastMember library to reduce the amount of code required, but that package was not available in the currently configured NuGet sources.

FastMember is not an official Microsoft package, nor does it provide a unique capability whose absence keeps us blocked. Yes, it simplifies code, but that isn't necessary to do the same job. We cannot justify taking a dependency on this package just for code enhancements. Please continue to use the existing reflection design in the tests.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra based on #1046 -- I'm guessing that some of my test failures are expected at this point?

@Wraith2 (Contributor) commented Apr 26, 2021:

You'll still have to check them individually and decide whether they're code-related or infra, apart from TVPMain; that one's just flaky as hell.

@cheenamalhotra (Member) commented:

@cmeyertons

Yes, we have some random failures which should go away once we rerun the pipelines -- don't worry about those. We're trying to fix them but, as Wraith mentioned, they're flaky.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 alright, tests are looking much better -- in the process of re-patching the code onto this branch, I did find the bug that was causing tests to fail.

If I could get a preliminary approval on how the code looks now (the potential breaking changes, using the extension method over the GenericConverter class), I can get started on the netfx port (I really don't want to manage changes across both with a bunch of iterations).

@Wraith2 (Contributor) commented Apr 27, 2021:

> If I could get a preliminary approval on how the code looks now (the potential breaking changes, using the extension method over the GenericConverter class), I can get started on the netfx port

It's been 16 months that you've been trying to get this merged, so I'd really hope that someone can find the time to review it and give constructive feedback. You've been far more patient than anyone should need to be on this.

@cheenamalhotra (Member) commented Apr 27, 2021:

@cmeyertons

> IDataReader.IsDbNull is now invoked in the IDataReader code path for every value (previously it was only invoked if streaming was enabled). This is necessary to invoke the proper value method (GetInt32, etc.) and handle potential nullable value types.

Since you specifically call this out in your PR, I'm a little worried about the IDataReader.IsDBNull call being made on every value. I suspect this might hurt performance for large rows. I have previously heard reports of this call causing performance lag (e.g. #846 (comment), and also from internal teams using 200+ columns), so making it for every type may cause a significant performance drop, especially in existing customer applications that rely on the current behavior.

We of course need a test case to prove that, so I'll look into it more tomorrow. But if you can also test it and come up with a use case that exposes any significant performance drop with this PR, we may have to revisit the design a bit and make this behavior optional. If our test results do not suggest any performance degradation, we'd be happy to take it as is.

@Wraith2 (Contributor) commented Apr 27, 2021:

I've investigated the perf impact of IsDBNull in the past and found that the cost of seeking to the field was being attributed to it, which made IsDBNull seem heavier than it is and GetFieldValue calls seem lighter. To check whether a column value is null you have to seek to the column header, which will run the parser if you haven't already touched the field, and at that point IO can also occur. When you call GetFieldValue on that same column, the seek should already have happened from the previous call, so you usually just pick up the value in the SqlBuffer.

The fastest call is one that never happens, so it would be good to get some numbers. For this PR, can you demonstrate (with a BenchmarkDotNet bench if possible) that the original/current implementation is still slower than the PR branch with a reasonable column width of, say, 32 or 64? You'll need to give it a runtime of at least a minute, in my experience, to make sure the GC kicks in for the boxes being removed; without the GC, the overhead of the new approach won't be properly balanced against the waste of the current one and your PR will look artificially poor -- been there, made that mistake.
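Along the lines suggested here, a self-contained BenchmarkDotNet sketch of the object-vs-generic difference might look like this (a standalone illustration of the boxing cost, not the actual SqlBulkCopy path; names are hypothetical):

```csharp
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // surfaces the Gen 0 / allocation difference between the two paths
public class BoxingBench
{
    private readonly int[] _values = new int[1_000_000];

    [Benchmark(Baseline = true)]
    public long ViaObject()
    {
        long sum = 0;
        foreach (int v in _values) sum += Consume((object)v); // boxes every value
        return sum;
    }

    [Benchmark]
    public long ViaGeneric()
    {
        long sum = 0;
        foreach (int v in _values) sum += Consume(v); // JIT-specialized, no box
        return sum;
    }

    private static long Consume(object value) => (int)value;
    private static long Consume<T>(T value) => value is int i ? i : 0;

    public static void Main() => BenchmarkRunner.Run<BoxingBench>();
}
```

Per the comment above, run this in Release with a long enough iteration budget that the GC cost of the boxes actually shows up.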


// Only use IsDBNull when streaming is enabled and only for non-SqlDataReader sources
if ((_enableStreaming) && (_sqlDataReaderRowSource == null) && (rowSourceAsIDataReader.IsDBNull(sourceOrdinal)))
// previously, IsDBNull was only invoked in a streaming scenario with a non-SqlDataReader source.
@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 starting a conversation here to keep this tied together -- I will create a benchmark around this logic that inserts a million records in a loop until I hit about the one-minute mark.

I will do this in a couple of other branches in my repo and will post links to the code here -- I should be able to get this done today.

@cmeyertons (Contributor, Author) commented Apr 27, 2021:

If IsDBNull proves to be problematic, there could be an optimization that invokes it only for fields that are possibly null (e.g. if GetFieldType returns typeof(int) we don't need to check).

Edit: On second thought, that optimization poses some risk because DataTable (the return type of IDataReader.GetSchemaTable) does not accept nullable types -- we should still add the OR check for the ValueMethod below, however.

@mburbea commented Apr 29, 2021:

If you're inspecting the SchemaTable, there is an IsNullable field; if it's present and set to false, you should be able to skip the DBNull checks.
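A hedged sketch of that shortcut (the helper name is hypothetical; the standard GetSchemaTable layout exposes nullability as the AllowDBNull column, while some providers also surface IsNullable):

```csharp
using System.Data;

internal static class SchemaNullability
{
    // Hypothetical helper: returns false only when the schema affirmatively
    // says the column cannot be null; on any doubt, keep the IsDBNull check.
    public static bool MightBeNull(DataTable schemaTable, int ordinal)
    {
        if (schemaTable == null || !schemaTable.Columns.Contains("AllowDBNull"))
        {
            return true;
        }
        object allowDbNull = schemaTable.Rows[ordinal]["AllowDBNull"];
        return !(allowDbNull is bool allowed && !allowed);
    }
}
```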


Type t = ((IDataReader)_rowSource).GetFieldType(ordinal);

if (t == typeof(bool))
@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2

Thinking about this in the context of the IsDBNull check above -- I think we need to add an OR check for the corresponding nullable type (e.g. t == typeof(bool) || t == typeof(bool?)).

I think we have two routes we can go here.

  1. Add the OR check and invoke IsDBNull on each value. Pros: prevents nullable value types from being boxed. Cons: we have to invoke IsDBNull (perf impact TBD), and data reader implementations must correctly determine the result of IsDBNull.
  2. Leave this as is and don't invoke IsDBNull on each value. Nullable value types would get boxed.

I would prefer option 1 if possible.

I will measure the option 1 approach as discussed and we will see how it shakes out.
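Option 1 could be sketched roughly like this (a hypothetical helper around the widened type check, not the PR's actual dispatch code):

```csharp
using System;
using System.Data;

internal static class NullableAwareDispatch
{
    // Widen the type test to the nullable counterpart, then gate the typed
    // getter behind IsDBNull so a Nullable<bool> source never has to box.
    public static bool? ReadBool(IDataReader reader, int ordinal)
    {
        Type t = reader.GetFieldType(ordinal);
        if (t == typeof(bool) || t == typeof(bool?))
        {
            return reader.IsDBNull(ordinal) ? (bool?)null : reader.GetBoolean(ordinal);
        }
        throw new InvalidCastException($"Column {ordinal} is not a bool column.");
    }
}
```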

@Wraith2 (Contributor) commented Apr 27, 2021:

I don't believe that GetFieldType can return a Nullable<bool>; at least, it can't in the SqlClient implementation. We don't conflate SQL and language nulls (rightly, imo), so I don't think you need to change this. If a non-SqlClient data source provides a Nullable<bool>, then I think boxing is probably appropriate.

@Wraith2 (Contributor) commented:

A nullable int column should have a GetFieldType of int; nullability is not part of the type in SQL.

@cmeyertons (Contributor, Author) commented:

@cheenamalhotra @Wraith2 after much further investigation, migrating the code paths from object value to T value appears to have a detrimental impact on performance -- I see about a 10% regression across the board.

You can view the benchmarking code in my repo in the benchmarks/current and benchmarks/new branches. I am bulk copying approximately 40 columns.

Some observations:

  • This does not appear to be solely related to IsDBNull -- I see the 10% uptick in various submethods that were genericized.
  • It might be due to passing decimal by value (128 bits instead of a 64-bit reference). I did attempt to change all of the signatures to ref, and that did not alleviate the problem.
  • Maybe leveraging the in modifier would help -- however, that change is probably not backwards compatible.
  • Some extra boxing is forced to occur during CoerceValue when coercion is necessary -- eliminating it would require a generic version of Convert.ChangeType, which is a framework change.
  • I did uncover a small performance improvement that I can submit in a subsequent PR.
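The Convert.ChangeType point can be seen in a sketch (a hypothetical wrapper, not the PR's CoerceValue): even wrapped in generics, ChangeType's object-based signature forces a box on the way in and an unbox on the way out.

```csharp
using System;

internal static class GenericCoercion
{
    public static TTo Coerce<TFrom, TTo>(TFrom value)
    {
        // Fast path: no coercion needed; the Release JIT can elide the box
        // when TFrom and TTo are the same value type.
        if (value is TTo alreadyTarget)
        {
            return alreadyTarget;
        }
        // Slow path: Convert.ChangeType takes and returns object, so the
        // value is boxed here regardless -- avoiding that would need a
        // generic ChangeType overload in the framework itself.
        return (TTo)Convert.ChangeType(value, typeof(TTo));
    }
}
```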

Ultimately, the boxing cost is much lower than I initially anticipated. While this was a good and fun personal exercise in getting into the lower-level details of the SqlClient library, I'm at a dead end on how to improve performance at this point.

CURRENT

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
  Job-ZNZZJB : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT

IterationCount=5  LaunchCount=1  WarmupCount=0
| Method   | Mean    | Error   | StdDev  | Gen 0       | Gen 1     | Gen 2 | Allocated |
|--------- |--------:|--------:|--------:|------------:|----------:|------:|----------:|
| BulkCopy | 17.91 s | 1.270 s | 0.330 s | 364000.0000 | 2000.0000 |     - |   2.13 GB |

NEW

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
  [Host]     : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
  Job-UIKJLZ : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT

IterationCount=5  LaunchCount=1  WarmupCount=0
| Method   | Mean    | Error   | StdDev  | Gen 0       | Gen 1     | Gen 2 | Allocated |
|--------- |--------:|--------:|--------:|------------:|----------:|------:|----------:|
| BulkCopy | 20.01 s | 2.480 s | 0.384 s | 178000.0000 | 2000.0000 |     - |   1.04 GB |

@Wraith2 (Contributor) commented Apr 27, 2021:

The memory is a lot better, which was the initial goal. The speed drop is a bit troubling. [edit] You said no hotspot; I should learn to read.

The in modifier wouldn't work on netfx because it can't deal with the unknown modreq. It's also not clear that it would have a benefit here unless there's evidence that the copy itself is the problem, and I don't think we're running nearly fast enough for that to be the case. Is it worth looking at call counts to see if there's any method result that can be cached?

@cheenamalhotra (Member) commented Apr 28, 2021:

Hi @cmeyertons

I extended your project to include more info (+10 columns, +Async APIs, +SqlDataReader, +Threading Diagnoser), and I used the same data to keep things comparable. Below are the results for ~50 columns.
My source changes are here: fork: benchmarks/new

CURRENT - IDataReader

| Method        | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|-------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy      | 22.82 s | 12.746 s | 1.972 s |               2.0000 | 646000.0000 | 2000.0000 |   2.52 GB |
| BulkCopyAsync | 22.48 s |  1.601 s | 0.416 s |             250.0000 | 646000.0000 | 2000.0000 |   2.52 GB |

NEW - IDataReader

| Method          | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|---------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy ¹      | 25.86 s | 13.320 s | 2.061 s |               4.0000 | 294000.0000 | 2000.0000 | 1.15 GB ³ |
| BulkCopyAsync ¹ | 26.16 s |  1.946 s | 0.505 s |           645.0000 ² | 294000.0000 | 2000.0000 | 1.15 GB ³ |

¹ Performance degradation is approx. 13% (sync) & 19% (async).
² The async flow consumes significantly more threads here.
³ There's no doubt allocation has improved.


CURRENT - SqlDataReader

| Method        | Mean    | Error    | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|-------------- |--------:|---------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy      | 28.81 s |  2.297 s | 0.355 s |              10.0000 | 371000.0000 | 3000.0000 |   1.45 GB |
| BulkCopyAsync | 65.16 s | 19.408 s | 5.040 s |         3924955.0000 | 802000.0000 | 5000.0000 |   3.04 GB |

NEW - SqlDataReader

| Method          | Mean    | Error   | StdDev  | Completed Work Items | Gen 0       | Gen 1     | Allocated |
|---------------- |--------:|--------:|--------:|---------------------:|------------:|----------:|----------:|
| BulkCopy ¹      | 34.56 s | 2.044 s | 0.531 s |               3.0000 | 371000.0000 | 3000.0000 |   1.45 GB |
| BulkCopyAsync ¹ | 70.56 s | 2.968 s | 0.459 s |         3925037.0000 | 804000.0000 | 7000.0000 |   3.04 GB |

¹ Performance degraded by approx. 20% (sync) and 8% (async), even though there's no change in allocation.
Was this intended, or can you improve allocations for SqlDataReader too?

@cmeyertons (Contributor, Author) commented Apr 28, 2021:

@cheenamalhotra @Wraith2

> Performance degraded by approx. 20% (sync) and 8% (async), even though there's no change in allocation.
> Was this intended or can you improve allocations for SqlDataReader too?

Looking at the SqlDataReader internals, it looks like it holds data in a SqlBuffer object, so values might already be boxed -- I was anticipating fewer allocations via invocations through GetSqlDecimal etc.

I've refactored some code paths to more closely align the number of method invocations with the original -- ConvertWriteValueAsync is back to something closer to the original ConvertValue (renamed to ConvertValueIfNeeded).

I lifted the null checking out of ConvertValueIfNeeded to eliminate the method dispatch for nulls (a small perf improvement that could also be made to the existing codebase).

Some of my suspicions at this point are:

  • The IfNeeded pattern increases the amount of local memory used (instead of overwriting the existing value, we pass back a new, second value) -- even if this is null most of the time, there could be extra cost there. I see jumps in both SqlBulkCopy.ConvertValueIfNeeded and SqlParameter.CoerceValueIfNeeded.
    • However, this could also be due to boxing being pushed down into type coercion.
  • I'm wondering if this pattern introduces more CPU cache misses, since we hit different code paths for each column in the dataset -- I have no evidence of this, just a hunch.
  • There are a couple of methods that I could additionally inline for the object vs. T case -- TdsParser.WriteValueWithWait and SqlBulkCopy.DoWriteValueAsync -- these are the only extra invocations I can see at this point. That would require some code duplication, however.

I will work on getting your benchmarks into my repo later today -- @cheenamalhotra thanks for enhancing them, much appreciated!

@cmeyertons (Contributor, Author) commented:

Hi team, closing this PR to get it off my list. I am no longer working with MSSQL, and it appears this experiment hampered performance rather than helped it (most likely due to CPU instruction caching, etc.).

I sincerely appreciate everyone's help throughout this attempted contribution and learned a ton!

@cmeyertons cmeyertons closed this Aug 8, 2024
Labels: ⏳ Waiting for Customer (Issues/PRs waiting for user response/action)
4 participants