-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SqlBulkCopy - Leverage Generics to Eliminate Boxing #1048
Conversation
src/Microsoft.Data.SqlClient/tests/ManualTests/SQL/SqlBulkCopyTest/NoBoxingValuesTypes.cs
Outdated
Show resolved
Hide resolved
Hi @cmeyertons We appreciate your efforts.
FastMember is not an official Microsoft package, and neither provides a unique solution that keeps us blocked. Yes it does simplify code, but that's not necessary to do the same job. We cannot justify the need to depend on this package for just code enhancements. Please continue to use the existing reflection design in tests to add your tests. |
@cheenamalhotra based on #1046 -- i'm guessing that some of my test failures are expected at this point? |
You'll still have to check them individually and decide if they're code related of infra, apart from TVPMain, that's just flaky as hell. |
Yes, we have some random failures which should go away once we rerun pipelines, don't bother about those ones. We're trying to fix them, but as Wraith mentioned, they're flaky.. |
@cheenamalhotra @Wraith2 alright tests are looking much better -- in the process of re-patching the code onto this branch, I did find the bug that was causing tests to fail. If I could get a preliminary approval on how the code looks now (the potential breaking changes, using the extension method over the GenericConverter class), I can get started on the |
It's been 16 months, you've been trying to get this merged so I'd really hope that someone can find the time to review it and give constructive feedback. You've been far more patient than anyone should need to be on this. |
As you specifically mentioned this now in your PR, I'm a little worried about the We of course need to write a use case to prove that, so I'll be looking more into that tomorrow. But if you can also test it out and come up with a use-case to find any significant performance drop with this PR, we may have to revisit the design a bit to make it an optional behavior. But if our test results do not suggest any performance degradation, we'd be happy to take it as is. |
I've investigated the perf impact of The fastest call is one that never happens so it would be good to get some numbers. For this PR can you demonstrate (with a BenchmarkDotNet bench if possible) that the original/current implementation is still slower than the PR branch with a reasonable column width of say 32 or 64? You'll need to give it a runtime of at least a minute ime to make sure the GC kicks in for the boxes that are being removed, without the GC the overhead of the new approach won't be properly balanced against the waste of the current one and your PR will look artificially poor, been there made that mistake. |
|
||
// Only use IsDbNull when streaming is enabled and only for non-SqlDataReader | ||
if ((_enableStreaming) && (_sqlDataReaderRowSource == null) && (rowSourceAsIDataReader.IsDBNull(sourceOrdinal))) | ||
// previously, IsDbNull was only invoked in a non-streaming scenario with a non-SqlDataReader. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cheenamalhotra @Wraith2 starting a conversation here to keep this tied together -- I will create a benchmark around this logic that inserts a million records in a loop until i hit about the minute mark.
I will do this in a couple of other branches in my repo and will post some links to the code here -- should be able to get this done today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If IsDbNull proves to be problematic, there could be an optimization around only invoking it for fields that are possibly null (e.g. if GetFieldType
returns typeof(int)
we don't need to check)
Edit: On second thought, the optimization poses some risk because DataTable (the return type of IDataReader.GetSchemaTable
does not accept nullable types -- we should still add the OR check for the ValueMethod
below however
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're inspecting the SchemaTable there is IsNullable field, if it's present and set to false you should be able to skip the dbnull checks.
|
||
Type t = ((IDataReader)_rowSource).GetFieldType(ordinal); | ||
|
||
if (t == typeof(bool)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this in the context of the IsDbNull
check above -- I think we need to add an OR check to the corresponding nullable type (e.g. t == typeof(bool) || t == typeof(bool?
)
I think we have two routes we can go here.
- We add the OR check and invoke
IsDbNull
on each value. Pros: prevents nullable types from being boxed. Cons: have to invoke IsDbNull (per impact TBD) and requires data reader implementations to correct determine result ofIsDbNull
- Leave this as is and we don't have to invoke
IsDbNull
on each value. Nullable value types would get boxed.
I would prefer option 1 if possible.
I will measure the option 1 approach as discussed and we will see how it shakes out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe that GetFieldType can return a Nullable<bool>
, at least it can't in the SqlClient implementation. We don't conflate sql and language nulls (rightly imo) so I don't think you need to change this. If a non-SqlClient data source provides a Nullable<bool>
then I think boxing is probably appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nullable int column should have GetFieldType of int, nullability is not part of the type in sql.
@cheenamalhotra @Wraith2 after much further investigation, it appears that migrating the code paths from You can view my the benchmarking code in my repo in the Some observations:
Inevitably, the boxing cost is much lower than I initially anticipated. While this was a good and fun personal exercise in getting into the lowers details of the SqlClient library, I'm at a deadend on how to improve performance at this point. CURRENTBenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
[Host] : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
Job-ZNZZJB : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
IterationCount=5 LaunchCount=1 WarmupCount=0
NEWBenchmarkDotNet=v0.12.1, OS=Windows 10.0.18363.1500 (1909/November2018Update/19H2)
Intel Xeon E-2176M CPU 2.70GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.202
[Host] : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
Job-UIKJLZ : .NET Core 3.1.14 (CoreCLR 4.700.21.16201, CoreFX 4.700.21.16208), X64 RyuJIT
IterationCount=5 LaunchCount=1 WarmupCount=0
|
The memory is a lot better which was the initial goal. The speed drop is a bit troubling. [edit] you said no hotspot, i should learn to read. In modifier wouldn't work on netfx because it can't deal with the unknown modreq. It's also not clear that it would have a benefit here unless there's evidence that it's actually the copy that's a problem and I don't think we're running nearly fast enough for that to be the case. Is it work looking at call counts to see if there's any method result that can be cached? |
Hi @cmeyertons I extended your project to include more info (+10 columns +Async APIs +SqlDataReader +Threading Diagnoser). CURRENT - IDataReader
NEW - IDataReader
1 Performance degradation is approx. 13% (sync) &19% (async). CURRENT - SqlDataReader
NEW - SqlDataReader
1 Performance degraded by approx. 20% (sync) and 8% (async), even though there's no change in allocation. |
Looking at the SqlDataReader internals, it looks like it holds data in a SqlBuffer object, so it might already be boxed -- i was anticipating less allocations due to invocations through I've refactored some code paths to closer align the # of method invocations to the original -- refactored I lifted out the null checking from Some of my suspicions at this point are:
I will work on getting your benchmarks into my repo later today -- @cheenamalhotra thanks for enhancing them, much appreciated it! |
Hi team, closing this PR to get it off my list. I no longer am working with MSSQL and it appears this experiment hampered performance rather than helped it (most likely due to CPU caching instructions, etc). I sincerely appreciate everyone's help throughout this attempted contribution and learned a ton! |
@Wraith2 @cheenamalhotra - This is a re-creation of #358 - there were multiple merge conflicts that were difficult to resolve.
Possible breaking changes:
IDataReader.GetFieldType
is invoked in SqlBulkCopy that was not invoked previously. Custom data reader implementations that do not properly implement this method could break.IDataReader.IsDbNull
is now invoked in the IDataReader code path for every value (previously it was only invoked if streaming was enabled). This is necessary to invoke the proper value method (GetInt32, etc.) and handle potential nullable value types.TODO:
netfx
(I want to wait for all tests to pass before I spend any effort doing this)