Allow Parameter Usage on MultipleRowsCopy/MultipleRowsCopyAsync #2975

Merged

@to11mtm to11mtm commented Apr 29, 2021

Allow Parameter usage on MultipleRowsCopy/MultipleRowsCopyAsync via BulkCopyOptions. Fix #2144

Comment on lines 354 to 372
helper.LastRowParameterIndex = helper.ParameterIndex;
helper.LastRowStringIndex = helper.StringBuilder.Length;
addFunction(helper, item!, from);

if (helper.CurrentCount >= helper.BatchSize || helper.Parameters.Count > maxParameters || helper.StringBuilder.Length > maxSqlLength)
var needRemove = helper.Parameters.Count > maxParameters ||
helper.StringBuilder.Length > maxSqlLength;
if (helper.CurrentCount >= helper.BatchSize || needRemove)
{
if (needRemove)
{
helper.Parameters.RemoveRange(helper.LastRowParameterIndex, helper.ParameterIndex-helper.LastRowParameterIndex);
helper.StringBuilder.Length = helper.LastRowStringIndex;
}
finishFunction(helper);
if (!helper.Execute())
return helper.RowsCopied;
if (needRemove)
{
addFunction(helper, item!, from);
}
Contributor Author

So, I think this could -possibly- be simplified a little bit. I put LastRowParameterIndex and LastRowStringIndex into MultipleRowsHelper since .Execute() is what is clearing out the other bits we are comparing to. I'm not certain whether that is considered better or worse than just tracking those values locally in these functions.

Another option to clean this up would be if (CurrentCount > rowcount) followed by else if (needRemoveCondition).

…ase where a single row takes us over a limit; we should try anyway. Made MaxParameters and MaxSqlLength virtual properties so we can set it per-provider bulk copy. Set for Postgres, oracle, sqlite, sqlserver
@to11mtm to11mtm force-pushed the Allow-Parameter-Use-On-Bulk-Copy-Part-1 branch from 49fe3c5 to 9caf4b9 Compare April 29, 2021 19:48
@to11mtm to11mtm left a comment

Left some more comments since I have questions before I do much more.

Hoping for feedback before I go too far down a rabbit hole:

Should we limit Parameters to 8000 even if provider supports more (for LOH concerns)?

Also, is it OK to move the implementations of the MultipleRowsHelper/(Async) methods in BasicBulkCopy into an instance of MultipleRowsHelper? I'm open to better names than Bind/BindAsync. I tried another branch where we cached parameter arrays, and the code got pretty ugly and a bit hard to understand when kept in BasicBulkCopy.

private readonly OracleDataProvider _provider;

private const int _maxParameters = 32766;
private const int _maxSqlLength = 327670;
Contributor Author

Oracle actually allows a much longer SQL length. I'm not sure what number really 'makes sense' for batching, however; huge strings probably aren't great for a number of reasons. These consts are here because this class has static functions, so they need a const or static value passed in.

@@ -8,6 +8,9 @@ namespace LinqToDB.DataProvider.SQLite

class SQLiteBulkCopy : BasicBulkCopy
{
protected override int MaxParameters => 999;
protected override int MaxSqlLength => 1000000;
Contributor Author

Went higher on this number since SQLite is going to be parsed locally and we won't have a network hop.

Source/LinqToDB/DataProvider/BasicBulkCopy.cs
@@ -15,6 +15,9 @@ namespace LinqToDB.DataProvider

public class BasicBulkCopy
{
protected virtual int MaxParameters => 999;
protected virtual int MaxSqlLength => 100000;
Contributor Author

Putting this as a virtual -feels- better, but I still ask whether we should let Max Length be tunable by the user or not (For Network round-trip purposes). I decided not, since MaxBatchSize is good enough to do the same most likely.

Contributor

Could be useful. In the baselines I see an increase in the number of round-trips, e.g. for DB2.

int maxParameters = 10000,
int maxSqlLength = 100000)
int maxParameters,
int maxSqlLength)
Contributor Author

I think it might be worth moving these, even with a simple redirection for now, into a 'Bind'-like method on MultipleRowsHelper. More than 15 uses of helper. in one method feels like a smell to me.

@@ -12,7 +12,9 @@ namespace LinqToDB.DataProvider.SqlServer

class SqlServerBulkCopy : BasicBulkCopy
{
private readonly SqlServerDataProvider _provider;
protected override int MaxParameters => 2099;
Contributor Author

IDK if this is the value we -really- want to use, because of dotnet/SqlClient#974. Large number of parameters is actually counter-productive for performance.

if (helper.CurrentCount >= helper.BatchSize || helper.Parameters.Count > maxParameters || helper.StringBuilder.Length > maxSqlLength)
var needRemove = helper.Parameters.Count > maxParameters ||
helper.StringBuilder.Length > maxSqlLength;
var isSingle = helper.CurrentCount == 1;
Contributor Author

We need this check because there may be an edge case where even a single row was putting us over Max SQL limit.

Since we are now trying to be more correct here, we now 'back off' the last row when we go over maxSqlLength, rather than just continuing and hoping for the best. That said, we should still try to insert in case of Single row, and let provider fail if it really is an issue.
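The back-off behavior described above can be sketched in isolation. This is a minimal illustration, not the code from this PR: BackOffBatcher and its Build method are hypothetical names, rows are assumed to be pre-rendered SQL fragments, and only the SQL length limit is modeled (not the parameter limit).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class BackOffBatcher
{
    // Hypothetical helper: groups pre-rendered row fragments into batches,
    // keeping each batch at or under maxSqlLength where possible.
    public static List<string> Build(IEnumerable<string> rows, int maxSqlLength)
    {
        var batches = new List<string>();
        var sb      = new StringBuilder();
        var count   = 0;

        foreach (var row in rows)
        {
            var lastRowStringIndex = sb.Length; // where this row starts in the buffer
            sb.Append(row);
            count++;

            if (sb.Length <= maxSqlLength)
                continue;

            if (count == 1)
            {
                // A single row is already over the limit: emit it anyway
                // and let the provider reject it if it really is too long.
                batches.Add(sb.ToString());
                sb.Clear();
                count = 0;
                continue;
            }

            // Back off the row that pushed us over, flush, then re-add it.
            sb.Length = lastRowStringIndex;
            batches.Add(sb.ToString());
            sb.Clear();
            sb.Append(row);
            count = 1;
        }

        if (count > 0)
            batches.Add(sb.ToString());

        return batches;
    }
}
```

With a limit of 8 and rows "aaaa", "bbbb", "cccc", the third row is backed off and starts the second batch; a lone 10-character row is still emitted as its own batch.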

Comment on lines +365 to +370
if (needRemove && !isSingle)
{
helper.Parameters.RemoveRange(helper.LastRowParameterIndex, helper.ParameterIndex-helper.LastRowParameterIndex);
helper.StringBuilder.Length = helper.LastRowStringIndex;
helper.RowsCopied.RowsCopied--;
}
Contributor Author

We do this before finishFunction, in case finishFunction decides to add a parameter in future.

to11mtm commented May 3, 2021

@viceroypenguin If you get a chance can you beat me up on this one please?

@viceroypenguin

First thing I would suggest is move the maxParameters and maxSqlText from MultipleRowsCopyHelperAsync() method to MultipleRowsHelper constructor. Then you can use maxParameters to reduce MaxBatchSize as necessary (BatchSize = Math.Min(Math.Max(10, Options.MaxBatchSize ?? 1000), maxParameters / Columns.Count); notice the -1). This will remove the need to check for max parameters, since the batch size is implicitly less than max parameters.

The final concern will be the text length. For that, your current solution will work, though I think you can track LastRowXxx as local variables rather than properties on the Helper object? There's no way to move that boiler-plate that's copied three times into a common method or pattern is there? Seems like just enough code to screw up editing later and get the three out of sync w/ each other.
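The batch-size clamp suggested in this comment can be sketched as follows. BatchSizing.Compute is a hypothetical helper, not part of linq2db; it uses the quoted expression as written (the "-1" the comment mentions does not appear in that expression and is not reproduced here) and assumes one parameter per column per row. The floor of one row is an added assumption so that a very wide table still yields a usable batch size.

```csharp
using System;

public static class BatchSizing
{
    // Hypothetical helper mirroring the expression quoted above:
    // Math.Min(Math.Max(10, MaxBatchSize ?? 1000), maxParameters / columnCount)
    public static int Compute(int? maxBatchSize, int maxParameters, int columnCount)
    {
        if (columnCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(columnCount));

        var requested = Math.Max(10, maxBatchSize ?? 1000);

        // Assumption: one parameter per column per row; never go below one row.
        var byParameters = Math.Max(1, maxParameters / columnCount);

        return Math.Min(requested, byParameters);
    }
}
```

For example, with the SQL Server limit of 2099 from this PR and a 5-column table, the default batch of 1000 rows is reduced to 419; with the Oracle limit of 32766 and 3 columns, the default of 1000 is kept.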

to11mtm commented May 8, 2021

> First thing I would suggest is move the maxParameters and maxSqlText from MultipleRowsCopyHelperAsync() method to MultipleRowsHelper constructor. Then you can use maxParameters to reduce MaxBatchSize as necessary (BatchSize = Math.Min(Math.Max(10, Options.MaxBatchSize ?? 1000), maxParameters / Columns.Count); notice the -1). This will remove the need to check for max parameters, since the batch size is implicitly less than max parameters.

Thank you for the feedback! I'll give that a try. I was trying not to be too aggressive in this PR since next round (i.e. another PR) I'd like to optimize the build stage so that we don't re-build the string/parameter lists. I think to do it well we'll want to re-do the loop.

> The final concern will be the text length. For that, your current solution will work, though I think you can track LastRowXxx as local variables rather than properties on the Helper object? There's no way to move that boiler-plate that's copied three times into a common method or pattern is there? Seems like just enough code to screw up editing later and get the three out of sync w/ each other.

We might be able to pull out at least some of the logic; the challenge is that each one is just different enough that more indirection feels like as much cognitive load as the duplicated code.

But I think the overall loop structure can be improved at least: rather than adding the item and then checking the length to decide whether to back off, we can check the length at the start of the iteration and decide what to do. I think it will make the code a little cleaner, since it will have fewer jumps.
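The check-first loop proposed here might look roughly like the sketch below. CheckFirstBatcher is a hypothetical name, and the sketch assumes each row's rendered length is known before it is appended (in real code that would mean rendering into a scratch buffer or estimating the length); it is not the loop from this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CheckFirstBatcher
{
    // Hypothetical helper: decides whether to flush BEFORE appending a row,
    // so no back-off (truncating the buffer) is ever needed.
    public static List<string> Build(IEnumerable<string> rows, int maxSqlLength)
    {
        var batches = new List<string>();
        var sb      = new StringBuilder();

        foreach (var row in rows)
        {
            // Flush first if this row would push a non-empty batch over the limit.
            // An oversized row on an empty buffer is still appended and sent alone.
            if (sb.Length > 0 && sb.Length + row.Length > maxSqlLength)
            {
                batches.Add(sb.ToString());
                sb.Clear();
            }

            sb.Append(row);
        }

        if (sb.Length > 0)
            batches.Add(sb.ToString());

        return batches;
    }
}
```

Compared with add-then-back-off, the loop body has a single branch and never has to remember where the last row started.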

@viceroypenguin

Definitely agree with re-working to check the length before adding. That would simplify all three back to reasonable levels again, and we wouldn't need to abstract the three versions of the loop into one.

Personally, I'd say go ahead and do it, rather than trying to do it piecemeal in two separate PRs, but that's a personal opinion. On the other hand, if you're going to redo it in a second PR, then what you've got works well enough for now I think.

to11mtm commented May 8, 2021

> Definitely agree with re-working to check the length before adding. That would simplify all three back to reasonable levels again, and we wouldn't need to abstract the three versions of the loop into one.
>
> Personally, I'd say go ahead and do it, rather than trying to do it piecemeal in two separate PRs, but that's a personal opinion. On the other hand, if you're going to redo it in a second PR, then what you've got works well enough for now I think.

If I wasn't about to deal with a bunch of release stuff at work that's gonna have me busy for a while, I'd agree. :) But I think the second PR is going to be a bigger refactoring; I'm still thinking of doing #2960, and that will most likely mean a decent amount of change to this again.

Question though, based on your other feedback; Should I put the MaxParameters/MaxSqlText into SqlProviderFlags? I think that may be cleaner overall, but it still raises the question of whether we want to have 'Real' MaxParameters or ones that make sense for working in .NET

to11mtm commented May 18, 2021

I looked into setting MaxBatchSize based on parameter usage; unfortunately it gets a little hacky to work around the behavior of Oracle's array-parameter copy. It might be better to get what is here into 3.4.0 for now, so I am marking this ready to see if there is other feedback or if we must make more changes first.

@to11mtm to11mtm marked this pull request as ready for review May 18, 2021 01:33
@MaceWindu MaceWindu added this to the 3.4.0 milestone May 22, 2021
@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu MaceWindu left a comment

Could you provide sources for the limits for the various providers in comments? I see some of them specified but some are not.

@@ -88,5 +88,10 @@ public BulkCopyOptions(BulkCopyOptions options)
/// This callback will not be used if <see cref="NotifyAfter"/> set to 0.
/// </summary>
public Action<BulkCopyRowsCopied>? RowsCopiedCallback { get; set; }

/// <summary>
/// Gets or sets whether to Always use Parameters for MultipleRowsCopy.
Contributor

It makes sense to document the default value and the behavior in this mode, such as splitting the operation into batches when the parameter limit is reached.

Source/LinqToDB/DataProvider/BasicBulkCopy.cs
@@ -276,5 +276,22 @@ public void ReuseOptionTest([DataSources(false, ProviderName.DB2)] string contex
db.Child. BulkCopy(options, new[] { new Child { ParentID = 111001 } });
}
}

[Test]
public void UseParametersTest([DataSources(false, ProviderName.DB2)] string context)
Contributor

What's wrong with DB2?

@MaceWindu

@to11mtm, we plan to release 3.4 next Thursday; it would be nice if we could include this one.

@MaceWindu MaceWindu requested a review from sdanyliv May 29, 2021 14:29
@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

to11mtm commented May 29, 2021

> Could you provide sources for the limits for the various providers in comments? I see some of them specified but some are not.

I'll try to find my notes again. Most came from https://www.jooq.org/doc/3.12/manual/sql-building/dsl-context/custom-settings/settings-inline-threshold/ but for the strings I had to look in some other places.

@linq2dbot

Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed.

to11mtm commented May 31, 2021

> Firebird doesn't use type information from parameter, so you need to add it using CAST to first select of UNION

Yeah I found FirebirdSqlOptimizer.WrapParameters and cried for a few minutes.

I'm giving it a try with just casts on all rows. Ironically, after our other optimizations to SQL building (i.e. #2774), it appears that, at least for SQL Server and maybe Postgres, parameters are slower than our string literals at this point. My guess is that the cost of marshaling the parameters eats away any benefit, and because we aren't treating this as a prepared statement there are no gains to be had there either. My hope is that when I or someone else does a more thorough refactoring of BulkCopy, we can address some of this and some of the other cruft in BulkCopy/MultipleRowsHelper.

to11mtm commented Jun 1, 2021

/azp run test-firebird

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@@ -77,7 +79,27 @@ public virtual void BuildColumns(object item, Func<ColumnDescriptor, bool>? skip
{
var name = ParameterName == "?" ? ParameterName : ParameterName + ++ParameterIndex;

StringBuilder.Append(name);
if (castParameters)
Contributor

CAST is needed only for the first SELECT in the UNION. For the other rows the DB will use the type information from the first one.

Contributor Author

Yeah in retrospect I don't think it would hurt to make this if (castParameters && CurrentCount==0). I was over-thinking the jump cost lol.
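To make the "cast only the first row" idea concrete, here is a simplified, dialect-neutral sketch. UnionSelectSketch, the @p parameter prefix, and the type names passed in are all assumptions for illustration; the real code builds names from ParameterName and emits provider-specific SQL (a Firebird SELECT would also need a FROM clause, which is omitted here).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class UnionSelectSketch
{
    // Hypothetical builder: one SELECT per row joined by UNION ALL,
    // with CAST applied only to the parameters of the first row.
    public static string Build(int rowCount, IReadOnlyList<string> columnTypes, bool castParameters)
    {
        var sb             = new StringBuilder();
        var parameterIndex = 0;

        for (var currentCount = 0; currentCount < rowCount; currentCount++)
        {
            if (currentCount > 0)
                sb.Append(" UNION ALL ");

            sb.Append("SELECT ");

            for (var c = 0; c < columnTypes.Count; c++)
            {
                if (c > 0)
                    sb.Append(", ");

                var name = "@p" + ++parameterIndex;

                // Only the first SELECT carries type information.
                if (castParameters && currentCount == 0)
                    sb.Append("CAST(").Append(name).Append(" AS ").Append(columnTypes[c]).Append(')');
                else
                    sb.Append(name);
            }
        }

        return sb.ToString();
    }
}
```

For two rows and column types INTEGER and VARCHAR(10), this yields SELECT CAST(@p1 AS INTEGER), CAST(@p2 AS VARCHAR(10)) UNION ALL SELECT @p3, @p4.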

StringBuilder.Append(name);
StringBuilder.Append(" AS ");
StringBuilder.Append(column.DataType);
if (!string.IsNullOrEmpty(column.DbType))
@MaceWindu MaceWindu Jun 2, 2021

Can we use the SQL builder here, as it already has a DB-specific type builder method that takes DB-specific details into account?

Contributor Author

If you could give some pointers on which method to try to use in SQLBuilder I can give it a shot after work.

Contributor

Contributor Author

I ran into some challenges here (we are using ISqlBuilder, and you can't cleanly put an internal member on an interface), so I tried to solve this by adding StringBuilder BuildDataType(StringBuilder sb, SqlDataType dataType); to the ISqlBuilder interface. I went with returning the StringBuilder because that was the convention for BuildTableName, Convert, and others.

to11mtm commented Jun 3, 2021

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu

/azp run test-all

@MaceWindu

@to11mtm, if you are interested in working on this area more I have several suggestions for improvements:

  • an issue similar to the one you fought with Firebird still exists in parameter-less mode when the first data row contains nulls (no type information, an issue for multiple databases)
  • instead of generating the query for each batch, we can reuse the generated SQL for batches of the same size
  • it makes sense to investigate the new batching feature from .NET 6: New System.Data.Common batching API (dotnet/runtime#28633). Here I would add that some providers already support it. IIRC the SAP HANA provider repeats the command if the command uses fewer parameters than you passed to it, but it is a provider with positional parameters, so it is not a common case

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@linq2dbot

Test baselines changed by this PR. Don't forget to merge/close baselines PR after this pr merged/closed.

to11mtm commented Jun 3, 2021

> @to11mtm, if you are interested in working on this area more I have several suggestions for improvements:
>
> * an issue similar to the one you fought with Firebird still exists in parameter-less mode when the first data row contains nulls (no type information, an issue for multiple databases)
> * instead of generating the query for each batch, we can reuse the generated SQL for batches of the same size
> * it makes sense to investigate the new batching feature from .NET 6: [dotnet/runtime#28633](https://github.com/dotnet/runtime/issues/28633). Here I would add that some providers already support it. IIRC the SAP HANA provider repeats the command if the command uses fewer parameters than you passed to it, but it is a provider with positional parameters, so it is not a common case

Yeah, I want to do more work on this in the future. My big goal is to have a version of BulkCopy that will return inserted identity columns (issue #2960), but it is a bigger refactor because every DB that supports returning on MultipleRows has different rules on how to do this right.

I started looking into reusing the generated SQL; I think for it to be of actual benefit we will need to do more refactoring. Our string building is pretty efficient as-is, so to get a worthwhile benefit I think we would also have to .Prepare() the statement and go a little lower level when reassigning parameter values (with at least some data providers, if you add/remove/reassign a DbParameter on DbCommand.Parameters, the .Prepare() gets thrown out; you have to touch ONLY the actual Value of the existing DbParameter object, if that makes sense).
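The two ideas in this reply, reusing SQL text for batches of the same size and touching only DbParameter.Value on an already prepared command, can be sketched as below. BatchSqlCache and PreparedCommandRebinder are hypothetical names, not linq2db types; whether a given provider actually keeps its prepared plan when only Value changes is provider-specific and is assumed here, not verified.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

// Hypothetical cache: builds the SQL for a given row count once and reuses it.
public sealed class BatchSqlCache
{
    private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();
    private readonly Func<int, string>       _build;

    public BatchSqlCache(Func<int, string> build) => _build = build;

    // Number of times the builder delegate was actually invoked.
    public int BuildCount { get; private set; }

    public string Get(int rowCount)
    {
        if (!_cache.TryGetValue(rowCount, out var sql))
        {
            sql = _build(rowCount);
            _cache[rowCount] = sql;
            BuildCount++;
        }

        return sql;
    }
}

public static class PreparedCommandRebinder
{
    // Assigns new values to the parameters that already exist on the command.
    // The collection itself is never modified (no Add/Remove/replace).
    public static void Rebind(DbCommand command, IReadOnlyList<object> values)
    {
        if (values.Count != command.Parameters.Count)
            throw new ArgumentException("Value count must match the existing parameter count.");

        for (var i = 0; i < values.Count; i++)
            command.Parameters[i].Value = values[i] ?? DBNull.Value;
    }
}
```

In this shape, a full batch would call Get(batchSize), prepare the command once, and call Rebind for every subsequent batch of the same size; the final, smaller batch would get its own SQL text.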

@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu

/azp run test-all

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@MaceWindu MaceWindu merged commit c815832 into linq2db:master Jun 3, 2021
MaceWindu pushed a commit to linq2db/linq2db.baselines that referenced this pull request Jun 3, 2021
* [Windows / NET472 / SQLite.MS] baselines

* [Windows / NET472 / SQL CE] baselines

* [Windows / NET472 / Access ODBC MDB] baselines

* [Windows / NET472 / SQL Server 2012 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2005 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2008 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2016 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2014 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2017 (System.Data.SqlClient)] baselines

* [Windows / NET472 / Access Jet] baselines

* [Windows / NET472 / SQLite] baselines

* [Windows / NETCOREAPP2.1 / SQL CE] baselines

* [Windows / NETCOREAPP2.1 / Access ODBC ACE x64] baselines

* [Windows / NETCOREAPP2.1 / SQLite.MS] baselines

* [Windows / NET472 / SQL Server 2019 (System.Data.SqlClient)] baselines

* [Windows / NET472 / SQL Server 2019 (Microsoft.Data.SqlClient)] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2016 (System.Data.SqlClient)] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2005 (System.Data.SqlClient)] baselines

* [Linux / NETCOREAPP2.1 / Informix 14.10] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2008 (System.Data.SqlClient)] baselines

* [Linux / NETCOREAPP2.1 / DB2 LUW 11.5] baselines

* [Windows / NET 5.0 / SQL Server 2019 (Microsoft.Data.SqlClient)] baselines

* [Linux / NET5.0 / PostgreSQL 13] baselines

* [Linux / NETCOREAPP3.1 / PostgreSQL] baselines

* [Linux / NETCOREAPP3.1 / MySQL] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2012 (System.Data.SqlClient)] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2017 (System.Data.SqlClient)] baselines

* [Linux / NET5.0 / Sybase ASE 16] baselines

* [Linux / NET5.0 / SQLite] baselines

* [Linux / NETCOREAPP2.1 / MariaDB] baselines

* [Windows / NETCOREAPP2.1 / SQL Server 2014 (System.Data.SqlClient)] baselines

* [Linux / NETCOREAPP2.1 / Firebird 3.0] baselines

* [Linux / NETCOREAPP2.1 / Firebird 2.5] baselines

* [Linux / NETCOREAPP2.1 / MySQL 5.5] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 10] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 11] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 12] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 9.2] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 9.3] baselines

* [Linux / NETCOREAPP2.1 / PostgreSQL 9.5] baselines

* [Linux / NETCOREAPP2.1 / Oracle 11g XE] baselines

* [Linux / NETCOREAPP2.1 / Oracle 12c] baselines

* [Linux / NETCOREAPP2.1 / SAP HANA 2] baselines

* [Linux / NETCOREAPP2.1 / Firebird 4.0 (RC1)] baselines

Co-authored-by: Azure Pipelines Bot <azp@linq2db.com>
Development

Successfully merging this pull request may close these issues.

Running BulkCopy without parameters
4 participants