Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SqlBulkCopy.RowsCopied overflows because it is an int #2001

Closed
madelson opened this issue Apr 17, 2023 · 8 comments · Fixed by #2004
Closed

SqlBulkCopy.RowsCopied overflows because it is an int #2001

madelson opened this issue Apr 17, 2023 · 8 comments · Fixed by #2004
Labels
🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label

Comments

@madelson
Copy link
Contributor

Describe the bug

SqlBulkCopy.RowsCopied is supposed to return "The number of rows processed in the ongoing bulk copy operation". However, because it is Int32 rather than Int64 like SqlRowsCopiedEventArgs.RowsCopied, it is prone to overflow into the negative with very large copy operations (something we're currently running into).

This makes it difficult to track the actual number of rows copied.

Furthermore, the internal int field behind this property also flows into the event args, resulting in negative values there even though the data type is long.

To reproduce

  • Run a SqlBulkCopy operation with > int.MaxValue rows.
  • Observe SqlBulkCopy.RowsCopied after the fact (and/or observe the SqlRowsCopied event's args)

Expected behavior

Ideally, both properties would be long and would only overflow after long.MaxValue rows copied, which seems extremely unlikely to happen. Given that we probably can't change the type of the RowsCopied property at this point, what would be nice to see is:

  • Internally the rows are tracked as long so that the event args has the right value
  • A new long RowsCopied64 property is added to SqlBulkCopy that returns the underlying value as long.

Further technical details

Microsoft.Data.SqlClient version: multiple
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 10

@JRahnama JRahnama added the 🆕 Triage Needed For new issues, not triaged yet. label Apr 17, 2023
@JRahnama
Copy link
Contributor

@madelson thanks for opening the issue. We will look into this and will get back to you soon.

@Wraith2
Copy link
Contributor

Wraith2 commented Apr 17, 2023

Given that we probably can't change the type of the RowsCopied property at this point, what would be nice to see is:

  • Internally the rows are tracked as long so that the event args has the right value
  • A new long RowsCopied64 property is added to SqlBulkCopy that returns the underlying value as long.

This seems reasonable.
What do we do with the existing int RowsCopied property if the value exceeds int.MaxValue? we can either leave the behaviour as it is currently or clamp it to int.MaxValue.

@ErikEJ
Copy link
Contributor

ErikEJ commented Apr 17, 2023

Leave as is, people may rely on the current behaviour

@Wraith2
Copy link
Contributor

Wraith2 commented Apr 17, 2023

That works for me. Doing nothing is always my preferred option 😁

@ErikEJ
Copy link
Contributor

ErikEJ commented Apr 17, 2023

It should be.

@roji
Copy link
Member

roji commented Apr 18, 2023

FWIW we did something similar in Npgsql: DbDataReader.RowsAffected is an int, which is insufficient, so we also exposed a ulong property alongside it.

For the existing property, I'd consider throwing OverflowException (pointing to the new property); I don't think anyone can be using that negative value productively in any way, and it may be causing subtle bugs.

@JRahnama JRahnama removed the 🆕 Triage Needed For new issues, not triaged yet. label Apr 18, 2023
@madelson
Copy link
Contributor Author

madelson commented Apr 19, 2023

What do we do with the existing int RowsCopied property if the value exceeds int.MaxValue? we can either leave the behaviour as it is currently or clamp it to int.MaxValue.

Leave as is, people may rely on the current behaviour

For the existing property, I'd consider throwing OverflowException (pointing to the new property); I don't think anyone can be using that negative value productively in any way, and it may be causing subtle bugs.

We are actually relying on the current behavior. Our goal is to issue a log at the end of the operation with the total count. However, to reduce overhead we don't want to fire the SqlRowsCopied event for every single row (possibly this concern is unfounded). Therefore we're doing this hack to get the total count:

var count = 0L;

sqlBulkCopy.NotifyAfter = 1024;
sqlBulkCopy.RowsCopied += (o, e) => count += 1024;
await sqlBulkCopy.WriteToServerAsync(...);
count += sqlBulkCopy.RowsCopied & 1023;
  • My top choice would be to leave as-is, perhaps marking the RowsCopied property as Obsolete to get people to use the new one instead.
  • My second choice would be to throw OverflowException so that when the behavior changes at least we'll be notified to go in and fix.
  • I don't think we should clamp the value; then we'd just be getting the wrong count.

@Wraith2
Copy link
Contributor

Wraith2 commented Apr 19, 2023

The PR I opened to add RowsCopied64 #2004 leave the current behaviour intact.

@JRahnama JRahnama added the 🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label label Apr 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🙌 Up-for-Grabs Issues that are ready to be picked up for anyone interested. Please self-assign and remove the label
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants