Add and use TryReadChars method #1544

Wraith2 · 2022-03-13T23:33:46Z

In #593 (comment) @panoskj pointed out that there was a path in the example reproduction which was copying characters from a buffer one at a time in a loop and that this was inefficient. I've done a little digging and this is an attempt to speed it up. Note that not all strings are copied this way.

Unfortunately the fix isn't as simple as copying into from the byte[]. We need to make sure that if we reach a packet boundary that we call a function capable of fetching the next packet. We also need to be aware that we could end up with a partial character in the input buffer and that needs special handling. The strategy I've settled on for this PR is to identify how much can be copied in bulk from the current packet and then pick up any remainder or simply force a new packet by calling TryReadChar() as the code currently does. This should result in most data being read in large blocks with single chars being picked up at existing boundaries which should be faster if not the fastest possible.

/cc @panoskj could you take a look and see if you agree with this and possibly run it through your benches/profiler to see if it improves the numbers you saw?

panoskj · 2022-03-14T20:54:33Z

I have not run any benchmarks with it yet, but it seems like a reasonable change. It should speed up both sync and async paths.

It also looks like you could replace usages of TryReadPlpUnicodeChars with TryReadStringWithEncoding (passing Encoding.Unicode as encoding parameter and doubled string size), if the result will be converted to a string anyway. This is the case inside TryReadSqlStringValue for example. This way you can avoid the case of "partially read characters" altogether. It looks like TryReadPlpUnicodeChars is only needed for SqlDataReader.GetChars actually.

Though, this change cannot improve the async path much, because as I mentioned in #593 (comment), its time complexity is quadratic. No matter how fast you make the copying itself, all packets in the replay buffer will be copied into the characters buffer for each packet received again and again.

The optimal solution would copy each packet once instead. See https://github.com/panoskj/SqlClient/tree/wip/issue/593 for a proof of concept. Async runs about as fast as the sync does, even for very large columns, e.g. 50MB. Note that I haven't reviewed the last commit much, it wasn't meant to be uploaded yet.

In conclusion, perhaps we should merge the netcore and netfx files before applying a dozen of new patches to them. But except for the pending merge, I agree with this change.

Wraith2 · 2022-03-14T21:12:45Z

It also looks like you could replace usages of TryReadPlpUnicodeChars with TryReadStringWithEncoding (passing Encoding.Unicode as encoding parameter and doubled string size), if the result will be converted to a string anyway. This is the case inside TryReadSqlStringValue for example. This way you can avoid the case of "partially read characters" altogether. It looks like TryReadPlpUnicodeChars is only needed for SqlDataReader.GetChars actually.

Possibly. Identifying whether the behaviour is the same can be time consuming.

Though, this change cannot improve the async path much, because as I mentioned in #593 (comment), its time complexity is quadratic. No matter how fast you make the copying itself, all packets in the replay buffer will be copied into the characters buffer for each packet received again and again.

We both have branches which can linearize that, just waiting on the tree merges.

In conclusion, perhaps we should merge the netcore and netfx files before applying a dozen of new patches to them. But except for the pending merge, I agree with this change.

I've love to. Tree merging is going slowly though. If I can get a small PR through to improve performance I'll try it.

panoskj · 2022-03-15T18:32:24Z

Identifying whether the behaviour is the same can be time consuming.

Nevertheless, this insight may be worth it, given the context of optimizing the async path later. But it is completely optional right now.

Wraith2 · 2022-04-08T00:56:21Z

It looks like this doesn't work with some of the bulkcopy tests.

Wraith2 · 2022-04-13T20:21:07Z

I updated without making any material change and the CI is mostly clean. The only failures are in AE and while they do look look related I have no way to replicate that environment to debug them.

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs

src/Microsoft.Data.SqlClient.sln

codecov · 2022-08-18T23:27:22Z

Codecov Report

Base: 70.86% // Head: 69.56% // Decreases project coverage by -1.31% ⚠️

Coverage data is based on head (2ad4391) compared to base (5ff68eb).
Patch coverage: 84.44% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1544      +/-   ##
==========================================
- Coverage   70.86%   69.56%   -1.31%     
==========================================
  Files         292      292              
  Lines       61750    61768      +18     
==========================================
- Hits        43759    42967     -792     
- Misses      17991    18801     +810

Flag	Coverage Δ
addons	`92.38% <ø> (ø)`
netcore	`73.74% <83.33%> (-0.76%)`	⬇️
netfx	`67.83% <93.93%> (-1.33%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs	`23.30% <ø> (-21.08%)`	⬇️
.../netcore/src/Microsoft/Data/SqlClient/TdsParser.cs	`73.38% <66.66%> (-0.15%)`	⬇️
...nt/netfx/src/Microsoft/Data/SqlClient/TdsParser.cs	`70.84% <66.66%> (-0.14%)`	⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs	`87.46% <90.90%> (-2.27%)`	⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs	`26.18% <0.00%> (-39.85%)`	⬇️
...ata/SqlClient/SqlConnectionTimeoutErrorInternal.cs	`30.35% <0.00%> (-21.43%)`	⬇️
.../Microsoft/Data/SqlClient/SqlTransaction.Common.cs	`74.35% <0.00%> (-5.13%)`	⬇️
...ta/SqlClient/SqlConnectionPoolGroupProviderInfo.cs	`42.50% <0.00%> (-5.00%)`	⬇️
... and 17 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

DavoudEshtehari

@Wraith2 LGTM, do you have any plan to port it into netfx too?

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs

DavoudEshtehari · 2022-09-30T21:27:23Z

PR #1520 is the prerequisite of this change.

Wraith2 · 2023-02-01T01:32:43Z

Prerequisites are met...

panoskj · 2023-02-01T07:54:41Z

I see you haven't modified any existing code inside TdsParserStateObject - you only added a new method. This means, you can now update this PR to add the new method inside the merged file (src\Microsoft\Data\SqlClient\TdsParserStateObject.cs) instead of inside the corefx version. Of course, to do this, you will have to merge main branch into this one (to get #1520's changes).

Moreover, according to this comment, there may be someone working on merging TdsParser already, but there is no PR yet (they are waiting for #1844, which will help them a lot). You could port the changes to TryReadPlpUnicodeCharsChunk to netfx in order to help them. Alternatively, you could merge TryReadPlpUnicodeCharsChunk into a new common file (just merge this one method and leave the rest). That being said, I haven't checked whether TryReadPlpUnicodeCharsChunk is the same in corefx and netfx - if there are significant changes, porting/merging may be too hard. But I suspect they are identical.

Wraith2 · 2023-02-09T10:55:43Z

Count someone from the MS side check the CI machines? the enclave steps are either timing out or just plain not working.

Wraith2 · 2023-02-09T20:41:38Z

Fixed, quick, merge while it's green!

David-Engel

Lets get this in sooner than later so it gets some miles on it during preview.

Wraith2 · 2023-02-10T20:19:57Z

Lets get this in sooner than later so it gets some miles on it during preview.

Good idea. @panoskj do you have any benches you want to try with it since you found the original issue?

Wraith2 closed this Mar 14, 2022

Wraith2 reopened this Mar 14, 2022

David-Engel reviewed Apr 29, 2022

View reviewed changes

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParser.cs Show resolved Hide resolved

JRahnama requested review from DavoudEshtehari, lcheunglci and Kaur-Parminder August 16, 2022 16:46

JRahnama added the ➕ Code Health Issues/PRs that are targeted to source code quality improvements. label Aug 16, 2022

JRahnama added this to the 5.1.0-preview1 milestone Aug 16, 2022

lcheunglci reviewed Aug 18, 2022

View reviewed changes

src/Microsoft.Data.SqlClient.sln Outdated Show resolved Hide resolved

DavoudEshtehari reviewed Sep 29, 2022

View reviewed changes

src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/TdsParserStateObject.cs Outdated Show resolved Hide resolved

DavoudEshtehari modified the milestones: 5.1.0-preview1, 5.1.0-preview2 Sep 30, 2022

lcheunglci closed this Oct 4, 2022

lcheunglci reopened this Oct 4, 2022

JRahnama removed this from the 5.1.0-preview2 milestone Nov 8, 2022

This was referenced Dec 8, 2022

Heavy GC thrashing in TdsParser #1864

Closed

Add array rental capability in TryReadPlpUnicodeChars #1866

Merged

add and use TryReadChars method

7674cdf

Wraith2 force-pushed the perf-tryreadchar branch 18 times, most recently from d920c94 to a406d3d Compare February 9, 2023 10:39

Wraith2 force-pushed the perf-tryreadchar branch from a406d3d to fa4965c Compare February 9, 2023 16:24

move TryReadChars to shared and cleanup

2ad4391

Wraith2 force-pushed the perf-tryreadchar branch from fa4965c to 2ad4391 Compare February 9, 2023 18:44

David-Engel approved these changes Feb 10, 2023

View reviewed changes

JRahnama approved these changes Feb 10, 2023

View reviewed changes

JRahnama merged commit 6db45d4 into dotnet:main Feb 10, 2023

Wraith2 deleted the perf-tryreadchar branch February 10, 2023 20:23

razvalex mentioned this pull request Mar 19, 2024

TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

Closed

edwardneal mentioned this pull request Nov 28, 2024

Reading large data (binary, text) asynchronously is extremely slow #593

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add and use TryReadChars method #1544

Add and use TryReadChars method #1544

Wraith2 commented Mar 13, 2022

panoskj commented Mar 14, 2022

Wraith2 commented Mar 14, 2022

panoskj commented Mar 15, 2022

Wraith2 commented Apr 8, 2022

Wraith2 commented Apr 13, 2022

codecov bot commented Aug 18, 2022 •

edited

Loading

DavoudEshtehari left a comment

DavoudEshtehari commented Sep 30, 2022

Wraith2 commented Feb 1, 2023

panoskj commented Feb 1, 2023

Wraith2 commented Feb 9, 2023

Wraith2 commented Feb 9, 2023

David-Engel left a comment

Wraith2 commented Feb 10, 2023

Add and use TryReadChars method #1544

Add and use TryReadChars method #1544

Conversation

Wraith2 commented Mar 13, 2022

panoskj commented Mar 14, 2022

Wraith2 commented Mar 14, 2022

panoskj commented Mar 15, 2022

Wraith2 commented Apr 8, 2022

Wraith2 commented Apr 13, 2022

codecov bot commented Aug 18, 2022 • edited Loading

Codecov Report

DavoudEshtehari left a comment

Choose a reason for hiding this comment

DavoudEshtehari commented Sep 30, 2022

Wraith2 commented Feb 1, 2023

panoskj commented Feb 1, 2023

Wraith2 commented Feb 9, 2023

Wraith2 commented Feb 9, 2023

David-Engel left a comment

Choose a reason for hiding this comment

Wraith2 commented Feb 10, 2023

codecov bot commented Aug 18, 2022 •

edited

Loading