-
Notifications
You must be signed in to change notification settings - Fork 4.9k
SqlClient managed networking improvements #35363
Conversation
@afsanehr @tarikulsabbir @Gary-Zh @David-Engel can you please look at this one too? Also 2 weeks old without code review :( |
@afsanehr any update on review ETA? |
@karelz We are reviewing this PR. |
src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNIPacket.cs
Outdated
Show resolved
Hide resolved
@@ -84,6 +84,8 @@ internal abstract class SNIHandle | |||
/// </summary> | |||
public abstract Guid ConnectionId { get; } | |||
|
|||
public virtual bool SMUXEnabled => false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make it public?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It isn't publc, it's a public method on an internal class so it's effectively private to external view. it is however part of the public surface inside the assembly.
src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNIPacket.cs
Outdated
Show resolved
Hide resolved
/// </summary> | ||
/// <param name="obj"></param> | ||
/// <returns>true if equal</returns> | ||
public override bool Equals(object obj) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose to remove these public functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Packets are never compared to each other. If in further work they are compared it will be in a pool and the reference equality will be all that is needed. As such these methods aren't needed or used currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to remain what we have before even if it's not used currently.
There should be a reason why they were added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They were originally added with the intention of using them when pooling managed packets. The packet pooling was never fully written and integrated so they weren't used. See the removed WritePacketCache
code in TdsParserStateObjectManaged
. Are you sure you want dead and trivial code keeping?
src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObjectFactory.Windows.cs
Outdated
Show resolved
Hide resolved
src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObjectManaged.cs
Outdated
Show resolved
Hide resolved
@Wraith2 I've run EFCore test on Windows and some of the test cases failed due to server timeout.
Failed test cases are: |
Interesting. I saw similar behaviour when I enabled packet pooling but couldn't work out why. Nothing I've changed should affect the flow of packets only how they're constructed, why would a timeout occur? Can you see anything I've changed anywhere that could cause this? |
src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserStateObjectManaged.cs
Show resolved
Hide resolved
@Wraith2 Please look into |
I have the repo to run the tests but I've never used EF and that test case looks complicated. I know I can't fix this alone because I've tried. Trying to trace individual packets through this mountain of a library is virtually impossible. If it were my job then i've give it a go but this isn't my job. If there's not going to be any attempt at collaboration or suggestion on avenues of approach from the expert owners then there's no point leaving this open. |
Hey @Wraith2 don't close this PR. Let's add the do not merge tag to it and revisit it when there is bandwidth. I agree that it is not easy to debug through the mountain of packets. Please reopen this PR. |
Also please note that at this point, we will also have to debug into this problem and figure out where things are going wrong. |
@Gary-Zh do you have ETA for the test verification? It's been 9 days since that info ... |
There is an issue with EF tests hanging on Linux which is being followed up on dotnet/efcore#15333 |
Thanks for update! Do you have rough ETA? (just to set overall expectations) |
I've got a further branch of this which adds packet caching per connection and reliably fails a manual test with a timeout, probably the same one you're seeing in the EF core tests. I meant to add this functionality originally but the presence of the timeout every time I added the cache convinced me I was doing something wrong but couldn't find it. Now we know that it isn't my change causing it you might find it useful to use this branch to try and track it down, as I said I tried for some weeks and couldn't. https://github.com/Wraith2/corefx/tree/sqlperf-managedsmux2 |
In my local copy of the branch I linked I seem to have fixed the timeout. It wasn't my intention to do so but another bug where a packet was used after it was no longer valid became apparent when packet recycling was introduced. The error I fixed was in |
@saurabh500 @Wraith2 could you please provide a status on this? |
Hi @karelz , most of the tests have already passed, there is only one pending test: EFCore on Linux, and it keeps hanging on all of our Linux VMs. Here's the link to the hanging issue: |
Hi @karelz , no test failures spotted yet because we can't even run the tests successfully. The EFCore test itself is hanging and we are looking for help from EFCore team right now. |
Note: Porting to Microsoft.Data.SqlClient incomplete without #40732 |
SqlClient managed networking improvements, all test passed Commit migrated from dotnet/corefx@2469a91
While profiling SqlClient for various other PR's it became clear that the managed implementation was wasting a lot of resources in networking and that this was severely degrading performance relative to the native pinvoke version. This PR works towards closing that gap.
The core of the changes start in SNIPacket. This has had a start field added which allows the ability to reserve space at the start of the packet for an additional header, the MARS smux function can now take advantage of this reservation removing the need to allocate a new packet and data array before sending. As a result of the header functionality it is no longer possible to cache and re-use a single attention packet.
The lifetime management of packets has been tightened and calls to
Release
added to avoid dropping packets and their rented buffers for the GC to deal with. The packet has had the IDispose and IEquatable removed because they are not needed and the explicit call to Release over Dispose makes the lifetime management much clearer in the code. As a result of the clearer packet management there is now no need to clone a packet before sending which avoids further re-allocation. This changes the number of packet copies from 2 to 1 in standard cases and with the smux reservation 3 to 1 for mars connections.The functional and manual tests run through sucessfully.
Benchmark results are overall positive:
Those results make the managed version looks very similat to the native one in performance, this isn't really the case. Profiles show the improvement in a slighlty more nuanced way.
Baseline, master native: very little GC activity and the test takes 10s
Current, master managed: huge amounts of gc activity, a gen0 roughly every 16ms, the test takes 13s
Improved, branch managed: much less GC activity and the test takes 12s
So looking at it optimistically the memory improvements pull back roughly a third of the speed difference between managed and native.
/cc @afsanehr, @tarikulsabbir, @David-Engel, @saurabh500