DoWorkAsync uses sync methods #54
When SequentialAccess behavior is not used for ExecuteReader, ReadAsync guarantees that it will only complete once enough data has been read from the network that the complete row is in memory, i.e. that any column value reads will complete synchronously. In other words, the async versions of the methods to read column values are only there for SequentialAccess mode.
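To illustrate the pattern this implies, here's a minimal sketch of the recommended shape (the connection string, query and column types are placeholders, not code from the repo):

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;

// Non-sequential (default) behavior: ReadAsync only completes once the whole
// row is buffered, so the sync column getters below never perform I/O.
async Task ReadRowsAsync(string connectionString)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();
        using (var command = new SqlCommand("SELECT id, name FROM items", connection))
        using (var reader = await command.ExecuteReaderAsync())
        {
            while (await reader.ReadAsync())
            {
                int id = reader.GetInt32(0);       // sync - row is already in memory
                string name = reader.GetString(1); // sync
            }
        }
    }
}
```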
That's true for SqlClient, but is it something that all providers adhere to? Without knowing the internals of a particular ADO library, I would expect users to take either a fully sync or fully async path when writing the code. Or are we just looking for an optimized case in this bench, in which case we should be using SequentialAccess behavior?
@Wraith2 like everywhere else with ADO.NET, there's no absolute guarantee here for specific provider behavior. However, as @divega wrote, the ability to read the columns of a row out of order (when not using SequentialAccess) points to the expectation of row buffering in memory, in which case async column access is meaningless (because it will always return synchronously). The alternative would be a mechanism where I/O needs to be performed for every column, which seems like a very bad idea for perf (again, unless we're dealing with huge columns). At least as far as I'm aware, all major providers behave this way.

Using SequentialAccess is a user decision, which only makes sense if there's an expectation of very large values - it is definitely not supposed to be a "default" option, regardless of sync or async. My expectation is that specifying SequentialAccess would actually reduce perf (possibly considerably), so it should only be used when the memory savings are worth it. This is also why the best practice for most providers should be to not use async column access, even when using async access for everything else.

Finally, the way I understand the benchmarks themselves, there's no pretense of completeness or coverage of all cases, or even to necessarily execute completely provider-independent logic; the idea is more to check the common usage scenarios and see where the provider(s) are. This is why I don't think it makes sense to benchmark SequentialAccess scenarios: transferring large values from the database seems to be the rare case, and so the main focus is elsewhere...
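For contrast, the one scenario where async column access does pay off is the large-value case that SequentialAccess targets. A hedged sketch with invented table/column names; `connection` is assumed to be an open SqlConnection and `destination` a writable stream:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Threading.Tasks;

// Streaming one large column under SequentialAccess: the row is NOT fully
// buffered, so column access may perform I/O, and the async/streaming APIs
// are the right choice here.
async Task StreamBlobAsync(SqlConnection connection, Stream destination, int id)
{
    using (var command = new SqlCommand("SELECT payload FROM blobs WHERE id = @id", connection))
    {
        command.Parameters.AddWithValue("@id", id);
        using (var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
        {
            if (await reader.ReadAsync())
            {
                using (var stream = reader.GetStream(0))
                {
                    await stream.CopyToAsync(destination);
                }
            }
        }
    }
}
```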
The intent being to get good perf in common scenarios is the guidance I was looking for, I think. From my reading of how sequential mode works in SqlClient, you will not keep an array of SqlBuffer objects in memory containing values that have already been read past; only the instance required for the current column is kept. If you need data which isn't in the current packet, then network I/O will be required at request time. I could have misunderstood, but I've seen nothing to allow multi-packet buffering. So GetValueAsync (in SqlClient; other providers could well be legacy constrained) can do I/O even in sequential mode, since you can't always be certain that your row is small enough to fit in a packet. That's why I was wondering if it would be better to model the fully async version, because there can be real waits in there and the Get* methods will do them synchronously. This is extremely unlikely to be the case in this bench, since the data is tiny and will always fit in a packet.

It's also true that the Get* methods have much better perf than GetFieldValueAsync, simply because of the task machinery involved, so using them in this bench is much better for performance (a rough sketch of how one might measure that difference follows below). As I said, if I were writing naive async code I'd go full async, because the common guidance to users is not to mix sync and async. In this case I know mixing is better, but it wouldn't be obvious to casual users.

Any idea on the other question, of why there is no MSSQL in the published TE benches? It leaves me not quite knowing what baseline I'm trying to beat for performance improvements. I've got a couple of branches which give SqlClient an overall ~17% throughput increase with async/64 threads/60s, but I'm not sure if that's useful or not.
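To put a number on that task overhead, a micro-benchmark along these lines would isolate just the column getter. This is only a sketch: it assumes BenchmarkDotNet, a reachable local SQL Server instance, and a placeholder connection string:

```csharp
using System.Data.SqlClient;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;

public class ColumnAccessBenchmark
{
    private SqlConnection _connection;

    [GlobalSetup]
    public void Setup()
    {
        // Placeholder connection string - adjust for a local test instance.
        _connection = new SqlConnection("Server=localhost;Database=test;Integrated Security=true");
        _connection.Open();
    }

    [GlobalCleanup]
    public void Cleanup() => _connection.Dispose();

    [Benchmark(Baseline = true)]
    public int SyncGetter()
    {
        using (var command = new SqlCommand("SELECT 1", _connection))
        using (var reader = command.ExecuteReader())
        {
            reader.Read();
            return reader.GetInt32(0); // no Task allocated
        }
    }

    [Benchmark]
    public async Task<int> AsyncGetter()
    {
        using (var command = new SqlCommand("SELECT 1", _connection))
        using (var reader = command.ExecuteReader())
        {
            reader.Read();
            // Completes synchronously here (the row is buffered), but still
            // goes through the task machinery and allocates a Task<int>.
            return await reader.GetFieldValueAsync<int>(0);
        }
    }
}
```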
Would it have to be the Linux versions? Because the Linux performance kinda sucks in comparison to native; see dotnet/corefx#35363.
That more or less aligns with how Npgsql works (except there are no SqlBuffer objects there, just a single byte array that gets decoded upon user request). This notably means that it's the user's responsibility to always read columns in the order in which they were requested, and it's not possible to read the same column twice.

(On an unrelated note, it seems pretty vital for perf to be able to buffer multiple rows when they're small - not sure how that relates to TCP packets in the SqlClient world. Typically when a query is sent, PostgreSQL will respond with TCP packets that each contain multiple rows. On the Npgsql side, we read as much as we can into a fixed-length buffer (8K), which again means we get multiple rows into the internal buffer with one I/O system call - and subsequent row reads are then served from that buffer without any further I/O.)
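A rough illustration of that buffering idea (not Npgsql's actual implementation; the type and member names are invented, only the 8K figure comes from the description above):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public class ReadBuffer
{
    private readonly Stream _stream;
    private readonly byte[] _buffer = new byte[8192]; // fixed 8K buffer
    private int _filled;    // bytes currently in the buffer
    private int _position;  // bytes already consumed by the decoder

    public ReadBuffer(Stream stream) => _stream = stream;

    // Ensure at least `count` bytes are available to decode, hitting the
    // network only when the buffered data runs out. One successful read can
    // pull in many small rows at once.
    public async Task EnsureAsync(int count)
    {
        if (_filled - _position >= count)
            return; // already buffered - no system call needed

        // Shift the unconsumed remainder to the start, then refill.
        Array.Copy(_buffer, _position, _buffer, 0, _filled - _position);
        _filled -= _position;
        _position = 0;

        while (_filled < count)
        {
            int read = await _stream.ReadAsync(_buffer, _filled, _buffer.Length - _filled);
            if (read == 0)
                throw new EndOfStreamException();
            _filled += read;
        }
    }
}
```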
I think the main point is elsewhere: in sequential mode, I/O can indeed occur at any point as you say (reading rows, columns, whenever). The important part is that when not using sequential mode, I/O does not occur when reading columns. If I/O did occur in this situation, that would mean that the row wouldn't be buffered in memory, and it wouldn't be possible to read values in random order etc.
You're right that this isn't obvious. I do think that it's part of the contract of the column getters that they only perform I/O if SequentialAccess is specified, but this could probably be documented/explained better. FWIW there used to be a good ADO.NET blog post when the async overloads were first introduced, which explained exactly this point (it evaporated with the rest of the MSDN blogs unfortunately...). Finally, if users do use SequentialAccess, the async column getters are the appropriate choice, since column reads really can perform I/O there.
I think it was done this way so that even in non-sequential mode you never have to go backwards, because there's no guarantee that it won't require a previous packet. Even if your row spans 4 packets and you're on the last one, you can still get to the decoded form of the data for previous columns that were in previous packets via the SqlBuffer. Arguably you shouldn't do that, but people do strange things and it will work.

The TDS spec, from my reading, requires that all packets be of the maximum size unless it is the final packet in a request stream (which is identified by a flag), and this means that it packs as much as possible into as few packets as possible. So multiple small rows in a single packet is common, and will be the case for this benchmark, which is why the sync fetches work better than async ones.
Fair enough.
I suspect we have a bit of terminology mismatch: when I wrote packet I meant "TCP packet" - I'm guessing that your packets are TDS-level. PostgreSQL has no concept of a packet in the protocol. It does have messages (and each row is represented by a single message), but there can be multiple messages in a TCP packet, just like a single message can be made up of multiple packets.
I think I understand what you're saying, but I'm not sure why one would need to go back to the decoded form of the data for previous columns (regardless of any TCP packet involvement). Again on the Npgsql side, in non-sequential mode the entire row (represented by a single protocol message) is buffered in a single, contiguous memory buffer. As the user requests columns, we jump around that memory buffer and decode whatever is needed from the right area - nothing is cached or saved on the way.
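To make that concrete, here's an illustrative sketch (invented types, not Npgsql's real internals) of decode-on-demand over a contiguous row buffer: precomputed column offsets let any column be decoded in any order, with nothing cached along the way:

```csharp
using System;
using System.Text;

public class BufferedRow
{
    private readonly byte[] _rowData; // the entire row, already in memory
    private readonly int[] _offsets;  // start offset of each column in _rowData
    private readonly int[] _lengths;  // byte length of each column

    public BufferedRow(byte[] rowData, int[] offsets, int[] lengths)
    {
        _rowData = rowData;
        _offsets = offsets;
        _lengths = lengths;
    }

    // Decode directly from the buffer on every call - nothing is cached, so
    // columns can be read in any order, or the same column more than once.
    public int GetInt32(int ordinal) =>
        BitConverter.ToInt32(_rowData, _offsets[ordinal]);

    public string GetString(int ordinal) =>
        Encoding.UTF8.GetString(_rowData, _offsets[ordinal], _lengths[ordinal]);
}
```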
I don't really understand why this would be the case... Assuming the async methods are optimized for returning synchronously (i.e. because the required data is already buffered in memory), there should be very little difference between the sync and async APIs (it's true that there is some difference, but it really should be quite negligible).
Everything is decoded as it is encountered, so you don't really have a choice but to keep the decoded form. It makes sense in the context of the underlying machinery. I'm not sure if I'd choose to do it differently, or that there aren't better ways to do it; it's just what I see in the code.

Even if an async task returns synchronously, you have the overhead of a task wrapping the value. Those are non-reusable ref types and, as you say, they should be negligible, but they add up; a few million small objects is still a lot of garbage.
I'm guessing (or hoping..) you're referring to the current architecture of SqlClient, as opposed to any hard constraint of the protocol... In any case, as long as those SqlBuffer instances are pooled I guess it's not so bad...
At least internally in the provider, the use of ValueTask can mitigate that. For the surface API (GetFieldValueAsync<T> and friends return Task<T>), that wrapping can't be avoided without an API change.
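A sketch of that internal ValueTask pattern (hypothetical helper names, not a real provider API): when the value is already buffered, the synchronous path allocates nothing, and only the rare I/O path pays for a Task:

```csharp
using System.Threading.Tasks;

public class ColumnReader
{
    public ValueTask<int> ReadInt32Async(int ordinal)
    {
        if (IsBuffered(ordinal))
            return new ValueTask<int>(DecodeInt32(ordinal)); // no allocation

        return new ValueTask<int>(ReadSlowAsync(ordinal));   // rare I/O path pays for a Task

        async Task<int> ReadSlowAsync(int o)
        {
            await EnsureBufferedAsync(o); // hypothetical: performs the network I/O
            return DecodeInt32(o);
        }
    }

    // Stubs standing in for real buffering/decoding internals.
    private bool IsBuffered(int ordinal) => true;
    private int DecodeInt32(int ordinal) => 0;
    private Task EnsureBufferedAsync(int ordinal) => Task.CompletedTask;
}
```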
They aren't; that's something on my list to investigate, but pools are complex and I've already broken the library with one, so I'll take that slow. 😁
Sounds very wise :) |
In the AdoDriver DoWorkAsync function, the connection and reader are awaited, but value reads still use the sync Get* methods.
DataAccessPerformance/src/BenchmarkDb/AdoDriver.cs, lines 120 to 129 at a3bf85d
Is this a deliberate choice, and if so, what is the reasoning?
Also, is there a reason that there are no MSSQL benches in the TechEmpower benchmarks? I'm using this repo to investigate and improve SqlClient throughput, which will benefit users, but they aren't going to know about it.