This repository has been archived by the owner on Feb 26, 2021. It is now read-only.

DoWorkAsync uses sync methods #54

Closed
Wraith2 opened this issue Apr 17, 2019 · 13 comments

@Wraith2

Wraith2 commented Apr 17, 2019

In the AdoDriver DoWorkAsync function the connection and reader are awaited, but value reads still use the synchronous Get* methods.

using (var reader = await command.ExecuteReaderAsync())
{
    while (await reader.ReadAsync())
    {
        results.Add(
            new Fortune
            {
                Id = reader.GetInt32(0),
                Message = reader.GetString(1)
            });
    }
}

Is this a specific choice, and if so what is the reasoning?

Also, is there a reason that there are no mssql benches in the TechEmpower benchmarks? I'm using this repo to investigate and improve SqlClient throughput, which will benefit users, but they aren't going to know about it.

@divega

divega commented Apr 17, 2019

When SequentialAccess behavior is not used for ExecuteReader, ReadAsync guarantees that it will only complete once enough data has been read from the network that the complete row is in memory, i.e. that any column value reads will be completed synchronously. In other words, the async versions of the methods to read column values are only there for SequentialAccess mode.
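For contrast, here is a minimal sketch of the case where the async column getters do matter: under CommandBehavior.SequentialAccess the row is not fully buffered, so column reads (and the streaming APIs) can genuinely perform I/O. The column layout, helper name, and output file here are hypothetical, not from the benchmark code.

```csharp
using System.Data;
using System.Data.Common;
using System.IO;
using System.Threading.Tasks;

// Hypothetical sketch: reading a large blob under SequentialAccess,
// where the async getters can actually wait on the network.
static async Task DumpPayloadsAsync(DbCommand command)
{
    using (var reader = await command.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
    {
        while (await reader.ReadAsync())
        {
            // In sequential mode, columns must be read in ordinal order.
            var id = await reader.GetFieldValueAsync<int>(0);
            using (var blob = reader.GetStream(1))            // streams the value
            using (var file = File.Create($"payload-{id}.bin"))
            {
                await blob.CopyToAsync(file);                 // no full-row buffering
            }
        }
    }
}
```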

@Wraith2
Author

Wraith2 commented Apr 17, 2019

That's true for SqlClient, but is it something that all providers adhere to? Without knowing the internals of a particular ADO library, I would expect users to take either a fully sync or fully async path when writing the code. Or are we just looking for an optimized case in this bench, in which case we should be using Sequential behaviour?

@roji
Member

roji commented Apr 17, 2019

@Wraith2 like everywhere else with ADO.NET, there's no absolute guarantee here for specific provider behavior. However, as @divega wrote, the ability to read the columns of a row out of order (when not using SequentialAccess) points to the expectation of row buffering in memory, in which case async column access is meaningless (because it will always return synchronously). The alternative would be a mechanism where I/O needs to be performed for every column, which seems like a very bad idea for perf (again, unless we're dealing with huge columns). At least as far as I'm aware, all major providers behave this way.

Using SequentialAccess is a user decision, which only makes sense if there's an expectation of very large values - it is definitely not supposed to be a "default" option, regardless of sync or async. My expectation is that specifying SequentialAccess would actually reduce perf (possibly considerably), so it should only be used when the memory savings are worth it. This is also why the best practice for most providers should be to not use async column access, even when using async access for everything else.

Finally, the way I understand the benchmarks themselves, there's no pretense of completeness or coverage of all cases, or even to necessarily execute completely provider-independent logic; the idea is more to check the common usage scenarios and see where the provider(s) are. This is why I don't think it makes sense to benchmark SequentialAccess scenarios: transferring large values from the database seems to be the rare case, and so the main focus is elsewhere...

@Wraith2
Author

Wraith2 commented Apr 17, 2019

The intent to get good perf in common scenarios is the guidance I was looking for, I think.

From my reading of how sequential mode works in SqlClient, it means that you will not keep an array of SqlBuffer objects in memory containing values that have already been read past, only the instance required for the current column. If you need data which isn't in the current packet, then network IO will be required at request time. I could have misunderstood, but I've seen nothing to allow multi-packet buffering.

So GetValueAsync (in SqlClient; other providers could well be legacy constrained) can do IO even in sequential mode, since you can't always be certain that your row is small enough to fit in a packet. That's why I was wondering if it would be better to model the fully async version, because there can be a real wait in there and the Get* methods will do it synchronously. This is extremely unlikely to be the case in this bench since the data is tiny and will always fit in a packet. It's also true that the Get* methods have much better perf than GetFieldValueAsync simply because of the task machinery involved, so using them in this bench is much better for performance.

As I said, if I were writing naive async code I'd go fully async, because the common guidance to users is not to mix sync and async. In this case I know mixing is better, but it wouldn't be obvious to casual users.

Any idea on the other question of why there is no mssql in the published TE benches? It leaves me not quite knowing what baseline I'm trying to beat for performance improvements. I've got a couple of branches which give SqlClient an overall ~17% throughput increase with async/64threads/60s, but I'm not sure if that's useful or not.

@AndresLuga

why there is no mssql in the published TE benches?

#3786

@Wraith2
Author

Wraith2 commented Apr 17, 2019

why there is no mssql in the published TE benches?

#3786

Would it have to be the Linux versions? Because the Linux performance kinda sucks in comparison to native, dotnet/corefx#35363

@roji
Member

roji commented Apr 17, 2019

From my reading of how sequential mode works in SqlClient, it means that you will not keep an array of SqlBuffer objects in memory containing values that have already been read past, only the instance required for the current column. If you need data which isn't in the current packet, then network IO will be required at request time. I could have misunderstood, but I've seen nothing to allow multi-packet buffering.

That more or less aligns with how Npgsql works (except there are no SqlBuffer objects there, just a single byte array that gets decoded upon user request). This notably means that it's the user's responsibility to always read columns in the order in which they were requested, and it's not possible to read the same column twice.

(On an unrelated note, it seems pretty vital for perf to be able to buffer multiple rows when they're small (not sure how that relates to TCP packets in the SqlClient world). Typically when a query is sent, PostgreSQL will respond with TCP packets that each contain multiple rows. On the Npgsql side, we read as much as we can into a fixed-length buffer (8K), which again means we get multiple rows into the internal buffer with one I/O system call - and subsequent DbDataReader.Read() calls will therefore return synchronously.)
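The "return synchronously when the data is already buffered" behaviour described above is typically implemented with a fast path that hands back a cached completed task. A rough sketch of the general pattern, with hypothetical helper names (not Npgsql's actual code):

```csharp
using System.Threading;
using System.Threading.Tasks;

// Sketch: if the next row is already in the internal buffer, skip the
// async machinery entirely and return a cached Task<bool> - no
// allocation, no I/O, and the caller's await completes synchronously.
public partial class SketchDataReader
{
    static readonly Task<bool> TrueTask  = Task.FromResult(true);
    static readonly Task<bool> FalseTask = Task.FromResult(false);

    public Task<bool> ReadAsync(CancellationToken cancellationToken)
    {
        if (TryReadBufferedRow(out bool hasRow))      // hypothetical helper
            return hasRow ? TrueTask : FalseTask;     // buffered fast path

        return ReadRowWithIOAsync(cancellationToken); // actually hits the network
    }
}
```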

So GetValueAsync (in sqlclient, other providers could well be legacy constrained) can do io even in sequential mode since you can't always be certain that your row is small enough to fit in a packet.

I think the main point is elsewhere: in sequential mode, I/O can indeed occur at any point as you say (reading rows, columns, whenever). The important part is that when not using sequential mode, I/O does not occur when reading columns. If I/O did occur in this situation, that would mean that the row wouldn't be buffered in memory, and it wouldn't be possible to read values in random order etc.

As i said if i were writing naive async code i'd go full async because the common guidance to users is not to mix sync and async. In this case i know mixing is better but it wouldn't be obvious to casual users.

You're right that this isn't obvious. I do think that it's part of the contract of the column getters that they only perform I/O if SequentialAccess is specified, but this could probably be documented/explained better. FWIW there used to be a good ADO.NET blog post when the async overloads were first introduced, which explained exactly this point (it evaporated with the rest of the MSDN blogs unfortunately...).

Finally, if users do use GetFieldValueAsync<T>(), that's not an error in any way - at worst they will get sub-optimal perf. But it's not necessarily a scenario we're interested in benchmarking.

@Wraith2
Author

Wraith2 commented Apr 17, 2019

That more or less aligns with how Npgsql works (except there's no SqlBuffer objects there, just a single byte array that gets decoded upon user request).

I think it was done this way so that even in non-sequential mode you never have to go backwards, because there's no guarantee that it won't require a previous packet. Even if your row spans 4 packets and you're on the last one, you can still get to the decoded form of the data for previous columns that were in previous packets via the SqlBuffer. Arguably you shouldn't do that, but people do strange things and it will work.

The TDS spec, from my reading, requires that all packets be of the maximum size unless it is the final packet in a request stream (which is identified by a flag). This means that it packs as much as possible into as few packets as possible, so multiple small rows in a single packet is common and will be the case for this benchmark, which is why the sync fetches work better than the async ones.

But it's not necessarily a scenario we're interested in benchmarking.

Fair enough.

@Wraith2 Wraith2 closed this as completed Apr 17, 2019
@roji
Member

roji commented Apr 17, 2019

I suspect we have a bit of terminology mismatch: when I wrote packet I meant "TCP packet" - I'm guessing that your packets are TDS-level. PostgreSQL has no concept of a packet in the protocol. It does have messages (and each row is represented by a single message), but there can be multiple messages in a TCP packet, just like a single message can be made up of multiple packets.

I think it was done this way so that even in non-sequential mode you never have to go backwards, because there's no guarantee that it won't require a previous packet. Even if your row spans 4 packets and you're on the last one, you can still get to the decoded form of the data for previous columns that were in previous packets via the SqlBuffer. Arguably you shouldn't do that, but people do strange things and it will work.

I think I understand what you're saying, but I'm not sure why one would need to go back to the decoded form of the data for previous columns (regardless of any TCP packet involvement).

Again on the Npgsql side, in non-sequential mode the entire row (represented by a single protocol message) is buffered in a single, contiguous memory buffer. As the user requests columns, we jump around that memory buffer and decode whatever is needed from the right area - nothing is cached or saved on the way.
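The "jump around the buffer and decode on demand" approach can be sketched like this (conceptual only; the offset table and field names are hypothetical, and real decoding depends on the wire format):

```csharp
using System;
using System.Buffers.Binary;

// Sketch: non-sequential access over a fully buffered row. Each getter
// seeks to the column's offset in the row buffer and decodes in place;
// nothing is cached between calls, and any access order works.
public class BufferedRowSketch
{
    byte[] _rowBuffer;        // the entire row, as received from the wire
    int[] _columnOffsets;     // hypothetical per-column offset table

    public int GetInt32(int ordinal)
    {
        int offset = _columnOffsets[ordinal];
        return BinaryPrimitives.ReadInt32BigEndian(
            _rowBuffer.AsSpan(offset, sizeof(int)));   // decode on demand
    }
}
```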

so multiple small rows in a single packet is common and will be the case for this benchmark, which is why the sync fetches work better than the async ones.

I don't really understand why this would be the case... Assuming the async methods are optimized for returning synchronously (i.e. because the required data is already buffered in memory), there should be very little difference between the sync and async APIs (it's true that there is some difference, but it really should be quite negligible).

@Wraith2
Author

Wraith2 commented Apr 17, 2019

I think I understand what you're saying, but I'm not sure why one would need to go back to the decoded form of the data for previous columns (regardless of any TCP packet involvement).

Everything is decoded as it is encountered, so you don't really have a choice but to keep the decoded form. It makes sense in the context of the underlying machinery. I'm not sure if I'd choose to do it differently, or that there aren't better ways to do it. It's just what I see in the code.

Even if an async task returns synchronously, you have the overhead of a task wrapping the value. Those are non-reusable reference types, and as you say they should be negligible, but they add up: a few million small objects is still a lot of garbage.

@roji
Member

roji commented Apr 17, 2019

Everything is decoded as it is encountered, so you don't really have a choice but to keep the decoded form. It makes sense in the context of the underlying machinery. I'm not sure if I'd choose to do it differently, or that there aren't better ways to do it. It's just what I see in the code.

I'm guessing (or hoping..) you're referring to the current architecture of SqlClient, as opposed to any hard constraint of the protocol... In any case, as long as those SqlBuffer instances are pooled I guess it's not so bad...

Even if an async task returns synchronously you have the overhead of a task wrapping the value. Those are non-reusable ref types and as you say they should be negligible but they add up, a few million small objects is still a lot of garbage.

At least internally in the provider, the use of ValueTask can mitigate that. For the surface API (DbDataReader.GetFieldValueAsync<T>()), we have https://github.com/dotnet/corefx/issues/27682 tracking adding a ValueTask-returning version. But until then you're right that users are better off avoiding the async APIs unless they know they're going to perform I/O.
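As a sketch of why ValueTask helps here: a ValueTask<T> wrapping an already-available value doesn't allocate on the heap, unlike a fresh Task<T>. All names below are hypothetical, not the actual SqlClient or ADO.NET surface:

```csharp
using System.Threading.Tasks;

// Sketch: on the common path the value is already buffered in memory,
// so we wrap it in a ValueTask<T> directly - no Task<T> allocation per
// column read. Only the rare I/O path pays for a real Task.
public partial class SketchDataReader
{
    public ValueTask<T> GetFieldValueSketchAsync<T>(int ordinal)
    {
        if (IsBuffered(ordinal))                               // hypothetical check
            return new ValueTask<T>(DecodeColumn<T>(ordinal)); // synchronous, allocation-free

        return new ValueTask<T>(GetFieldValueWithIOAsync<T>(ordinal)); // real I/O path
    }
}
```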

@Wraith2
Author

Wraith2 commented Apr 17, 2019

In any case, as long as those SqlBuffer instances are pooled I guess it's not so bad..

They aren't, that's something on my list to investigate but pools are complex and I've already broken the library with one so I'll take that slow. 😁

@roji
Member

roji commented Apr 17, 2019

Sounds very wise :)
