-
Notifications
You must be signed in to change notification settings - Fork 24
SqlServer.Core: Performance-oriented SQL Server .NET driver #6
Comments
@ErikEJ @Wraith2 @NickCraver @mgravell @Drawaes: Let's get this party started! @smitpatel, @roji, and I discussed how to get started with Woodstar, and we have created some initial issues for the ideas we came up with. We want this to be collaborative, so please add your ideas, either by commenting on issues or in new issues as appropriate. Initial ideas:
/cc @davidfowl |
Would it be useful to list current pain points, especially w.r.t. performance in the current driver structure? Discussing connection pooling and what we are/aren't able to see or get metrics on today due to what's exposed and what's internal is something that doesn't hurt so much per connection but is a pain at scale in various ways. Today, we're wrapping connections to make best-guess workarounds for how pooling is behaving, which commands are in flight when a stall happens, etc. I know out goal is performance, but I think it's worth considering the driver/implementation surface area along these lines as we go, and maybe current pain points is the good thing to have as a starting point on that front? |
Plus a whole load of other random bits and pieces that when/if we encounter the same problem in a new implementation I will hopefully spot and warn about so we can avoid the same implementation choice. In general it's about what you expect from a 20 year old codebase at this point. Decisions which were correct at the time are now outdated but the weight of the codebase is so high that changing direction carries unacceptably high risk. Writing from a high level async-first perspective will get us an easier codebase to work with. |
Should async-only be a design goal here, like HttpClient in .NET Core before...the happening? |
I'd prefer to support as much of the existing external surface area as possible because it will enable widest adoption if we get to production stages. I'd say async first but not async only if that's possible, I think it is possible because postgres manages it afaik. |
I'd hope we'd build the core engine with a different API shape than what ADO.NET wants. We can write a slow adapter for that layer but the new APIs shouldn't be bound by it. Also take a look at this issue dotnet/runtime#28882 as a way to avoid boxing for DbParameters. |
The problem with using another shape is knowing what it is, there can be no consumers until we write one so we don't know what shape is required without a specific and well defined set of consumer requirements, who's the first consumer? |
Well you have SO people commenting right here. The point I thought is to push the boundaries of performance and not support the widest number of use cases. As David has said slower compatibility could be built on top. In reality though in the case of sync you are just hiding what is really going on as all db operations are async in nature if there is network or any disk access involved. |
Ok, pure perf it is and then we'll see. That'll allow ignoring all the existing public surface area defined in terms of Task and using ValueTask throughout which will be lovely. You say that all db operations are async. All IO operations are async and if you meant that then I agree. However because TDS is packetized there can be multiple columns and even rows in a single packet so a single data fetch may not be async it could just be fetched from memory. At the moment in SqlClient you either have to allocate a task every time or you have to use sync and live with the thread blocking sometimes. We may also want to think about being clever with pre-dispatching packet requests. Currently you run through the packet until you run out of data and then request more and wait for the response. If you can guess based on reading, a string for example, that you won't have enough data you could pre dispatch the next packet request and then it might be ready when you need it. A fundamental limitation of any library like this is network latency and we might be able to hide some of that. |
My vote here would be to only do a non-ADO.NET provider if figures clearly show ADO.NET is a serious blocker for performance. Note that while doing optimization work on EF Core, I've recently seen that saving an allocation here and there has less of an impact on end RPS perf than expected (by me at least), and in at least some cases ADO.NET can be improved upon too (e.g. dotnet/runtime#17446 for parameter boxing). In other words, if some aspects of ADO.NET do turn out to block perf, I'd rather this effort helped us identify and fix them, rather than doing something completely different.
This is a good example. If the Task allocation from DbDataReader.GetFieldValue turns out to be significant (and let's see that it's the case, rather than assuming it is), we could simply add a new ValueTask-returning API alongside the existing one. We could then promote that into ADO.NET. Regardless of the above, I think an async-only driver definitely makes sense, at least as at a first phase (but maybe in general). Needing to support sync already creates various problems in today's world (e.g. no Pipelines support, at least last time I checked). |
Well, I think so. This is from the TE Fortunes benchmark test project that I've been using for perf improvements over the last couple of years. And since you're a bunch of people who'll know what you're looking at here's the dotTrace file for that so you can explore. The thing we actually want is the Fortune object and the strings it contains, look how far down the list it is. I'm pretty sure that the entire query result is inside a single packet response so it's all in memory at the point when fields are being fetched. Allocating a Task for every int32 read from an in memory packet isn't the worst things happening but because of the public surface area even if I managed to make the internals ValueTask based I'd still have the allocation at the user surface. I've considered proposing Anyway. You raise another good point @roji. What do we mean by high performance? The reason I started on this was that I wrote a really nicely performing server app and then the perf got destroyed by simply trying to save data to the database. I'd like to be able to plug in the database layer without it being the worst performing thing in the application. In my experience so far query speed is dominated by network latency and we can't really do a lot about that. This is why memory makes a very small difference quite a lot of the time. What we miss in this scenario is that there will be higher layers doing other work like layout etc. I think we're looking to get a driver which is as cpu and memory clean in as many aspects as possible so that many machine resources can be used simultaneously, so don't block threads don't waste memory causing time to be lost to GC. Get work done and get out of the way so higher layers can get higher throughput per unit hardware per time. |
We're getting down into the little details, but... I can't see any reason for GetFieldValueAsync to be used in TE Fortunes (or generally a big majority of ADO.NET usage). Unless one is reading very big rows and using CommandBehavior.SequentialAccess, the row is pretty much guaranteed to be buffered in memory; after all, without SequentialAccess one can read columns in any order. In that scenario, using GetFieldValueAsync is pretty much useless, and GetFieldValue should be used instead. But if we want to add another API just to remove the Task allocation for when huge rows and SequentialAccess are being used, then it really isn't that complicated - mostly a matter of finding the right name. I think it's a pretty low-priority scenario (if you're reading huge rows, that extra allocation isn't likely to matter), but it's trivial to push through. Re the rest... I do hope Npgsql shows that very good perf is possible while sticking to ADO.NET.
I'd want to see some backing to that assumption. Of course, I don't mean egregious useless allocations inside the driver (which is what SqlClient does in some cases) - that should simply be avoided or cleaned up. But again, my recent experience with EF Core shows that hunting down every single reasonable allocation has quickly diminishing returns, and therefore probably doesn't justify dumping ADO.NET. To summarize, .NET does have a standard database API. While it's not the best it could be, there's an entire ecosystem around it, and very good performance has been achieved without dumping it. Could a bit more perf be squeezed by saving a couple allocations that ADO.NET forces us to do? Maybe. Does that justify doing something completely non-standard, where maybe ADO.NET itself could be fixed? I think that remains to be shown. |
I don't understand the point of this repo if you just want to improve the sql client under ado.net? You might as well work on fixing that under the hood. SslStream was a similar age, very risky, a mix of sync and async and promise based code and we fixed it from the inside. There is no point just making another ado.net sql lib with all those restrictions. |
I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄 |
In MySqlConnector, I've considered providing a "direct" API that exposes the raw MySQL protocol through a .NET API to see how much overhead ADO.NET contributes. However, it always seemed extremely unlikely that it would be used anywhere except the TE benchmarks due to the established ADO.NET ecosystem. I have no objections to this project pursuing a focused SQL Server solution to see what the maximum possible performance is, but would personally appreciate the effort going into directions that could benefit all ADO.NET providers/users. But please just take this as my 2c, and not a strong attempt to change the project direction. Some ADO.NET pain points I'd like to see fixed:
OTOH, if a lot of these (and similar) suggestions get implemented, maybe it's "not ADO.NET" anymore, and trying to preserve ADO.NET was a false goal? 😀 |
This is my 2c as well. But I see it as an opportunity to write a new fresh DB surface area that while focused on Sql Server initially there is no reason there couldn't be a MySql etc version. Connection pooling is a very good example that I think shoe horning into existing ado.net would extra pain. I would love to see a break from tradition of the example of pipelines vs streams. Initially built on their own and eventually bridging code to help you use pipelines with stream apis. But for pure perf you want pipelines all the way through. |
To be clear, I'm not the boss here or anything - am just stating my opinions!
There are various reasons why this isn't feasible - lots of backwards-compatibility, legacy and stability constraints simply don't make this possible. As a very simple example, SqlClient needs to continue supporting .NET Framework.
So there really are two efforts being discussed here:
I see these as two very separate and orthogonal efforts, each being quite huge. For example, discussing a first-class, cross-database pooling component as @bgrainger mentioned (see dotnet/runtime#24856) is just on its own a potentially huge debate. I believe having focus on one of the above two is important (trying to tackle too frequently backfires), but again, that's just my opinion. I'm guess I'm also a bit confused on why we're proposing to do a new database API. Are we convinced ADO.NET is a significant blocker to database driver perf, and if so, that we can't fix those problems incrementally? Do we have any data to support that? Or do we just hate the API and want a modern, new API (also legitimate)? Or do we just want to play around without any constraints (also legitimate)? The important point is that any new DB API breaks the ecosystem in both directions - DB drivers would need to be rewritten to support it (Npgsql, MySqlConnector, Oracle), and upper layers would have to do the work to support it as well (EF Core, Dapper, LLBLGen Pro...). The effort would be immense - and I'm not clear on exactly why we'd be proposing that. @bgrainger FWIW I agree with a lot of your points above, though some things can be handled without API changes. For example, Npgsql does avoid parsing the connection string more than once; DbCommand instances can be recycled by their owning DbConnection, and similarly DbDataReader can be recycled by their owning DbCommand (Npgsql does these things); we can possibly do something similar |
But to reiterate - I'm just a guy expressing my opinions. If we end up with a non-ADO.NET driver, and an ADO.NET shim over that, maybe that's fine too. |
FWIW I'm pretty sure 90+% of these allocations are the SqlClient implementation, rather than anything necessarily having to do with ADO.NET. |
FWIW, the resurrection of this thread a few days ago is part of what prompted me to get started on DapperAOT. Pretty sure I'll have code working in the flavor of Dapper (although using the new DapperAOT) in lock-step with any-and-all progress here. |
The current SqlClient codebase is too dangerous to use a "move fast and break things" approach. The number of users and the fact that we want people to move from System to Microsoft versions requires a slow and measured approach. The unfortunate side effect of that is that the speed of change is slow. I've been tinkering for two years and that trace is still abysmal compared to modern code. It is also important to note that the codebase is difficult to work with. The existing SqlClient library will not go away. It will not change with sufficient speed to improve performance for the majority of users. I'm fairly sure that the performance of SqlClient lags behind that of other database providers, even if it doesn't I'm sure there can be a better performing implementation. In writing an SqlCore library we can:
I think all of these things have value. Things that aren't current goals as I see it:
We do not have to choose to ignore the existing ADO api surface because we can probably provide most of it built on a new implementation. This doesn't have to be an explicit goal but it may be something to be aware of as a long term nice-to-have so we can flag up changes which would prevent it being possible. The existing ADO surface has problems such as I'm happy with working towards specific use cases from StackOverflow if they can be provided. It's a point to work towards and learn about the problem space. Once we have something that is of worth or we collectively decide that the goal is impossible or improbable we can re-assess. I don't see any problem with the general pattern of access which ado provides. As a sketch public IAsyncEnumerable<Fortune> GetFortunes()
{
await using (var connection = GetSqlCoreConnection())
{
using (var command = GetSqlCoreCommand(connection, command properties ?))
{
using (var reader = await command.GetReader(reader properties))
{
while (await reader.GetNextRow())
{
yield return new Fortune { Id = await reader.GetValue<int>(0), Message = await reader.GetValue<string>(1) };
}
}
}
}
} but that doesn't mean we have to start off with implementing all the interfaces that are in the System.Data namespace to do it. |
(1) I wonder how it would be possible to quantify the potential throughput gains. Internal driver overheads in SqlClient might stem from design choices, and those overheads might be smeared out across the code base. The profiler call tree might not highlight any particular hot point, yet an entirely different library design could potentially deliver big gains. An example of this is JSON serialization. If you were to profile Newtonsoft.Json, you'd probably find it rather optimized. Yet, the new BCL JSON serializer achieves a 2x gain on Newtonsoft with a fresh design. The (2) Even a highly optimized async codepath has considerable CPU overhead. The async machinery can be quite expensive. In my estimation, it will be necessary to provide a truly synchronous code path. This code path will be the one providing the lowest CPU usage for API users that want to optimize for that. With database access, the degree of parallelism usually is low because the database cannot take more than a handful of parallel queries or else it will become overloaded. For that reason, the thread saving benefits of async are less relevant in common scenarios. It seems that the main point of this new library would be to lower CPU usage which makes a synchronous codepath quite important. To make an example: If your database server has 8 cores (and disregarding disks for simplicity) it cannot help throughput to make more than 8 parallel queries. Having a thread pool of 8 threads with synchronous IO is a perfectly adequate solution in this scenario, and it will be the fastest solution. (I'm simplifying a lot here to make the general point.) The BCL team recently found the need to add synchronous APIs to (3) There's probably value in exposing a zero-overhead TDS connection class that has no additional layers. No pooling, no retry, no threading, not even a connection string. It's usage would basically be:
This API would provide raw access to some abstraction of the protocol. I'm sure there are some attractive features in there that are not widely known. For example, the recently exposed |
One approach we've been thinking about is to get an upper bound for performance by doing an absolute-minimum implementation of a minimal "driver" (#12) - send a minimal query and provide some sort of bare API for getting the results. See here for an Npgsql experiment; we've got a similar prototype in the works for SQL Server but nothing public yet.
This isn't really true... when the database sends back a resultset, it typically first sends some sort of metadata on the resultsets, which allows the driver to parse them (e.g. which column has which type); the driver can use that information to pregenerate any sort of code that will make subsequent row/column accesses faster. In fact, when a statement is prepared, this can even happen once for multiple query executions. This seems quite orthogonal to deserializing user POCOs vs. providing a DbDataReader-like interface to rows and columns.
Here are some problems I see with this:
|
The SqlClient cpu call graphs are a bit deeper than I'd like but in general cpu isn't a major problem (outside some known async problems). We spend most of the time waiting for network io . If I see ways to conserve cpu I implement them but in general avoiding allocations to reduce time spent outside user code yields far better returns at the moment. Fundementally, reading a forward only stream of self describing data isn't a hard cpu problem. Anecdotally; calculating connection pool keys is one of the most cpu expensive things we do because of some pinvokes. Providing a forward only reader implementation that requires the user to operate as if it were in sequential mode would significantly reduce internal complexity as would generics in some places.
It's also a tradeoff of cpu against time where cpu is a resource you can get more of relatively easily but time is linear and constrained, We usually judge that it is better to do as much work in a given allocation of time as is possible. |
If you happen to be more familiar with MySQL, I have a small prototype of what a "direct" .NET connection to MySQL would look like here: https://github.com/bgrainger/FrameworkBenchmarks/tree/mysql-direct/frameworks/CSharp/aspnetcore/Benchmarks/MySqlDirect; it could possibly be used as a base for experiments. There may also be useful code in https://github.com/neuecc/MySqlSharp. |
Interesting...! Shameless hijacking: what are these pinvokes about? I have an ADO.NET thing I'd like to push in .NET 7.0 which may help a lot here... Will keep you posted. |
They're the pinvokes behind Environment.UserName and Environment.DomainName on windows. It is possible to get both bits of information at once from one call but the BCL doesn't expose a way to do it so we call the same api twice and wastes buffers and cycles. It also always causes 3 string allocations to get a single string back. It's on my private trello list of "things to investigate" but trying to argue that we should do direct pinvokes into windows security related apis is going to get some pushback and it isn't worth taking my time trying to fight it at the moment. |
@Wraith2 ok, thanks - do these at least only get called when the Username isn't present in the connection string? |
It's not the database username it's the calling user. It depends on the SNI strategy. In native we use the windows identity SID in managed we use current username and password but we can't cache because of impersonation so we must always create and compare so even if we can find we've got the same user as last time we have to invoke and allocate the strings. |
OK, thanks for the details @Wraith2. I plan to work on a proposal to basically allow users to directly reference a "DbDataSource" (naming temporary!), which typically would represent a pool; this would allow applications to resolve the pool from the connection string exactly once at startup, and then request connections from that. However, if you really want to check the system username on every open (because of impersonation), this may not help so much. |
Can you first check whether the thread is impersonating, and if it is not, then use the previously allocated strings (or no strings at all)? Something with GetCurrentThreadToken and GetTokenInformation, perhaps. Whether that would help depends on how common impersonation is, I guess. And this would still involve "direct pinvokes into windows security related apis". |
Impersonation and Mars are things that could be left out of the Core implementation quite easily. They'll still have to be supported in SqlClient but if you're looking for a fast minimal sql implementation I don't think you want them. |
We know that a single row is nearly always small enough to hold in memory and that the user often knows their rows are small. I would normally rather not have await reader.GetNextRow() complete until all fields are in memory ready to read. This can then totally advoid the overhead of having GetValue() returning a task that then needs checking to see of it is completed. (Think of what these unnecessary conditions checks do to the CPU instruction pipeline) |
How do we know this? I know that mine often are but working with the sqlclient driver has shown me bug replications where people do things I would never consider.
Yes. However through maintenance this has a habit of changing. If a small row is 3 items and the driver optimizes for that what happens when 2 years later someone adds 3 more fields. Do we see a slowdown? Is that consequence going to be obvious from the code?
I'm almost certain we would use ValueTask as the return type. If for some reaon we could not then a custom poolable awaitable would likely be used. Any non-null valued data type including a single byte in the tds stream can span multiple packets (because there is always a header so all records are multibyte). That means that every read can cause a network delay to be observable, the faster you can consume the data the more likely you are to hit an io wait. This needs to be a task like return type. |
I agree that the majority use case is small rows which can be easily buffered in memory, and that this scenario should be optimized for (of course while keeping good support for large rows). This is also the reasoning for ADO.NET's CommandBehavior.SequentialAccess, which is an opt-in behavior for large rows.
I'm a bit confused... If GetNextRow (ADO.NET's DbDataReader.Read) doesn't wait to buffer all columns, then the subsequent GetValue calls (ADO.NET's DbDataReader.GetFieldValue) do potentially have to do I/O (since the row hasn't been completely read and buffered yet). So either you (possibly) wait longer up-front in GetNextRow - and then yuo're guaranteed that subsequent GetValue calls just accessed buffered results - or GetNextRow returns early without buffering, and then subsequent GetValue calls may need to wait. Stepping back... I think the low-level approach here would be to not do (implicit) row buffering. Crucially, this means it's the user's responsibility to read column values in the order in which they get delivered from the database; you can't read value 3 and then read value 1 (since we're just streaming columns). You also can't read the same value twice. This corresponds pretty much exactly to how CommandBehavior.SequentialAccess works today in ADO.NET. Row buffering can be considered a "convenience" feature layered on top of this, allowing users to seek back in the row and read the same value multiple times. I agree it's still not necessary to pre-buffer the entire row in GetNextRow: I/O can simply be performed later - as needed - from GetValue, which would return ValueTask (so calls to values which happen to be already buffered are very cheap).
I do think that programmers can be expected to know the general shape of their data (does a certain table contain huge rows or not), and that it's reasonable to expect stability here. If you don't want to make any assumptions (e.g. a dynamic data scenario), you can always use "sequential access" and stream columns - I don't think there's any downside to doing so, aside from a more difficult API to work with. Otherwise, if you use an API which does buffering (the ADO.NET default), and you encounter a huge row, your memory requirements will increase accordingly. |
I am thinking of a mode of operation when the user states upfront the rows are small and an exception is throw if the rows are large. "Small row" being defined as under the size of a network packet so it is known that a row will never need more then two networks pockets being buffered in a "zero copy" implementation. ======= |
Should an optional callback based interface be provided so the user implementation OnNextRow() and OnNextStringValue(id, span) etc? This would make it cheaper to read into an array(s) of structs for code that needs fast bulk loading. At present it can be faster to get SQLServer to write to a file and then read the file from .net, this should never be the case with a database access API. |
I think it's a good idea to stop and think exactly what kind of special optimizations we're thinking about for small rows vs. large rows. I think that the existing ADO.NET model isn't too bad here: an opt-in mode (SequentialAccess) for streaming the row and a default mode for, basically, buffering it. I don't know what "zero copy" means in this context: the driver needs to read raw bytes in any case, and decode them to a user-readable value (int, string); there shouldn't be any extra copying beyond that going on regardless of whether the row is small or not. The only impact that should have is whether we keep earlier columns in memory so the user can access them after later ones.
That sounds like a very allocate-y API to get the row.
Not sure exactly why this is inherently faster than an API where application code calls ReadRow (it's basically just push vs. pull). |
It advoids C# having to feed as many returned tasks into the generated state machine. |
The number of row values in a single packet depends on the data values, not even the types. The ability to identify if you had a specific set of data which did not span a packet is prohibitively difficult to calculate. You can also change the packet size in the connection string. I agree with @roji that the current high level way of accessing the data is reasonable and covers most scenarios that we need. All other methods considered have clear downsides like requiring sequential access at all times of large upfront investment of developer time to ensure they avoid packet boundaries which should be handled by the driver transparently. |
@Wraith2 at least from my perspective, the idea here wouldn't be to automatically calculate anything, but rather for the developer to use one API or another based on whether they want to stream the row or not, based on their knowledge of their data domain. In other words, pretty much what CommandBehavior.SequentialAccess already means in ADO.NET. I definitely do think we need to provide a row sequential/streaming mode, which avoids having to buffer entire rows in memory; otherwise huge rows must require huge amounts of memory, which isn't feasible. This is already supposed to be done in all major ADO.NET drivers (via CommandBehavior.SequentialAccess).
I'm not sure we should shape the entire row-reading API around saving on the async state machine box for when ReadRow needs to complete asynchronously (note that at least in reasonable scenarios, many/most invocations of ReadRow return synchronously since (small) rows are already buffered in memory). |
See #22 for a summary of the Woodstar experiment. |
An experiment in collaboration with the community to determine what potential there is modern .NET features in a highly performant SQL Server driver.
The text was updated successfully, but these errors were encountered: