API for writing parameters without boxing #17446
Comments
Isn't the boxing overhead next to nothing compared to the fixed cost of making a SQL call? I cannot imagine this being an issue even for 1000 parameters. |
@GSPP I'm definitely not talking about the overhead of allocating the memory and copying the value - i.e. the cost of the boxing operation itself. The problem is that boxing allocates an object on the heap, producing potentially large amounts of garbage. This garbage creates pressure on the GC, which can be a problem for some applications. Basically it's a different kind of overhead compared to making an SQL call. |
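To make the kind of overhead described above concrete, here is a minimal sketch that measures the heap allocations caused by passing ints through an object-typed API. `BoxingDemo`, `SetValue` and `MeasureBoxedBytes` are hypothetical names invented for this illustration; they are not part of ADO.NET or any provider, and the exact byte count depends on the runtime and platform.

```csharp
using System;

public static class BoxingDemo
{
    // Hypothetical stand-in for a non-generic parameter API: the value travels as object.
    private static object? _sink;

    public static void SetValue(object value) => _sink = value;

    // Returns the bytes allocated on the current thread while passing `count` ints as object.
    public static long MeasureBoxedBytes(int count)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < count; i++)
        {
            SetValue(i); // the implicit int -> object conversion boxes here
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        Console.WriteLine($"Bytes allocated for 100000 boxed ints: {MeasureBoxedBytes(100_000)}");
    }
}
```

Each of those boxes is short-lived garbage that the GC has to collect later, which is the cost being discussed here rather than the copy itself.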
@roji (or anyone) can you provide more details what is your plan here? |
On the read side of things, there's Unfortunately nothing like this exists on the write side - DbParameter has a object It would also be necessary to add Let me know if this makes sense or if you'd like more info. |
@saurabh500 @YoungGah is it sufficient info for you? Or do you need more details? |
@saurabh500 @YoungGah thoughts? |
Can anybody take a look at this? It would be good to know if you guys see this somewhere on your roadmap etc. |
@saurabh500 @divega @corivera any opinion here? Can we at least set expectations / timeline when we will have time to look at it? Thanks! |
@karelz @danmosemsft @saurabh500 @corivera I think we should remove the "needs-more-info" label and add "up-for-grabs". This sounds like a good idea to at least explore. @roji it would be great if you could do some prototyping of this in Npgsql if you haven't already. I suspect it should be possible to do enough to assess the API and make some measurements of the impact without making the actual changes on I am not sure about the cc @ajcvickers |
@divega sounds good to me - feel free to make such labels changes yourself, as area expert/owner :) When you mark things "up for grabs", just please try to describe what is needed (next steps) & rough complexity / time investment - see triage rules for details. Thanks! |
Marking as "up for grabs". In order to make progress on this issue we need to do some exploration to understand both the magnitude of the performance impact (e.g. how many allocations we can actually avoid and how that benefits performance) and how to best extend the API. See https://github.com/dotnet/corefx/issues/8955#issuecomment-260108275 and https://github.com/dotnet/corefx/issues/8955#issuecomment-313905218. |
FYI I'm working on implementing this within Npgsql, I'll be coming back with some info pretty soon. |
OK, I've done this in Npgsql (npgsql/npgsql#1639). The question is now how to best add this to ADO.NET as a whole, to allow this to be used in a database-independent way.
General Benefits
Adding to ADO.NET
This would consist of adding either a new In addition,
Base classes vs. interfaces
As @divega mentioned above, ADO.NET APIs are based on base classes rather than interfaces. I worked in both directions for a while to explore what the API would look like, here are some points: Via base class (
|
I saw this mentioned in the issue review earlier and thought it might be useful to provide some feedback since it isn't dead. I did some investigation on doing this with SqlClient because I wanted to try and remove the parameter box. It's messy and I'm not sure how you would achieve it without having the parameter instance write the data into the tds buffer. At the moment the parameter is asked for its value object (including any coercion) and that object is then written by the TdsParserStateObject which understands the layout of all the relevant types. Doing this without the object variable means you've either got to have the correct storage location generated at runtime by non-generic code or you delegate the writing of bytes to the generic object which can avoid the variable. Asking the parameter to do it exposes the internals of the tds layer to the user layer or will force multiple allocation and copying between buffers, neither of which is a great idea. It's worth doing but it seems difficult to implement practically at the moment. I also think it will flow new members into System.Data which may cause compatibility problems without care. |
@Wraith2 you're right that this isn't necessarily a trivial thing to do inside an ADO.NET provider, and can mean serious refactoring to actually avoid boxing. The main idea here is to at least allow providers to do this API-wise - if they do it or not is a different question. The default implementation for this new generic API would in any case delegate to the existing non-generic API, in order to prevent breaking changes, so existing providers would simply continue to work. I don't know anything about the SqlClient internal implementation... Full generic parameter handling indeed means that writing has to be generic "all the way down", without passing through a non-generic layer (such as TdsParserStateObject?) which switches on the specific type etc. Definitely not trivial. |
Possibly, at the very least it means that the thing writing the value has to be generic, though the caller may not need to be aware of the exact type. It's well worth doing but it will be a big, very complicated job.
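The delegation idea in the comment above could look roughly like the following sketch. All three types (`LegacyParameter`, `TypedParameter<T>`, `UnboxedParameter<T>`) are hypothetical stand-ins invented for this illustration; they model the shape of the proposal only and are not the actual ADO.NET or Npgsql API.

```csharp
using System;

// Hypothetical stand-in for the existing non-generic parameter type.
public class LegacyParameter
{
    public virtual object? Value { get; set; }
}

// Hypothetical generic layer. By default it forwards to the object-typed Value,
// so a provider that overrides nothing keeps working (and still boxes).
public class TypedParameter<T> : LegacyParameter
{
    public virtual T TypedValue
    {
        get => (T)Value!;
        set => Value = value;
    }
}

// Hypothetical provider-specific override that stores T directly and avoids the box.
public sealed class UnboxedParameter<T> : TypedParameter<T>
{
    private T _typed = default!;

    public override T TypedValue
    {
        get => _typed;
        set => _typed = value;
    }

    // The non-generic view still works; it boxes only when someone asks for it.
    public override object? Value
    {
        get => _typed;
        set => _typed = (T)value!;
    }
}

public static class DelegationDemo
{
    public static void Main()
    {
        LegacyParameter fallback = new TypedParameter<int> { TypedValue = 5 };
        var optimized = new UnboxedParameter<int> { TypedValue = 8 };
        Console.WriteLine($"{fallback.Value} {optimized.TypedValue}");
    }
}
```

With this shape, the generic property is only a convenience until a provider overrides it; the allocation is avoided only if the provider's write path is also generic all the way down, as noted above.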
It's quite fun in there. Lots of history to learn 😁 |
Note: one non-trivial issue to be resolved is how nulls are written - with the current non-generic DbParameter, nulls are represented via DBNull, but that's not possible with a generic Unlike non-generic DbParameter, we could accept null values, but that would only work for reference types. We could introduce another property on |
[edit] stuff to do with readers which wasn't relevant to parameters (https://github.com/dotnet/corefx/issues/27682#issuecomment-526928918). |
@Wraith2 this is a very important thing that I really hope we get to improve for 5.0, but you're discussing nullability when reading values from a reader, whereas this issue is about the introduction of a generic parameter API which would avoid boxing when sending values to the database. The issue of GetFieldValue nullability has been discussed in https://github.com/dotnet/corefx/issues/27682#issuecomment-436736787 - for now that issue seems like a good place to continue that conversation. Do you mind moving this comment over there? |
You're right. They're sort of linked in my thinking since they're about carrying two pieces of information, is it null and what is the value, which is easy as a tuple on output but better modelled as two properties on input. Much better to separate the information out to |
They definitely are, and I can imagine a world where the same solution holds for both (i.e. dotnet/csharplang#2194), but for now that doesn't seem like it would happen... PS have edited your comment above to link to the other issue. |
I'm not sure whether equating language null to DBNull is correct. I can't find a scenario where you need the distinction but my instinct is to keep them separate. There's also the possibility of using a separate type, so having |
I'm not sure I see why... C# null maps to other nulls in other contexts (e.g. JSON serialization?), and apart from the problematic intersection of generics, value types and null, I think it would work quite well. If you have any concrete argument here I'd love to hear it. Somewhat ironically, with the current non-generic DbParameter, mapping language null to database null works even better - since the parameter simply holds an object, it's always possible to simply set it to null. I'd be curious to learn why historically DBNull was chosen to express null instead of language null; it's possibly useful in that it distinguishes an uninitialized parameter (which contains language null) from a parameter set to null (DBNull), but I'm not sure of the actual importance of that (again, the C# language doesn't have that distinction and things seem to work out fine).
Wouldn't you have to solve the same question with DbNullableParameter<T>, i.e. how to represent null when T is a value type? Having separate nullable and non-nullable parameter types would then be orthogonal to that question. In any case, is there any reason we need a non-nullable parameter type and a nullable parameter type? ADO.NET currently has a single nullable parameter type (DbParameter, where null is expressed via DBNull) and it seems to be working well. PS DbParameter actually has an IsNullable property, which may resemble your distinction. I'm not sure what it's actually used for though, it possibly only plays a role when using the command builder. |
Not really no. If IsDBNull the value is undefined so in storage terms it'd be default(T) or the last value, doesn't matter since attempting to read it would be an error. For some reason I see JSON null as a language null and think that the javascript null and c# null are compatible. I feel that DBNull is a data null which is distinct from a language null. As you pointed out you can use the difference to explicitly see the difference between not setting a value and setting it to be null. Not exactly convincing I agree but it comes from my quite direct dealings with data, I don't use orms. |
I have found a case where I consider I have never seen a situation where the existence of |
@GSPP thanks for linking to that question. FWIW I agree with you and Marc's response, and also think that DBNull was a mistake. But it actually doesn't matter that much, since once we move to generic type parameters DBNull simply becomes impossible anyway, so a different way to represent null on parameters must be found in any case. The same may be true also of a new API which would be an alternative to DbDataReader.GetFieldValue, if we choose to go down that path. |
That is a useful SO answer. We shouldn't add new uses of DBNull, I think we all agree on that.
using System;

public class DbParameter<T>
{
    private T _value;
    private bool _isNull;
    public bool IsAssigned { get; set; }
    public bool IsNullable { get; set; }
    public bool IsNull
    {
        get => _isNull;
        // A parameter that is not nullable cannot be flagged as null.
        set { if (value && !IsNullable) throw new InvalidOperationException(); _isNull = value; }
    }
    public T Value
    {
        // Reading the value of a null parameter is an error.
        get { if (IsNull) throw new InvalidOperationException(); return _value; }
        set
        {
            IsAssigned = true;
            if (typeof(T).IsClass && value is null) // the only point of contention?
            {
                IsNull = true;
            }
            _value = value;
        }
    }
} |
(just to set expectations - I personally am not going to start working on this right away, although I definitely intend to do this for 5.0) |
Regarding the discussion around the purpose of DBNull, see also this SO example. As someone who has written an abstraction layer over ADO.NET providers (and now looking to extend it to Postgres via the npgsql library) If you want to tell SQL Server to set a column to an explicit So indeed if an abstraction layer wants to take advantage of the |
Thanks for that info @mldisibio. |
Why doesn't IDbParameter offer SetString(), SetBoolean(), SetInt32(), ...etc implementations? Seems like a very easy-to-understand solution that would be backwards compatible with existing ado.net architecture. |
@pha3z one big issue with that pattern, is that it creates a closed set of types that can be used with DbParameters - but different databases have very different supported types (e.g. PostgreSQL has a type for IP addresses). This is one reason why DbDataReader has a generic GetFieldValue which can return any type. A good solution for writing would do the same here. |
Yes, but the initial reasoning stated for DbParameter<T> was to avoid boxing. To my knowledge, boxing is only relevant to value types. PostgreSQL defines an IP address type, sure, but is that relevant? The dotnet type is what we're talking about passing into the parameter. What Npgsql does to translate the dotnet type into the DbType is under the hood. Are there any database providers that support directly assigning dotnet value types other than the standard primitive types? I'm not saying that a generic is something to be opposed. Clearly it's the right solution. But this issue has been backlogged for years. If the reason it's backlogged is the high refactoring cost of supporting generics, then it seems like it would make more sense to add strongly typed primitives to cover 99% of use cases. Please correct me if I am somehow completely off about the boxing issue. |
Yes. For example FirebirdClient. |
I've looked at it several times from the SqlClient perspective and the problem isn't simply adding a generic parameter it's all the internal logic that relies on the object type for coercion between the source type and destination type. |
You're right that boxing is only relevant to value types and that IPAddress is a reference type, but we shouldn't make the assumption that there won't be a value type supported by some ADO.NET provider out there. Some good examples are the Npgsql support for NodaTime types (instead of the built-in BCL DateTime/TimeSpan), which are structs.
It's true that this has been backlogged for quite a while, but I do think we should do it right when we do it - I'm still hoping to get around to it for .NET 6. Default interface methods should actually simplify the design (see #17446 (comment)). |
Ok all that makes sense. Thank you for the consideration! 👍 |
Another reason for doing this is for the user to be able to provide a requested CLR type for an output parameter; see dotnet/SqlClient#2092 for more details. |
Regarding the read side,
Do you have examples of |
Yes, take a look at Npgsql. |
@DmitryMak When running on CLR, patterns you've shown are recognized by jit compiler and boxing is automatically removed. It will still box on other runtimes like mono for example. If you've hit a pattern like that (generic method with typeof checks then box-unbox sequence) that still boxes then you should file an issue I guess. |
if (typeof(T) == typeof(int))
    return (T)(object)GetInt32(ordinal);
Note that this is a special pattern that the JIT can recognise and optimize away. In the case where T is int casts would resolve to |
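For context, a self-contained sketch of a generic getter built on the typeof-check pattern quoted above might look like this. `TinyReader` and its members are hypothetical and only loosely modelled on the `GetInt32` call in the snippet; whether the box is actually elided depends on the runtime and JIT, as the comments above point out.

```csharp
using System;

// Hypothetical reader with typed getters, invented for this illustration.
public sealed class TinyReader
{
    private readonly int[] _ints;
    private readonly string[] _strings;

    public TinyReader(int[] ints, string[] strings)
    {
        _ints = ints;
        _strings = strings;
    }

    public int GetInt32(int ordinal) => _ints[ordinal];
    public string GetString(int ordinal) => _strings[ordinal];

    // For a value-type T the typeof comparison is a per-instantiation constant,
    // which is what lets a JIT drop the dead branches and the box/unbox pair.
    public T GetFieldValue<T>(int ordinal)
    {
        if (typeof(T) == typeof(int))
            return (T)(object)GetInt32(ordinal);
        if (typeof(T) == typeof(string))
            return (T)(object)GetString(ordinal);
        throw new InvalidCastException($"Unsupported type {typeof(T)}");
    }
}

public static class ReaderDemo
{
    public static void Main()
    {
        var reader = new TinyReader(new[] { 7 }, new[] { "seven" });
        Console.WriteLine(reader.GetFieldValue<int>(0));
        Console.WriteLine(reader.GetFieldValue<string>(0));
    }
}
```

Semantically the code is correct whether or not the optimization kicks in; on a runtime that does not recognise the pattern, the int path simply pays for one box and one unbox per call.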
In the current ADO.NET API, writing a parameter to the database involves passing it through an object. This implies a boxing operation, which can create lots of garbage in a scenario where lots of value types (e.g. ints) are written to the database.
A generic subclass of DbParameter could solve this, if properly implemented by providers.