Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetValues() fails if any column contains an infinite timestamp #1324

Closed
realityexists opened this issue Oct 16, 2016 · 12 comments
Closed

GetValues() fails if any column contains an infinite timestamp #1324

realityexists opened this issue Oct 16, 2016 · 12 comments

Comments

@realityexists
Copy link

The following code

using (var command = new NpgsqlCommand("SELECT 'a string', 'infinity'::timestamp", connection))
using (var reader = command.ExecuteReader())
{
    reader.Read();
    var values = new object[reader.FieldCount];
    reader.GetValues(values);
}

currently throws an exception:

Unhandled Exception: System.InvalidCastException: Can't convert infinite timestamp values to DateTime
at Npgsql.NpgsqlDataReader.GetValue(Int32 ordinal)
at Npgsql.NpgsqlDataReader.GetValues(Object[] values)

I've read #557 and I understand that not converting infinity to DateTime is intentional, but in this case I didn't even ask for the infinity value - maybe I just want the string in the other column? I don't think it makes sense for GetValues to entirely fail just because one column contains a value that cannot be converted. At least you could populate that column with some special object to indicate that that value could not be read, preferably containing the error message.

Alternatively, GetValue() could return an NpgsqlDateTime for an infinite timestamp. Yes, it's inconsistent to do that when it usually returns a DateTime. If the user code doesn't handle it they'll probably get an InvalidCastException, but at least this way it can be handled, whereas currently the InvalidCastException is guaranteed if they do anything other than call GetProviderSpecificValue(). GetValue, GetValues, ExecuteScalar, this[] all fail.

@roji
Copy link
Member

roji commented Oct 16, 2016

I've read #557 and I understand that not converting infinity to DateTime is intentional, but in this case I didn't even ask for the infinity value - maybe I just want the string in the other column? I don't think it makes sense for GetValues to entirely fail just because one column contains a value that cannot be converted.

GetValues() is a method that requests all columns of the current row. It can either succeed or fail - it can't succeed on some columns but not on others... You have other APIs on NpgsqlDataReader which work on a column-by-column basis - you can call GetValue() on all columns and catch exceptions, etc. If you choose to use GetValues(), then either the whole operation succeeds or the whole operation fails.

At least you could populate that column with some special object to indicate that that value could not be read, preferably containing the error message.

What special object? Are you suggesting we invent some "error" object for the sole purpose of putting it in the array populated by GetValues() in case of error? That doesn't make much sense... In modern languages failure is indicated by an exception.

Alternatively, GetValue() could return an NpgsqlDateTime for an infinite timestamp. Yes, it's inconsistent to do that when it usually returns a DateTime.

As you say, this would be a very inconsistent API that would be very cumbersome to use. It would be impossible to assume anything about the return value without first checking what's returned.

If the user code doesn't handle it they'll probably get an InvalidCastException, but at least this way it can be handled, whereas currently the InvalidCastException is guaranteed if they do anything other than call GetProviderSpecificValue(). GetValue, GetValues, ExecuteScalar, this[] all fail.

I really don't see the problem here... For 95% of applications, CLR DateTime is a good default mapping - it is intuitive, covers most use cases (infinity and values outside of DateTime's range are probably rare), and is portable with regards to other databases.

If your application needs values not covered by DateTime (e.g. infinity), just use NpgsqlDateTime by calling GetProviderSpecificValue(), or even better, GetFieldValue. This is a much better option than having an inconsistent API that sometimes returns DateTime and sometimes NpgsqlDateTime. True, you can't use GetValues() to get all values at once, but you have all the alternatives you need.

On a side note, GetValues(), GetValue() and GetProviderSpecificValue() should be avoided in many cases anyway, since they box your values (as opposed to GetFieldValue).

Finally, you can also use the "Convert Infinity DateTime" connection string parameter to make Npgsql return infinity as DateTime.MaxValue. I think this is a pretty bad idea but it's available for backwards compatibility.

I'm going to close this as I don't see a problem, but we can continue the conversation if you like.

@roji roji closed this as completed Oct 16, 2016
@FransBouma
Copy link
Contributor

Hi Roji, a customer of ours ran into the same problem as this issue: http://www.llblgen.com/TinyForum/Messages.aspx?ThreadID=24814

We read all values for an entity using GetValues() as we do postprocessing on values and store all values in an array anyway so it's the right way to read the values for us. The exception approach is really unfortunate as it stops the user from dealing with the value properly: the exception is hard to process (which column failed?). We implemented a system to deal with this as ODP.NET for oracle has a similar problem where a large decimal value in their database doesn't fit in a System.Decimal value and it simply throws an overflow exception.

What I'd like to propose is that GetValues() returns the value in the Npgsql specific type. This doesn't give an exception and gives the possibility for the user to deal with it. While the entity / object itself likely doesn't have a property with that type (but e.g. typed as DateTime), through Type Converters (our variant of what's now also implemented in EF Core as value converters) this can then be converted from the Npgsql specific type to DateTime (with loss of data but c'est la vie) by the user if they want to.

Just a thought :) Cheers!

@Brar
Copy link
Member

Brar commented Jun 23, 2018

What I'd like to propose is that GetValues() returns the value in the Npgsql specific type.

I certainly can relate to this argument.

IMO it falls in the same category we discussed around GetValue():

"If the user doesn't request a certain type we promise to return an object that provides the best compatibility to the type returned by the backend."

Throwing an exception for a perfectly valid backend type just because we decided to map it to an incompatible .Net type (no matter if the user really cares) and forcing the consequences upon the user doesn't seem very friendly to me.

In many cases it's a simple cast to the corresponding standard .Net type if that's what you want.

@roji
Copy link
Member

roji commented Jun 23, 2018

First to answer @Brar, regarding GetValue()...

I'll start by saying that returning .NET decimal rather than a provider-specific type is the SQL Client behavior (which we generally try to follow). SqlDataReader has a special GetSqlDecimal() accessor to get an SqlDecimal, but it's by no means the default type returned.

Throwing an exception for a perfectly valid backend type just because we decided to map it to an incompatible .Net type (no matter if the user really cares) and forcing the consequences upon the user doesn't seem very friendly to me.

It's significant that we're not mapping it to some arbitrary .NET type out there - we're mapping to the standard, built-in way to represent an arbitrary-precision number (i.e. decimal). I'll venture to say that for 99% of applications, built-in decimal is good enough. So there really are two options here:

  1. Return what users actually need/want for 99% of the cases, and provide a special way for the remaining 1% to get the extended type (which is also less performant, see below).
  2. Saddle 99% of users with a special, Npgsql- (or Sql-)specific type which basically nobody wants, just so that GetValue() never throws an exception.

At least to me it looks like a pretty simple choice :) There's very little actual downsides to having GetValue() throw for the extremely rare edge cases, aside from the relaxation of a theoretical principle that "GetValue() should never throw" (which isn't actually documented or claimed anywhere).

Finally, any change here would obviously a massive breaking change as all current code which expects decimal, DateTime and friends breaks.

In many cases it's a simple cast to the corresponding standard .Net type if that's what you want.

Maybe in many, but not all. More importantly, there's a reason that built-in .NET types such as decimal and DateTime don't always cover the entire range of possibilities - performance. As a general rule, the provider-specific type are significantly slower than their built-in .NET counterparts (the increased range/precision don't come for free), so reading the provider-specific type and casting to the standard .NET type would incur a penalty.

Now regarding @FransBouma's original question on GetValues()... It seems that whatever decision is made for GetValue() should hold for GetValues() at all - anything else would be pretty inconsistent. However, a GetProviderSpecificValues() counterpart actually exists, which seems to be exactly what you're looking for, any reason not to use that? Note also that both GetValues() and GetProviderSpecificValues() do nothing beyond looping internally and populating the given array by calling GetValue() or GetProviderSpecificValue(); this is something you could easily do yourself from outside Npgsql, catching exceptions and being able to tell exactly which column triggered them...

@Brar
Copy link
Member

Brar commented Jun 23, 2018

However, a GetProviderSpecificValues() counterpart actually exists, which seems to be exactly what you're looking for, any reason not to use that?

This is the solution I didn't know until today 👍

@FransBouma
Copy link
Contributor

It seems that whatever decision is made for GetValue() should hold for GetValues() at all - anything else would be pretty inconsistent. However, a GetProviderSpecificValues() counterpart actually exists, which seems to be exactly what you're looking for, any reason not to use that? Note also that both GetValues() and GetProviderSpecificValues() do nothing beyond looping internally and populating the given array by calling GetValue() or GetProviderSpecificValue(); this is something you could easily do yourself from outside Npgsql, catching exceptions and being able to tell exactly which column triggered them...

I have to admit, I never really understood GetProviderSpecificValues, or better: what it will give me. What I'm after is the situation where I get native .net types in all cases, except when it would create an exception, (like with Oracle NUMBER(10, 38) or here with the time) and in that case return a provider specific value. The method however seems to be it will create provider specific types in all cases, so return a SqlInt32 or whatever and not an int32. So I never looked into this method as it's not useful in any case except perhaps if you have an exception due to a .net type that can't be created and you have to create a provider specific type.

We have a mechanism in place for the user to tap into that (the exception happens, they get a call, so they can recover) and I'll suggest the providerspecificvalues method as a solution :)

@roji
Copy link
Member

roji commented Jun 25, 2018

I have to admit, I never really understood GetProviderSpecificValues, or better: what it will give me. What I'm after is the situation where I get native .net types in all cases, except when it would create an exception, (like with Oracle NUMBER(10, 38) or here with the time) and in that case return a provider specific value. The method however seems to be it will create provider specific types in all cases, so return a SqlInt32 or whatever and not an int32. So I never looked into this method as it's not useful in any case except perhaps if you have an exception due to a .net type that can't be created and you have to create a provider specific type.

I think the idea is to have a predictable type returned when you get get a column's value. GetValue() will always return the basic .NET type (e.g. decimal, DateTime) and throw otherwise, while GetProviderSpecificValue() will always return the provider-specific type which is guaranteed to always be able to represent the database value.

I get where you're coming from, though - when you're writing a generic data access layer such as an O/RM, you're in a dynamic situation in the sense that you have no idea what types are coming back, and you don't necessarily care very much. However, when you code directly against ADO.NET you know which type is coming because you coded the SQL query; if you know your specific database values in your specific application cannot be represented by the built-in .NET type, you call GetProviderSpecificValue().

On the other hand, if you're in that scenario where you know what type you're expecting, you should actually just call GetFieldValue<T>(), which has the additional advantage of being a generic API (safer and potentially avoids boxing). I'm guessing (but not sure) that GetValue() and GetProviderSpecificValue() have been around since the days of .NET 1.0 - before generic programming was introduced - and so can and should be considered quite obsolete in applications where the returned data type is known.

So to summarize, I can see the logic of having GetValue() return whatever type works (decimal if that's OK, otherwise SqlDecimal), since it's only supposed to be used in dynamic applications anyway. But this is not how the API works and it's quite a huge change... Hopefully you can work around it.

@FransBouma
Copy link
Contributor

thanks for the info, yes it's ok now, the providerspecificvalue method is a good 'fallback' option for the situation when the exception occurs. I understand your situation and agree it's something you don't want to change.

Boxing the values is our case OK as in this particular fetch pipeline we store the values as an object[] inside the entity (we have to as this is due to backwards compatibility as my former self thought he was wise to store values in field objects instead of typed properties! current self thinks former self was a bit of an idiot thinking that but alas, we can't change things now and have to deal with the boxed values forever now ;)). But indeed, GetFieldValue might be a good option as well.

Thanks!

@roji
Copy link
Member

roji commented Jun 25, 2018

OK cool :) FYI at some point I went and made sure the entire read path is generic to completely avoid boxing of any value types. There's also #1639 which does the same for writing, although that involves a new non-standard API (generic parameters) which hopefully will make it into ADO.NET one day (https://github.com/dotnet/corefx/issues/8955).

FWIW I'd bet that SQL Client and other drivers already box internally, so you may not gain anything on them when switching to GetFieldValue<T>(), but on Npgsql it would cut down on allocations (potentially significantly). Just something to think about :)

@FransBouma
Copy link
Contributor

:)

Interestingly SqlDataReader.GetInt32(ordinal) doesn't seem to box, however SqlDataReader.GetFieldValue<int>(ordinal) does :D (as it returns the 'value' property of the SqlBuffer object). Not sure why they did this, might be the code to determine which property of SqlBuffer to return (it stores an int in its Int32 property etc.) per value is slower than simply box once and return the value with a cast.

@roji
Copy link
Member

roji commented Jun 28, 2018

That's a bit odd... That would mean that GetInt32() has a completely different code-path, not passing through their internal SqlBuffer class...

In any case, in Npgsql GetInt32() is nothing more than a wrapper over GetFieldValue<int>().

@FransBouma
Copy link
Contributor

Actually GetInt32 uses their internal SqlBuffer class (and returns the Int32 property), the GetFieldValue<int> doesnt (it does, but it returns (T)data[index].value), I couldn't find any boxing behavior in the SqlBuffer class (looked in .net full code with ILSpy), as they use a property per type in that class so it's not boxed.

Your approach is indeed better :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants