SqlDataReader.GetSqlDateTime does not support DATETIME2 #846
@bjornbouetsmith:
Hi,

The former do not throw exceptions if the column is NULL; the GetXX methods do. So I would still suggest that you either "fix" SqlDateTime to handle the extended range of DATETIME2, or implement a SqlDateTime2 to handle DATETIME2 and then naturally add a GetSqlDateTime2 method.

When reading millions of rows from the database where every single column can be null, it is a major performance degradation to use GetXX methods instead of GetSqlXX methods, since you need to check every single column on every single row, first with IsDBNull and then do something depending on the return value. The GetSqlXX methods never fail because the field is null, and you can call the IsNull property directly on the return value, which is more than ten times faster because of the overhead in IsDBNull.

Most of the code to handle DATETIME2 is already implemented in GetDateTime, so it should be only a matter of either implementing a GetSqlDateTime2 or fixing GetSqlDateTime to handle the extra range by extending SqlDateTime. I would prefer you implemented a GetSqlDateTime2, since that would complete the API: there is a DATETIME2 data type in the database without a corresponding Sql class in .NET. It would probably require adding a SqlDateTime2 property to the SqlBuffer struct. I will be happy to submit a pull request to this effect.

And casting to DateTime is not really an option on the SQL Server side, since that would truncate the precision.
There are APIs to get the nullability from a reader before calling the get methods, so you can avoid calling IsDBNull.
@ErikEJ What API is that? SqlDataReader.IsDBNull is what I am using, and it is not performant, since it does a lot of checks, and I need to call it for each field on every row. I have found no API except SqlDataReader.IsDBNull, Convert.IsDBNull and the IsNull property of the SqlXX types. Convert.IsDBNull is useless, since it will box any value type, which creates garbage. SqlDataReader.IsDBNull is a performance hog because it needs to check for the correct state of the reader before accessing the field. Which, unless I have missed something, leaves me with the IsNull property of the SqlXX types. I am not interested in an API that can tell me whether or not a column *can* be null - I already know all columns can be null - but in a performant way to check whether any field *is* null.
GetColumnSchema or GetSchemaTable
@ErikEJ Yes, but that will only tell me the schema of the table, which I don't need and cannot use - I already know all columns can be null. I need to handle null values because SqlClient will throw an exception if the field is null, whereas the GetSqlXX methods will not throw exceptions but instead return a struct with an IsNull property. So they are the only real way to manage null values if you care about performance and garbage collection.
Can you provide an isolated test case that exhibits the behaviour you see around null checks and direct type access causing more garbage than using the sql types. I've always assumed when working on the internals that the sql types would be more costly to use but if they're not I can investigate if there are ways to equalize the performance of the paths.
@Wraith2 I never said my issue was around garbage - only performance, since SqlClient is missing a SqlDateTime2 or equivalent class to represent DATETIME2. If you use DATETIME2 you are forced to use SqlDataReader.GetDateTime, and since that method does not gracefully handle null, you are then forced to first call SqlDataReader.IsDBNull. The codepath for IsDBNull does a lot of work, which is required because it needs to check that the state is correct before it can check for null. And then it ends up accessing a property on SqlBuffer which is basically just a field accessor telling it whether or not the data is null. If there were a SqlDateTime2 struct and a GetSqlDateTime2 method, I would not have to call SqlDataReader.IsDBNull and then GetSqlDateTime - I could just retrieve the SqlDateTime2 struct, call its IsNull property, and if false return the data contained in the struct. Using my "testcase" from above, I am sure you will be able to see the issue running the code below.

for (int x = 0; x < 10000000; x++)
{
DateTime dateTime;
if (!reader.IsDBNull(0))
{
// Do something
dateTime = reader.GetDateTime(0);
}
else
{
dateTime = default;
}
}
//vs below code which fails with an InvalidCastException
// so you have to imagine calling a GetSqlDateTime2 which returned a SqlDateTime2
// or imagine that SqlDateTime would be able to handle the extended range of DATETIME2 and not throw exceptions
for (int x = 0; x < 10000000; x++)
{
DateTime dateTime;
SqlDateTime sqlDateTime = reader.GetSqlDateTime(0);
if (!sqlDateTime.IsNull)
{
dateTime = sqlDateTime.Value;
}
else
{
dateTime = default;
}
}
Is this something you're actually seeing in production, or have benchmarked? Because a quick benchmark seems to show that the perf of GetSqlDateTime is the same as IsDBNull + GetDateTime (see below). After all, I'd expect GetSqlDateTime to have to do more or less the same null checking internally as you're doing externally when calling GetDateTime.

Benchmark results:

BenchmarkDotNet=v0.12.1, OS=ubuntu 20.04
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.101
[Host] : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
DefaultJob : .NET Core 3.1.10 (CoreCLR 4.700.20.51601, CoreFX 4.700.20.51901), X64 RyuJIT
Benchmark code:

public class Program
{
const string ConnectionString = @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";
[Params(true, false)]
public bool IsNull { get; set; }
SqlConnection _connection;
SqlCommand _command;
[GlobalSetup]
public void Setup()
{
_connection = new SqlConnection(ConnectionString);
_connection.Open();
_command = new SqlCommand(IsNull
? "SELECT CAST(NULL AS datetime)"
: "SELECT CAST('2020-01-01' AS datetime)",
_connection);
}
[GlobalCleanup]
public void Cleanup()
{
_connection.Dispose();
_command.Dispose();
}
[Benchmark]
public DateTime GetDateTime()
{
using var reader = _command.ExecuteReader();
reader.Read();
return reader.IsDBNull(0) ? default : reader.GetDateTime(0);
}
[Benchmark]
public SqlDateTime GetSqlDateTime()
{
using var reader = _command.ExecuteReader();
reader.Read();
return reader.GetSqlDateTime(0);
}
static void Main(string[] args)
=> BenchmarkRunner.Run<Program>();
}
Should it not be:
@ErikEJ Yes that would have been great - except using GetSqlDateTime does not support DATETIME2 - which is my whole problem.
@roji Yes, I have profiled this and could see that IsDBNull was the method using the most time. And I have no idea how that benchmark you made runs, but if you call ExecuteReader and only read one field from that reader, the isolated performance of IsDBNull will drown in the connection overhead etc. So I would suggest, if you want to test like that, that you open the reader once and then call reader.IsDBNull/GetDateTime 10 million times inside a loop, and do the same with GetSqlDateTime. That would give you the performance test in question, measuring only the time of reader.IsDBNull/GetDateTime, not the time it takes to open the connection and execute the query.

Using this simple test shows a huge performance difference on my machine: IsDbNull:00:00:04.1302114

And when reading 10 million+ very wide rows where all fields can be null for performance reasons, then yes, it makes a big difference. I considered "fixing" the issue by making my stored procedure use ISNULL, but that would defeat the purpose of having null for performance reasons, since then the SQL Server would have to transfer a value instead of just a bit stating that the field was null - so we would use more bandwidth/time just reading the data. So I am still convinced that having a GetSqlDateTime method that handles DATETIME2, or having a GetSqlDateTime2 method, is the correct solution.

string Statement = $@"SELECT CAST('2020-12-11 09:17:12.123' AS DATETIME) AS [MyDate]";
var builder = new SqlConnectionStringBuilder()
{
InitialCatalog = "master",
DataSource = @"(LocalDb)\AggService",
IntegratedSecurity = true,
};
int count=100000000;
var sqlConnection = new SqlConnection(builder.ConnectionString);
sqlConnection.Open();
using (SqlCommand command = new SqlCommand(Statement))
{
command.CommandType = CommandType.Text;
command.Connection = sqlConnection;
command.CommandText = Statement;
using (var reader = command.ExecuteReader())
{
while (reader.Read())
{
var dummy=reader.GetDateTime(0);
var dummy2 = reader.GetSqlDateTime(0);
Stopwatch sw = Stopwatch.StartNew();
for (int x = 0; x < count; x++)
{
DateTime dt;
if (!reader.IsDBNull(0))
{
dt = reader.GetDateTime(0);
}
else
{
dt = default;
}
}
sw.Stop();
Console.WriteLine("IsDbNull:{0}",sw.Elapsed);
sw = Stopwatch.StartNew();
for (int x = 0; x < count; x++)
{
var sqlDt = reader.GetSqlDateTime(0);
DateTime dt;
if (!sqlDt.IsNull)
{
dt = sqlDt.Value;
}
else
{
dt = default;
}
}
sw.Stop();
Console.WriteLine("GetSqlDateTime:{0}",sw.Elapsed);
sw = Stopwatch.StartNew();
for (int x = 0; x < count; x++)
{
DateTime dt;
if (!reader.IsDBNull(0))
{
dt = reader.GetDateTime(0);
}
else
{
dt = default;
}
}
sw.Stop();
Console.WriteLine("IsDbNull:{0}",sw.Elapsed);
sw = Stopwatch.StartNew();
for (int x = 0; x < count; x++)
{
var sqlDt = reader.GetSqlDateTime(0);
DateTime dt;
if (!sqlDt.IsNull)
{
dt = sqlDt.Value;
}
else
{
dt = default;
}
}
sw.Stop();
Console.WriteLine("GetSqlDateTime:{0}",sw.Elapsed);
}
}
}
That's my expectation based on the code as well. The first field access on the row will create the SqlBuffer, and then it's used by the subsequent access. The Sql* versions should actually do more work because they're creating an additional object rather than passing back data. This means that profiles show most of the time spent in IsDBNull, because that's where the value is resolved, turning the Get* call into a simple lookup; however, the overall time shared between them should differ only in the extra time taken to set up call stacks and do duplicated argument checking.

The Sql* types are considered legacy currently as far as I'm aware. They're from .NET 1.0, when there were no generics and thus no way to express struct nullability; mostly they were for DataTable as far as I know. Because the types are defined in terms of SQL Server's capabilities, other providers have to ignore them, provide their own conversions, or provide replacements. It seems unlikely that a new type will be added to the runtime (dotnet/runtime) at this point, but it might be possible to add it directly to sqlclient.

I'd rather make the check-null-then-maybe-get pattern faster if it's causing a perf problem in comparison to the Sql type version. Ideologically, I think the separation of null checking and field getting is a purer mapping of the distinction between nulls and values in SQL than a nullable CLR type; I consider SQL nulls "stronger" than language nulls because of their untyped and incomparable nature.
@Wraith2 I was not aware that the SqlTypes were considered legacy - if that is the case, I would suggest they be marked as obsolete - and if they are not obsolete, they should really be extended with support for the new data types in SQL Server. If you can fix the performance issue of IsDBNull, or handle null better in the GetXX methods, I am fine with that. But it seems "strange" that an API was chosen that cannot handle null and requires the caller to call another method to check for null. The GetSqlXX methods are much easier to use and are closer to what I would expect from something that reads from a database. Imagine SQL Server not being able to return NULL from a column, so that you always had to wrap every column in an ISNULL check before accessing it. Getting a value back without having to check for null first will always perform better than having to check for null and then going through almost the same work again just to get the value.

Compare the two methods:

virtual public SqlDateTime GetSqlDateTime(int i)
{
ReadColumn(i);
return _data[i].SqlDateTime;
}
/// <include file='../../../../../../../doc/snippets/Microsoft.Data.SqlClient/SqlDataReader.xml' path='docs/members[@name="SqlDataReader"]/GetDateTime/*' />
override public DateTime GetDateTime(int i)
{
ReadColumn(i);
DateTime dt = _data[i].DateTime;
// This accessor can be called for regular DateTime column. In this case we should not throw
if (_typeSystem <= SqlConnectionString.TypeSystem.SQLServer2005 && _metaData[i].IsNewKatmaiDateTimeType)
{
// TypeSystem.SQLServer2005 or less
// If the above succeeds, then we received a valid DateTime instance, now we need to force
// an InvalidCastException since DateTime is not exposed with the version knob in this setting.
// To do so, we simply force the exception by casting the string representation of the value
// To DateTime.
object temp = (object)_data[i].String;
dt = (DateTime)temp;
}
return dt;
}

Almost the same work in both methods - both call ReadColumn and both return data from the SqlBuffer. The difference is that the DateTime property throws an exception if the field is null, whereas SqlDateTime returns a struct that can tell you if the field was null - similar to a nullable DateTime. A much better API in my opinion.

SqlDateTime takes up:

private bool m_fNotNull; // false if null
private int m_day;       // Day from 1900/1/1, could be negative. Range: Jan 1 1753 - Dec 31 9999.
private int m_time;      // Time in the day in term of ticks

Where DateTime takes up:

private UInt64 dateData;

So almost similar storage requirements in memory. A solution could be to return nullable types from a range of new methods, since a Nullable<T> should perform similarly to SqlDateTime, since it has:

private bool hasValue;
internal T value;

i.e. a full range of GetNullableXX methods that would simply do the same as the normal methods, but instead of blindly accessing the "Value" field of SqlBuffer would first call IsNull, i.e.:

override public DateTime? GetNullableDateTime(int i)
{
ReadColumn(i);
if (_data[i].IsNull)
{
return (DateTime?)null;
}
DateTime dt = _data[i].DateTime;
// This accessor can be called for regular DateTime column. In this case we should not throw
if (_typeSystem <= SqlConnectionString.TypeSystem.SQLServer2005 && _metaData[i].IsNewKatmaiDateTimeType)
{
// TypeSystem.SQLServer2005 or less
// If the above succeeds, then we received a valid DateTime instance, now we need to force
// an InvalidCastException since DateTime is not exposed with the version knob in this setting.
// To do so, we simply force the exception by casting the string representation of the value
// To DateTime.
object temp = (object)_data[i].String;
dt = (DateTime)temp;
}
return dt;
}
I'd recommend not confusing the discussion around DATETIME2 support and SqlClient's SqlTypes with a general discussion on how ADO.NET behaves around nulls - this is simply how the .NET database API works. I wouldn't consider this behavior around nulls a good reason to introduce a GetSqlDateTime2. The ADO.NET behavior around nulls was a design decision for the entire API (regardless of what we think about it today): database nulls are generally not represented by .NET nulls, but rather by DBNull (for generic methods this is a problem). There's some relevant discussion on this in dotnet/runtime#26511, even if that's about a proposed generic ExecuteScalar rather than reader methods. In any case, as there isn't a performance aspect here, it's trivial to write a simple
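The comment above is cut off in this capture, so the exact helper being suggested is unknown. As a hedged sketch, a nullable-returning wrapper over the standard DbDataReader API could look like this (GetNullableStruct is a hypothetical name, not a SqlClient or ADO.NET method):

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class DataReaderExtensions
{
    // Hypothetical helper (not a SqlClient API): wraps the standard
    // IsDBNull + GetFieldValue<T> pattern behind a Nullable<T> return value.
    public static T? GetNullableStruct<T>(this DbDataReader reader, int ordinal)
        where T : struct
        => reader.IsDBNull(ordinal) ? (T?)null : reader.GetFieldValue<T>(ordinal);
}

public static class Demo
{
    public static void Main()
    {
        // Demonstrated against an in-memory DataTableReader so no database is needed.
        var table = new DataTable();
        table.Columns.Add("d", typeof(DateTime));
        table.Rows.Add(new object[] { DBNull.Value });
        table.Rows.Add(new object[] { new DateTime(2020, 1, 1) });

        using DbDataReader reader = table.CreateDataReader();
        reader.Read();
        Console.WriteLine(reader.GetNullableStruct<DateTime>(0) is null); // True
        reader.Read();
        Console.WriteLine(reader.GetNullableStruct<DateTime>(0)); // the 2020-01-01 value
    }
}
```

Note this still pays the separate IsDBNull cost being debated in the thread; it only tidies the call site.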
@roji What do you mean there isn't a performance aspect? Did you see my test code and the results I posted? Going from 4.1 seconds to 2.7 is a huge difference. The test you wrote is next to useless, since it has the entire overhead of ExecuteReader and Read - if you moved those calls to the "setup" part of the test, then perhaps it would be equivalent to the test I made. I cannot just write a GetNullable method, since that would help me exactly nothing: I would still need to call IsDBNull, which is the entire reason for this post - IsDBNull is slow by design, since it has to check metadata etc. before accessing the field, and then you have to do a lot of the work again when reading the data. Rewrite your test to only call the methods being discussed, IsDBNull + GetDateTime vs. GetSqlDateTime, and I am sure you will see a similar performance difference to the one I did.
No, it isn't. As I explained earlier, the first call to one of IsDBNull or GetField will cause the SqlBuffer entry to be created and will show the majority of the cost. The benchmark @roji provided compares: As I said, I'll try and find time to check your specific scenario.

There is a difference between legacy and obsolete. Obsolete means you shouldn't use it and there is usually a replacement. Legacy means it's old; it's not obsolete, but it's not likely to be expanded. DataTable and the associated items supporting it in System.Data fit into the legacy category as far as I know: they work, they're supported, but they're unlikely to change. Does that sound right @roji?
@Wraith2 It does indeed compare the methods you write, but it also includes the time it takes to call ExecuteReader and Read - and the majority of the time in those tests will probably be from ExecuteReader & Read. That is why I still think the test is testing more than it should: the time taken by IsDBNull+GetDateTime drowns in the time taken by ExecuteReader & Read. If you added a Thread.Sleep(10000) into the same test, it would also test reader.IsDBNull - but the time taken would drown - which is why it should be rewritten to only measure the methods in question, not code that is part of the setup. Which is what my test did, except without using a fancy test framework.

In regards to legacy: do you have any links that say they are legacy, or is it just guesses or hearsay? If they are legacy, I think it is a shame, since they provide a better API in my opinion.
Ok, can you provide a benchmark.net example that shows what you're saying?
There you go. Point proven - almost 100% faster to use GetSqlDateTime when the value is not null, and a bit slower when the value is null, which is surprising, but possibly just because DateTime is a smaller struct than SqlDateTime. Edited with string and int64 tests as well.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
public class Program
{
const string ConnectionString = @"Server=(localdb)\MSSQLLocalDB;Database=master;Integrated Security=true;Connect Timeout=3;ConnectRetryCount=1";
[Params(true, false)]
public bool NullValue { get; set; }
SqlConnection _connection;
SqlCommand _command;
SqlDataReader _reader;
[GlobalSetup]
public void Setup()
{
_connection = new SqlConnection(ConnectionString);
_connection.Open();
_command = new SqlCommand(@"
SELECT CAST(NULL AS datetime) as [MyDT], CAST(NULL AS NVARCHAR(10)) as [MyStr], CAST(NULL AS BIGINT) as [MyBigInt]
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS datetime) as [MyDT], CAST('AbCdEfGhIj' AS NVARCHAR(10)) as [MyStr], CAST(4611686018427387903 AS BIGINT) as [MyBigInt] ",
_connection);
_reader = _command.ExecuteReader();
_reader.Read();
if (!NullValue)
{
_reader.Read();
}
}
[GlobalCleanup]
public void Cleanup()
{
_connection.Dispose();
_command.Dispose();
_reader.Close();
}
[Benchmark]
public DateTime GetDateTime()
{
return _reader.IsDBNull(0) ? default : _reader.GetDateTime(0);
}
[Benchmark]
public SqlDateTime GetSqlDateTime()
{
return _reader.GetSqlDateTime(0);
}
[Benchmark]
public string GetString()
{
return _reader.IsDBNull(1) ? default : _reader.GetString(1);
}
[Benchmark]
public SqlString GetSqlString()
{
return _reader.GetSqlString(1);
}
[Benchmark]
public long GetInt64()
{
return _reader.IsDBNull(2) ? default : _reader.GetInt64(2);
}
[Benchmark]
public SqlInt64 GetSqlInt64()
{
return _reader.GetSqlInt64(2);
}
static void Main(string[] args)
=> BenchmarkRunner.Run<Program>();
}
So it seems like when the value in a field is null, IsDBNull + default is the fastest. Structs/primitives seem faster as well - but I would be happy to write and run a full test of all the built-in .NET types so we can get the full picture. I am guessing the following pattern will emerge: structs/primitives will be faster with the GetSqlXX methods, objects will be faster with the GetXX methods - but how much overhead there is probably depends on the implementation of the SqlTypes struct used. This is probably why SqlString is so slow:

private string m_value;
private CompareInfo m_cmpInfo;
private int m_lcid;
private SqlCompareOptions m_flag;
private bool m_fNotNull;

plus it converts a byte array to a string in the constructor - where GetString seems to do the equivalent of casting an object to a string.
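The extra state is visible on the public type as well. A small illustration (not from the thread) showing that SqlString carries collation metadata and an explicit null flag alongside the string value:

```csharp
using System;
using System.Data.SqlTypes;

class SqlStringDemo
{
    static void Main()
    {
        // Constructing a SqlString captures comparison metadata (LCID and
        // SqlCompareOptions) in addition to the value itself - part of the
        // extra cost discussed above.
        var s = new SqlString("AbCdEfGhIj");
        Console.WriteLine(s.IsNull);            // False
        Console.WriteLine(s.LCID);              // culture-dependent locale id
        Console.WriteLine(s.SqlCompareOptions); // comparison flags captured at construction

        // Null is a flagged struct value, not a null reference.
        Console.WriteLine(SqlString.Null.IsNull); // True
    }
}
```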
With a few more data types tested:
And code:

public class Program
{
const string ConnectionString = @"Server=(localdb)\MSSQLLocalDB;Database=master;Integrated Security=true;Connect Timeout=3;ConnectRetryCount=1";
[Params(true, false)]
public bool NullValue { get; set; }
SqlConnection _connection;
SqlCommand _command;
SqlDataReader _reader;
private byte[] _buffer;
[GlobalSetup]
public void Setup()
{
_connection = new SqlConnection(ConnectionString);
_connection.Open();
_command = new SqlCommand(@"
SELECT CAST(NULL AS datetime) as [MyDT], CAST(NULL AS NVARCHAR(10)) as [MyStr], CAST(NULL AS BIGINT) as [MyBigInt],CAST(NULL as FLOAT) as [MyFL],CAST(NULL as INT) as [MyInt],CAST(NULL as SMALLINT) as [MySInt],CAST(NULL AS BIT) as [myBit],cast(NULL AS uniqueidentifier) as [MyId],CAST(NULL AS timestamp) AS [MyTS]
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS datetime) as [MyDT], CAST('AbCdEfGhIj' AS NVARCHAR(10)) as [MyStr], CAST(4611686018427387903 AS BIGINT) as [MyBigInt],CAST(46116860184.27387903 as FLOAT) as [MyFL],CAST(1234567 as INT) as [MyInt],CAST(1234 as SMALLINT) as [MySInt],CAST(0 AS BIT) as [myBit],cast(NEWID() AS uniqueidentifier) as [MyId],CAST(CURRENT_TIMESTAMP AS timestamp) AS [MyTS]",
_connection);
_reader = _command.ExecuteReader();
_reader.Read();
if (!NullValue)
{
_reader.Read();
}
}
[GlobalCleanup]
public void Cleanup()
{
_connection.Dispose();
_command.Dispose();
_reader.Close();
}
[Benchmark]
public DateTime GetDateTime()
{
return _reader.IsDBNull(0) ? default : _reader.GetDateTime(0);
}
[Benchmark]
public SqlDateTime GetSqlDateTime()
{
return _reader.GetSqlDateTime(0);
}
[Benchmark]
public string GetString()
{
return _reader.IsDBNull(1) ? default : _reader.GetString(1);
}
[Benchmark]
public SqlString GetSqlString()
{
return _reader.GetSqlString(1);
}
[Benchmark]
public long GetInt64()
{
return _reader.IsDBNull(2) ? default : _reader.GetInt64(2);
}
[Benchmark]
public SqlInt64 GetSqlInt64()
{
return _reader.GetSqlInt64(2);
}
[Benchmark]
public double GetDouble()
{
return _reader.IsDBNull(3) ? default : _reader.GetDouble(3);
}
[Benchmark]
public SqlDouble GetSqlDouble()
{
return _reader.GetSqlDouble(3);
}
[Benchmark]
public int GetInt32()
{
return _reader.IsDBNull(4) ? default : _reader.GetInt32(4);
}
[Benchmark]
public SqlInt32 GetSqlInt32()
{
return _reader.GetSqlInt32(4);
}
[Benchmark]
public short GetInt16()
{
return _reader.IsDBNull(5) ? default : _reader.GetInt16(5);
}
[Benchmark]
public SqlInt16 GetSqlInt16()
{
return _reader.GetSqlInt16(5);
}
[Benchmark]
public bool GetBoolean()
{
return _reader.IsDBNull(6) ? default : _reader.GetBoolean(6);
}
[Benchmark]
public SqlBoolean GetSqlBoolean()
{
return _reader.GetSqlBoolean(6);
}
[Benchmark]
public Guid GetGuid()
{
return _reader.IsDBNull(6) ? default : _reader.GetGuid(7);
}
[Benchmark]
public SqlGuid GetSqlGuid()
{
return _reader.GetSqlGuid(7);
}
[Benchmark]
public byte[] GetBytes()
{
byte[] buffer = _buffer??=new byte[8];
if (_reader.IsDBNull(8))
{
return null;
}
_reader.GetBytes(8, 0, buffer, 0, 8);
return buffer;
}
[Benchmark]
public SqlBytes GetSqlBytes()
{
return _reader.GetSqlBytes(8);
}
static void Main(string[] args)
=> BenchmarkRunner.Run<Program>();
}
So far my prediction holds: primitives/structs are faster using the GetSqlXX methods.
@bjornbouetsmith there are many things above, so I'll try to answer them one by one.
That is quite a problematic way to show that something has a performance problem in any real-world scenario; yes, method A may be negligibly slower than method B (e.g. by a few nanoseconds), but that doesn't mean it's significant. It's always possible to devise a benchmark which would amplify that difference, e.g. by simulating a 10 million-column row (something which isn't even supported by SQL Server AFAIK), but that doesn't necessarily have any real meaning to anyone in the real world. This is why I asked above whether you've seen an actual perf problem in production, as opposed to in an artificially-crafted benchmark.
This is a respectful discussion with differing opinions - I'd avoid calling the other person's benchmark "next to useless". My benchmark tests exactly what it tests: the difference for a single query retrieving a single value; ExecuteReader and Read are the same in both methods, so they should be constant. Regardless, what you consider real-world usage is definitely a matter for discussion, but 10 million+ wide rows definitely seems further from reasonable than a query fetching a single value.

Looking at your results, I'm seeing differences between almost 0 and around 30ns (the bytes case is an interesting outlier). It's also worth keeping in mind that our benchmarks are being executed against localhost, meaning that there is almost zero latency involved; I'd guess that in any reasonable scenario where a remote database is being accessed (which is most real-world cases), these differences are going to be drowned out. That should of course be confirmed via further benchmarking.

But stepping back for a second... The question here is whether the difference between IsDBNull + GetWhatever and GetSqlWhatever is something that justifies prioritizing optimization efforts compared to other possible performance work on SqlClient. My personal opinion is that it isn't. But even if it were the case, rather than introducing a new GetSqlDateTime2, efforts could be concentrated on simply making IsDBNull + GetWhatever faster. First, there's no reason to believe that's not possible; second, GetDateTime is the standard ADO.NET API, as opposed to GetSqlDateTime, which is SqlClient-specific and therefore not available to a database-agnostic application.
To reiterate, there's nothing about IsDBNull that's slow by design. The ADO.NET API (which is what defines IsDBNull) says nothing about implementation, or about when any metadata gets checked. And again, at least conceptually there's nothing IsDBNull does which wouldn't need to be done by GetSqlDateTime internally.
I'm not sure about the exact status of the SqlTypes. Re DataTable/DataSet specifically - which is part of ADO.NET rather than SqlClient - I'd say they're more "archived" than obsolete/legacy. That is, generally speaking no further improvements are being made to those types, and changes are likely to be restricted to bug/security fixes. But note that this isn't any official status or anything - just the way I see it.
I am working on a system that, for unknown reasons, has a "contract" defined with 200+ fields, all populated from a stored procedure that returns those fields. Many of the fields come from outer joins, so they can be null. Changing this pattern is possible, but a huge undertaking that requires changing a large code base with systems older than 15 years.
I am sorry, I could have worded it differently - my point is that you include code that is an order of magnitude slower than the methods we are discussing, which is why I wrote that you might as well have added a Thread.Sleep(10000) to the test: that is also a constant, but it would again drown out the time of the methods you are testing. The tests I have written test the methods in question and show a big difference between the two, which was hidden in the way you tested - you even wrote yourself:
Which could not be further from the truth.
That is correct, but looking at it in absolute numbers is probably not the only relevant way - in percentage terms it's up to 100% faster for some of the methods, and only GetSqlString is slower than GetString, which is explained by the extra work GetSqlString needs to do. If I can shave off 30ns * 10 million * 200, that is 60 seconds; over two years it might be worth 120 seconds etc. - then it is very much measurable and worth doing for my company/work. The system I work on is a low-latency system: we don't use .NET tasks because they allocate memory, and we use structs widely to get rid of heap allocations. I agree it does not matter one bit for most people, but for systems where you work your utmost to tweak and squeeze every bit of performance out - it matters.
I doubt you can make any changes to SqlClient to make IsDBNull+GetWhatever as fast as GetSqlWhatever - simply because you have to check for null and you have to check metadata in both IsDBNull and GetWhatever - GetSqlWhatever can short-circuit a lot of that work, since it returns a struct that knows whether or not its value is null.
That is probably true, but the advantage is that GetSqlWhatever returns the value at the same time - whereas IsDBNull only returns an indication of whether or not it is safe to call GetWhatever. The only real alternative that would work "similarly", as I see it, would be a "GetIfNotNull" method that simply combined what IsDBNull does with the get. Similar to the "example" I made earlier in the thread:

public bool TryGetValue<T>(int i, out T value)
{
ReadColumn(i);
if (_data[i].IsNull)
{
value = default;
return false;
}
// Do whatever is needed for the specific type
}
// or
public (bool IsNull,T Value) GetValue<T>(int i)
{
ReadColumn(i);
if (_data[i].IsNull)
{
return(true,default);
}
// Do whatever is needed for the specific type
}
// or probably my favorite
public T? GetValue<T>(int i)
{
ReadColumn(i);
if (_data[i].IsNull)
{
return null;
}
// Do whatever is needed for the specific type
}

The point is that the null check needs to be in the same method that returns the data; otherwise I cannot see it ever being as optimal as the GetSqlWhatever methods, because they handle null internally.
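For what it's worth, the TryGet shape sketched above can be approximated today from outside the reader, though only by paying for the separate IsDBNull call that an in-reader implementation could avoid. A hedged sketch (TryGetValue here is a hypothetical extension, not a SqlClient API):

```csharp
using System;
using System.Data;
using System.Data.Common;

public static class ReaderTryGetExtensions
{
    // Hypothetical helper: combines the null check and the typed get into
    // one call at the call site, but internally still performs both steps.
    public static bool TryGetValue<T>(this DbDataReader reader, int ordinal, out T value)
    {
        if (reader.IsDBNull(ordinal))
        {
            value = default;
            return false;
        }
        value = reader.GetFieldValue<T>(ordinal);
        return true;
    }
}

public static class TryGetDemo
{
    public static void Main()
    {
        // In-memory DataTableReader so the sketch runs without a database.
        var table = new DataTable();
        table.Columns.Add("n", typeof(long));
        table.Rows.Add(new object[] { 42L });

        using DbDataReader reader = table.CreateDataReader();
        reader.Read();
        Console.WriteLine(reader.TryGetValue(0, out long n) ? n : -1); // 42
    }
}
```

Unlike the in-reader proposals in the thread, this cannot short-circuit the duplicated state checks; it only tidies the calling code.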
I can see your need and use case. Unfortunately the best advice available at the moment would be to use the Sql* types where you can and handle datetime2 the other way. I think you're slightly trading off memory use for CPU use, but if that works for you then it's a sensible decision to make; I might decide to do the same in your position. The time difference you're seeing is almost entirely unavoidable based on my review of the code: it's all function call setup and state validation, none of which can be skipped. It's simply that much slower to make two calls than one when you're running that fast.

I think the likelihood of getting a SqlDateTime2 struct added to the runtime is almost 0. It's a lot more complicated than I originally thought it would be last time I investigated those types, and as I've said, it's old code that isn't seeing any attention beyond compatibility. Runtime are unlikely to have an interest. I think the likelihood of adding something in this library is slightly higher than in the runtime, but still quite small. Even if the owners approved of the idea, it isn't going to make the top of the priority list. On the plus side, apart from the type being in a different namespace (It'd end up in

If approved, and if it's worth the effort, it might be something that would be acceptable as an external contribution. If you need help plumbing in the type once it's written, with covering tests etc., then I'll help out if I can. Don't do anything without knowing first whether the idea is acceptable to the owners. Now, there might be a case for
I agree with a lot of what @Wraith2 wrote above. Re adding to Microsoft.Data.SqlTypes: although those types live in the runtime, they're technically part of SqlClient/SQL Server, so I think the decisions would have to be made there. If there were a correctness issue here, i.e. if a SqlDateTime2 struct were needed to fully represent SQL Server's datetime2, that would be a stronger argument. However, I'd be against doing anything like this purely to save users an IsDBNull - regardless of whether there's an actual perf issue (because again, if there's a problem it should be addressed generally, and not in a SqlClient-specific API).

More generally, I've done yet another benchmark (code and results below) to specifically measure the overhead of calling IsDBNull; it's ~14ns for SqlClient and ~12ns for Npgsql. Granted, there may be applications out there where that overhead is significant; these would probably be applications running on the same machine as their DB (for zero latency), continuously pumping values out of LocalDB and doing almost nothing with those values (otherwise, again, the cost of IsDBNull would be dominated by whatever is being done). However, I'd venture to guess that for the vast majority of SqlClient users, shaving off these 14ns would be very low on their priority list, compared to all the other possible improvements.

IsDBNull benchmark results
IsDBNull benchmark code

public class Program
{
    const string SqlConnectionString = @"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";
    const string NpgsqlConnectionString = @"Server=localhost;Username=test;Password=test";

    [Params(true, false)]
    public bool NullValue { get; set; }

    [Params(DatabaseType.SqlServer, DatabaseType.PostgreSQL)]
    public DatabaseType DatabaseType { get; set; }

    DbConnection _connection;
    DbCommand _command;
    DbDataReader _reader;

    [GlobalSetup]
    public void Setup()
    {
        switch (DatabaseType)
        {
            case DatabaseType.SqlServer:
                _connection = new SqlConnection(SqlConnectionString);
                _connection.Open();
                _command = _connection.CreateCommand();
                _command.CommandText = @"
SELECT CAST(NULL AS datetime), CAST(NULL AS NVARCHAR(10)), CAST(NULL AS BIGINT), CAST(NULL as FLOAT), CAST(NULL as INT), CAST(NULL as SMALLINT), CAST(NULL AS BIT), CAST(NULL AS uniqueidentifier), CAST(NULL AS timestamp)
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS datetime), CAST('AbCdEfGhIj' AS NVARCHAR(10)), CAST(4611686018427387903 AS BIGINT), CAST(46116860184.27387903 as FLOAT), CAST(1234567 as INT), CAST(1234 as SMALLINT), CAST(0 AS BIT), CAST(NEWID() AS uniqueidentifier), CAST(CURRENT_TIMESTAMP AS timestamp)";
                break;
            case DatabaseType.PostgreSQL:
                _connection = new NpgsqlConnection(NpgsqlConnectionString);
                _connection.Open();
                _command = _connection.CreateCommand();
                _command.CommandText = @"
SELECT CAST(NULL AS timestamp), CAST(NULL AS TEXT), CAST(NULL AS BIGINT), CAST(NULL as FLOAT), CAST(NULL as INT), CAST(NULL as SMALLINT), CAST(NULL AS BIT), CAST(NULL AS uuid), CAST(NULL AS bytea)
UNION
SELECT CAST('2020-01-01 01:01:01.123' AS timestamp), CAST('AbCdEfGhIj' AS TEXT), CAST(4611686018427387903 AS BIGINT), CAST(46116860184.27387903 as FLOAT), CAST(1234567 as INT), CAST(1234 as SMALLINT), CAST(0 AS BIT), CAST(gen_random_uuid() AS uuid), CAST('hello' AS bytea)";
                break;
        }

        _reader = _command.ExecuteReader();
        _reader.Read();
        if (!NullValue)
        {
            _reader.Read();
        }
    }

    [GlobalCleanup]
    public void Cleanup()
    {
        // Dispose in reverse order of creation: reader, command, connection.
        _reader.Close();
        _command.Dispose();
        _connection.Dispose();
    }

    [Benchmark]
    public bool DateTime() => _reader.IsDBNull(0);

    [Benchmark]
    public bool String() => _reader.IsDBNull(1);

    [Benchmark]
    public bool Int64() => _reader.IsDBNull(2);

    [Benchmark]
    public bool Double() => _reader.IsDBNull(3);

    [Benchmark]
    public bool Int32() => _reader.IsDBNull(4);

    [Benchmark]
    public bool Int16() => _reader.IsDBNull(5);

    [Benchmark]
    public bool Boolean() => _reader.IsDBNull(6);

    [Benchmark]
    public bool GetGuid() => _reader.IsDBNull(7);

    [Benchmark]
    public bool GetBytes() => _reader.IsDBNull(8);

    static void Main(string[] args)
        => BenchmarkRunner.Run<Program>();
}

public enum DatabaseType
{
    SqlServer,
    PostgreSQL
} |
Which is what I have done, but I would prefer to also handle DATETIME this way - and since the API is not complete, as it's missing DATETIME2 and possibly other types, I made this issue. |
Indeed, which is also more or less what I wrote - that you probably cannot ever make two method calls perform as fast as a single GetSqlWhatever call. |
So if I focus my attention on getting a SqlDateTime2 into the runtime, and I succeed - it would make sense, in my opinion, to also implement a GetSqlDateTime2 method in this library? |
"owners"? Then who are you guys commenting here - just random dudes on the internet? Who do I need to convince that it is needed? |
Thank you, but hopefully I should be able to manage on my own, unless there are requirements that I cannot see. But if something is accepted, I will contact you if needed. |
Yes - that would solve all the issues (if you also added a version that handled strings/byte arrays etc., i.e. objects :-) ). Surely SqlDataReader is not compatible with .NET 1, whose support ended in 2009 - and nullable came with .NET 2, as far as I remember. |
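For illustration, a user-side nullable getter of the kind being discussed can already be written today as an extension method over DbDataReader. This is a hedged sketch: `GetNullableStruct` is a made-up name, and it still pays for the internal IsDBNull call, so it only tidies up the calling code rather than removing the overhead. The demo uses a DataTable-backed reader so it runs without a server:

```csharp
using System;
using System.Data;
using System.Data.Common;

static class ReaderExtensions
{
    // Combines the null check and the typed read into one user-facing call.
    public static T? GetNullableStruct<T>(this DbDataReader reader, int ordinal) where T : struct
        => reader.IsDBNull(ordinal) ? (T?)null : reader.GetFieldValue<T>(ordinal);
}

class Demo
{
    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("d", typeof(DateTime));
        table.Rows.Add(new DateTime(2020, 1, 1));
        table.Rows.Add(DBNull.Value);

        using DbDataReader reader = table.CreateDataReader();
        while (reader.Read())
        {
            DateTime? d = reader.GetNullableStruct<DateTime>(0);
            Console.WriteLine(d?.ToString("yyyy-MM-dd") ?? "NULL");
        }
    }
}
```

This prints the value for the first row and "NULL" for the second, without the caller ever touching IsDBNull directly.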
So if calling IsDbNull takes ~12-14ns, then the GetWhatEver methods are on average slower than the GetSqlWhatever methods - since my test clearly shows that GetSqlWhatever was mostly faster - and because of how "NULL" is handled in this library, you are forced to call IsDbNull unless you are 100% sure that NULL cannot be returned from the database.

In terms of the actual time taken - you are forgetting one crucial part of a system: the time it takes to start, i.e. the time from when you click "start" in ServiceManager or run your systemctl start whatever until it is fully operational. This is where shaving off time can be crucial - if you are having a production issue and need to spin up a replacement server, and you can shave a minute off the startup time simply by using a different method in SqlClient, it is definitely worth it - and not just for a tiny fraction of users.

So - who do I need to convince that a change is needed, if it's not you two? |
Ok, but I guess it does not have to be. The methods would not even have to be generic, although that would require a full range of methods instead of just one - but then no type checking would have to be done inside the method, in case some person decided to try it with their own homemade structs; i.e. it would require a full range of

I hope the powers in charge can see the reason for adding a SqlDateTime2 struct - otherwise, lifting the accessibility of the required internal structs/methods could help people implement their own version of it, i.e. if |
@bjornbouetsmith I'd recommend that you make the PR; then we can review it with the other team members, and with the strong support from @roji and @Wraith2 we can decide whether we are going to proceed or not. |
@JRahnama What PR is that - and for what? Several solutions have been discussed. |
@bjornbouetsmith First off, thanks for submitting the idea and showing where it has performance benefits. @roji and @Wraith2 are valued contributors here and I appreciate their analysis. We all have jobs and existing priorities, so it's really up to the requestor to make the case for a change and get others to agree that the change should be prioritized over other people's needs. Your needs are obviously very performance oriented.

Implementing the change yourself would be the quickest way forward. You just need to make sure your changes would be accepted, as @JRahnama said. It's really up to you to consolidate input and suggestions into a coordinated feature request. As it is, this thread is getting quite long. I suggest adding a proposal summary to your original comment for this issue, based on the discussions. I'm not sure what the technical proposal is, either. (See CONTRIBUTING for more tips.)

My opinion: I think it would be difficult to push a new type into System.Data.SqlTypes. If you were successful, it would only go into future versions of .NET. Then you would still have to get SqlClient changed to use it, and that would require SqlClient to add a binary specific to that future target. If there is a solution that is limited to SqlClient, that you yourself can implement, and that fits into the project's goals of stability, reliability, backwards compatibility, maintainability, and performance, then that sets you up for a better chance of success.

Regards, |
@bjornbouetsmith ^ that's official guidance. 😁 So, how about implementing a
Does that seem like the start of a plan? |
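For what it's worth, a minimal SqlDateTime2 could be a fairly small struct. The sketch below is purely hypothetical - neither SqlClient nor the runtime defines this type, and a real version would need conversions, comparison operators, serialization, etc. - but it shows the INullable shape the existing System.Data.SqlTypes structs follow, covering the full DATETIME2 range via DateTime:

```csharp
using System;
using System.Data.SqlTypes;

// Hypothetical sketch only - not an existing SqlClient or runtime API.
public readonly struct SqlDateTime2 : INullable
{
    private readonly bool _notNull;   // default(SqlDateTime2) is Null
    private readonly DateTime _value;

    public static SqlDateTime2 Null => default;

    public SqlDateTime2(DateTime value)
    {
        _value = value;   // full DateTime range, i.e. the full DATETIME2 range
        _notNull = true;
    }

    public bool IsNull => !_notNull;

    public DateTime Value
        => _notNull ? _value : throw new SqlNullValueException();
}

class Demo
{
    static void Main()
    {
        // Year 1 is representable - SqlDateTime would overflow here.
        var early = new SqlDateTime2(new DateTime(1, 1, 1));
        Console.WriteLine(early.Value.Year);         // 1
        Console.WriteLine(SqlDateTime2.Null.IsNull); // True
    }
}
```

A hypothetical GetSqlDateTime2 reader method would then return `SqlDateTime2.Null` for a NULL column, letting callers test IsNull on the returned value, just like the other GetSqlXX methods.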
Perfect
I will get right on that - I assume I just fork and then make a PR when I have something I think is finished and of sufficient quality? It has been a while since I last contributed code via GitHub - the last time was with NLog, where I "funnily" enough also worked on performance and memory. |
It does indeed. |
@bjornbouetsmith you can also have a look at the contributor guide... |
Just a quick question - why does the source code have two implementations of everything, one for netcore/netstandard and one for netframework? I can see that some of the implementations are slightly different, but most are exactly the same. For simplicity's sake? |
@bjornbouetsmith No - for complexity's sake 😉 Actually, for historical reasons: the .NET Core implementation was branched out some years ago (as System.Data.SqlClient), was not kept in sync with the .NET Framework implementation, and was then copied into this project (Microsoft.Data.SqlClient). Work is ongoing to make more code common; the shared files are located here: https://github.com/dotnet/SqlClient/tree/master/src/Microsoft.Data.SqlClient/src/Microsoft/Data |
I'm slowly working through merging the two codebases wherever I can, and eventually we'll get to the point where the MS team will be needed to make policy decisions about each of the differences. Until then I'd prefer that you add any new code to the shared folder and add it identically in both projects. Start with just one build and then replicate the changes into the other. This is one of the reasons I offered to help, because I'm familiar with the project's peculiarities. If you can get the |
It is great that this project is available as open source, but the weight of 20 years of .NET code is present. |
So I can dump my code into https://github.com/dotnet/SqlClient/tree/master/src/Microsoft.Data.SqlClient/src - and add as

Of course the changes I have to make to existing classes need to go to both places. |
I think what @Wraith2 is suggesting is that you add the files there, and for example just link from the .NET Framework project, and make tests pass for that, then he will add to .NET Core and add docs links and other quirky stuff. |
Ok, and how do you feel about me adding |
Not favourably. We've avoided it in the past by including the file directly in the test assembly and writing tests that way. You'll want to add the type tests to the FunctionalTests project because they don't require an active sql connection, anything that does require a database would go in the ManualTests. I've done a shared file to access internal classes in this PR #827 |
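For reference, including a source file directly in a test project (instead of using InternalsVisibleTo) is just a linked Compile item in the test csproj; the path and file name below are illustrative only:

```xml
<ItemGroup>
  <!-- Illustrative path: compiles the shared source file into the test assembly -->
  <Compile Include="..\..\src\Microsoft\Data\SqlTypes\SqlDateTime2.cs"
           Link="SqlDateTime2.cs" />
</ItemGroup>
```

This gives the test assembly access to the type without widening the library's visibility.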
@Wraith2 That makes it hard to test my new property, then - since there are two SqlBuffer versions, and they are not the same implementation - so I would need to do conditional includes based on the framework being built, and that is ugly. But if that is what is needed so I can write a couple of tests for the DateTime2 property, then so be it - although it would be nice if the two versions of SqlBuffer could be merged into one.

Edit: Just tried - and it escalates: then I also have to link SqlCachedBuffer, SqlCachedStream, SqlMetaDataPriv, SqlResource and so forth. It just explodes - and eventually I would probably have to include the entire SqlClient in the test project. |
You don't need to test SqlBuffer in the functional tests because it'll be covered in the manual tests through doing real queries. The new type should be the only one you need to add tests for and that file will be shared. |
Pull request created for your initial comments. |
Describe the bug
It seems like the DATETIME2 SQL Server data type is not fully or correctly implemented in SqlDataReader.
Using the GetDateTime method, DATETIME2 is handled correctly.
Using the GetSqlDateTime method, it throws a
System.InvalidCastException:
To reproduce
Expected behavior
I expect GetSqlDateTime to handle DATETIME2 and return a SqlDateTime struct.
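The underlying mismatch is one of range and precision: System.Data.SqlTypes.SqlDateTime models the classic DATETIME type (1753-01-01 through 9999-12-31, 1/300-second ticks), while DATETIME2 covers dates from 0001-01-01 with up to 100 ns precision. This standalone snippet (no database needed) shows why a valid DATETIME2 value can fall outside what SqlDateTime can represent:

```csharp
using System;
using System.Data.SqlTypes;

class Demo
{
    static void Main()
    {
        // SqlDateTime is limited to the classic DATETIME range.
        Console.WriteLine(SqlDateTime.MinValue.Value.Year); // 1753

        // A perfectly valid DATETIME2 value (year 1) overflows SqlDateTime.
        try
        {
            _ = new SqlDateTime(new DateTime(1, 1, 1));
        }
        catch (SqlTypeException e)
        {
            Console.WriteLine(e.GetType().Name); // SqlTypeException
        }
    }
}
```

So even if GetSqlDateTime accepted a DATETIME2 column, it could only return values within the narrower DATETIME range, which is why widening SqlDateTime or adding a new type comes up in the discussion below.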
Further technical details
Microsoft.Data.SqlClient version: 2.1.0
.NET target: Framework 4.7.2
SQL Server version: 15.0.4079.3 (2019)
Operating system: Windows 10
Additional context
The reason I found this issue is that I was profiling our code and saw that SqlDataReader.IsDbNull took up a lot of time, since we need to check every column for null because the GetXX methods do not handle DBNull - and then I found the SqlDataReader.GetSqlXX methods, which do handle DBNull.
Basically trying to get rid of the overhead in SqlDataReader
by switching to
The IsNull property in the SqlTypes
i.e.