Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace DvDateTimeZone, DvDateTime, DvTimeSpan with .NET standard types. #693

Closed
wants to merge 24 commits into from

Conversation

codemzs
Copy link
Member

@codemzs codemzs commented Aug 19, 2018

fixes #673

@codemzs codemzs requested review from TomFinley, dotnet-bot, Ivanidzo4ka, eerhardt and Zruty0 and removed request for dotnet-bot August 19, 2018 07:10
@codemzs
Copy link
Member Author

codemzs commented Aug 19, 2018

@TomFinley @eerhardt Do we need missing value support for datetime, timespan, datetimeoffset? These are structs so may be use default as missing values? For now I have assumed there is no missing value indicator but I can add it back. #Resolved

@@ -54,11 +54,11 @@ public CodecFactory(IHostEnvironment env, MemoryStreamPool memPool = null)
RegisterSimpleCodec(new UnsafeTypeCodec<ulong>(this));
RegisterSimpleCodec(new UnsafeTypeCodec<Single>(this));
RegisterSimpleCodec(new UnsafeTypeCodec<Double>(this));
RegisterSimpleCodec(new UnsafeTypeCodec<DvTimeSpan>(this));
RegisterSimpleCodec(new UnsafeTypeCodec<TimeSpan>(this));
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnsafeTypeCodec [](start = 36, length = 15)

Is it still need to be UnsafeTypeCodec? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should it be? I have defined TimeSpanUnsafeTypeOps to handle this type.


In reply to: 211110052 [](ancestors = 211110052)

@TomFinley
Copy link
Contributor

Hi @codemzs not sure I understand, are you proposing that the type for these structures be, for instance, not DateTime but DateTime? given that the default value for DataTime is not sensible?


In reply to: 414136160 [](ancestors = 414136160)

kind = DataKind.DT;
else if (type == typeof(DvDateTimeZone) || type == typeof(TimeZoneInfo))
else if (type == typeof(DateTimeOffset) || type == typeof(TimeZoneInfo))
Copy link
Contributor

@TomFinley TomFinley Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeZoneInfo [](start = 70, length = 12)

In what way does a TimeZoneInfo map into a DateTimeOffset? #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this seems totally wrong.


In reply to: 211315779 [](ancestors = 211315779)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was a miss, I did intend to remove it.


In reply to: 211662608 [](ancestors = 211662608,211315779)

@@ -135,7 +135,7 @@ public bool IsTimeSpan
}

/// <summary>
/// Whether this type is a DvDateTime.
/// Whether this type is a DateTime.
Copy link
Contributor

@TomFinley TomFinley Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateTime [](start = 35, length = 8)

<see tags perhaps #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Aug 20, 2018

            return true;

I know you didn't write this code but this if (...) return false; return true; type code is incredibly annoying to me...

Maybe:

Contracts.Assert((this == DateTimeType.Instance) == (this is DateTimeType));
return this is DateTimeType

Similar cleanup possible for the other Is properties here. #Resolved


Refers to: src/Microsoft.ML.Core/Data/ColumnType.cs:148 in e0d66b0. [](commit_id = e0d66b0, deletion_comment = False)

return true;
}
dst = DvDateTime.NA;

return IsStdMissing(ref src);
Copy link
Contributor

@TomFinley TomFinley Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsStdMissing(ref src); [](start = 19, length = 22)

So I feel like if you parse missing into a value type that does not support missing, we might expect the result to be that we would throw, rather than just the default value. #Resolved

return true;
}
dst = DvDateTimeZone.NA;

return IsStdMissing(ref src);
Copy link
Contributor

@TomFinley TomFinley Aug 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsStdMissing [](start = 19, length = 12)

Similar for this. #Resolved

@TomFinley
Copy link
Contributor

TomFinley commented Aug 20, 2018

I might have expected to see some change where we take out DvDateTime, Dv... etc. Is it not so? #Resolved

{
private List<short> _offsets;
Copy link
Member

@eerhardt eerhardt Aug 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure on how concerned we need to be with sizes, but an offset can only be in the range of -14 to 14 hours. So using a long seems like a bit of an overkill.

It also appears that previously the _offsets was the number of minutes in the offset. Now we are using the number of ticks in the offset. Do we need to be concerned with this difference? #Resolved

Copy link
Member Author

@codemzs codemzs Aug 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.NET standard type has offsets in ticks. We could convert offset in minutes when writing to disk and convert it back to ticks when reading but we may lose precision. What do you think?


In reply to: 211667357 [](ancestors = 211667357)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateTimeOffset exposes its offset as a TimeSpan, but internally it uses short and in minutes.

https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L51-L53

https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L286-L292

From everything I can find online (ISO8601, RFC3339, SQL Server doc), the offset supports the range -14 to 14 hours, and only supports minute precision.


In reply to: 212488658 [](ancestors = 212488658,211667357)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot! this is helpful. I'm now writing the ticks in minutes as short values because they are already validated when TimeSpan is created and when reading I convert the minutes back into Ticks when creating the TimeSpan object.


In reply to: 212670585 [](ancestors = 212670585,212488658,211667357)

@@ -906,11 +906,11 @@ protected Func<bool> GetColumnComparer(IRow r1, IRow r2, int col, ColumnType typ
case DataKind.Bool:
return GetComparerOne<DvBool>(r1, r2, col, (x, y) => x.Equals(y));
case DataKind.TimeSpan:
return GetComparerOne<DvTimeSpan>(r1, r2, col, (x, y) => x.Equals(y));
return GetComparerOne<TimeSpan>(r1, r2, col, (x, y) => x.Ticks == y.Ticks);
Copy link
Contributor

@TomFinley TomFinley Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TimeSpan [](start = 42, length = 8)

As near as I can see, both TimeSpan and DateTime are directly comparable, that might be better. #Resolved

//https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L51-L53
//https://github.com/dotnet/coreclr/blob/9499b08eefd895158c3f3c7834e185a73619128d/src/System.Private.CoreLib/shared/System/DateTimeOffset.cs#L286-L292
//From everything online(ISO8601, RFC3339, SQL Server doc, the offset supports the range -14 to 14 hours, and only supports minute precision.
_offsets.Add((short)(value.Offset.Ticks / TimeSpan.TicksPerMinute));
Copy link
Member

@eerhardt eerhardt Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't this just be value.TotalMinutes? #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't see totalMinutes under value, did you mean under offset?


In reply to: 213152272 [](ancestors = 213152272)

Copy link
Member

@eerhardt eerhardt Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, value.Offset.TotalMinutes #Resolved

{
Contracts.Assert(!_disposed);
value = new DvDateTimeZone(_ticks[_index], _offsets[_index]);
value = new DateTimeOffset(new DateTime(_ticks[_index]), new TimeSpan(_offsets[_index] * TimeSpan.TicksPerMinute));
Copy link
Member

@eerhardt eerhardt Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

new TimeSpan(_offsets[_index] * TimeSpan.TicksPerMinute) should be new TimeSpan(0, _offsets[_index], 0) #Resolved

RegisterOtherCodec("VBuffer", GetVBufferCodec);
RegisterOtherCodec("DvDateTimeZone", new DateTimeOffsetCodec(this).GetCodec);
RegisterOtherCodec("DvDateTime", new DateTimeCodec(this).GetCodec);
RegisterOtherCodec("DvTimeSpan", new UnsafeTypeCodec<TimeSpan>(this).GetCodec);
RegisterOtherCodec("Key", GetKeyCodec);
Copy link
Contributor

@TomFinley TomFinley Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the new test helped show up some issues.

BTW where are we testing the writing of data into what amounts to a new format? #Resolved

Copy link
Member Author

@codemzs codemzs Aug 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a test that reads from the old IDV format then it writes back as text and that is where writing of data into new format is tested. Also for date/time the serialization format has not changed. I believe it only changed for bools.


In reply to: 213349875 [](ancestors = 213349875)

Copy link
Contributor

@TomFinley TomFinley Aug 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a simple way to do this is modify your test a little bit.

Currently it is something like this.

TestCore("savedata", idvPath, "loader=binary", "saver=text", textOutputPath.Arg("dout"));

If we modify it to be this:

var intermediateData = /// some temp IDV file.
TestCore("savedata", idvPath, "loader=binary", "saver=text", textOutputPath.Arg("dout"));
TestCore("savedata", idvPath, "loader=binary", "saver=binary", intermediateData.ArgOnly("dout"));
TestCore("savedata", intermediateData, "loader=binary", "saver=text", textOutputPath.Arg("dout"));

That will test everything. #Resolved

@shauheen shauheen added the enhancement New feature or request label Aug 28, 2018
@shauheen shauheen added this to the 0818 milestone Aug 28, 2018
@shauheen shauheen removed this from the 0818 milestone Aug 30, 2018
…to dvdatetime

# Conflicts:
#	test/BaselineOutput/SingleDebug/SavePipe/TestParquetPrimitiveDataTypes-Data.txt
#	test/BaselineOutput/SingleDebug/SavePipe/TestParquetPrimitiveDataTypes-Schema.txt
#	test/BaselineOutput/SingleRelease/SavePipe/TestParquetPrimitiveDataTypes-Data.txt
#	test/BaselineOutput/SingleRelease/SavePipe/TestParquetPrimitiveDataTypes-Schema.txt
#	test/data/Parquet/alltypes.parquet
@codemzs
Copy link
Member Author

codemzs commented Sep 20, 2018

This change was committed via #863

@codemzs codemzs closed this Sep 20, 2018
@codemzs codemzs deleted the dvdatetime branch September 20, 2018 18:16
@ghost ghost locked as resolved and limited conversation to collaborators Mar 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.NET data type system instead of DvTypes
5 participants