Fix handling of dates and timestamps before the 20th century #4563

Merged
findepi merged 1 commit into trinodb:master from the jdbc-old-dates branch
Jul 30, 2020

Conversation

aalbu
Member

@aalbu aalbu commented Jul 24, 2020

No description provided.

@cla-bot cla-bot bot added the cla-signed label Jul 24, 2020
@aalbu aalbu requested a review from findepi July 24, 2020 10:43
@losipiuk
Member

nit: commit message summary too long

Comment on lines 122 to 123
// the default date when the Gregorian calendar was instituted (October 15, 1582)
private static final long GREGORIAN_CALENDAR_INTRODUCTION = new GregorianCalendar().getGregorianChange().getTime();
Member

What do you mean by "the default" here?

The cutoff date defaults to 1582-10-15, but you are not using a constant.
Instead you're asking for a locale/env-specific date. Maybe

// The date when the Gregorian calendar was instituted, environment specific. Defaults to October 15, 1582.

?

Member Author

It is a constant. The actual change happened on different dates, in different countries, but GregorianCalendar doesn't have information for specific locales. Instead, it provides a method the client can call to specify the cutover date (setGregorianChange()). So unless that method is called, getGregorianChange() will always return the same value.

Anyway, after adding some more tests, it turns out that Joda millisecond values are not consistent with java.sql.Date even for more recent dates (as late as the 1800's), so I decided to use Joda only starting with the 20th century.
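
To illustrate the point about the cutover being fixed unless a client overrides it, here is a minimal sketch that uses only `java.util.GregorianCalendar`. The class and method names are made up for this example and are not part of the PR.

```java
import java.util.Date;
import java.util.GregorianCalendar;

// Hypothetical demo class; not part of the PR.
class GregorianCutoverDemo
{
    // Cutover reported by a freshly constructed calendar, as epoch millis.
    static long defaultCutoverMillis()
    {
        return new GregorianCalendar().getGregorianChange().getTime();
    }

    // Cutover reported by one instance after the client overrides it.
    static long overriddenCutoverMillis(long newCutoverMillis)
    {
        GregorianCalendar calendar = new GregorianCalendar();
        calendar.setGregorianChange(new Date(newCutoverMillis));
        return calendar.getGregorianChange().getTime();
    }

    public static void main(String[] args)
    {
        System.out.println(defaultCutoverMillis());
        System.out.println(overriddenCutoverMillis(0L));
        // The override applies to that one instance only; a new calendar
        // reports the default cutover again.
        System.out.println(defaultCutoverMillis());
    }
}
```

The override is per instance, so a constant initialized from `new GregorianCalendar().getGregorianChange()` should not be affected by other code calling `setGregorianChange()` on its own calendars.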

Member

> Anyway, after adding some more tests, it turns out that Joda millisecond values are not consistent with java.sql.Date even for more recent dates (as late as the 1800's), so I decided to use Joda only starting with the 20th century.

I forgot about that, but indeed. I think I remember finding a zone where java.time and java.util.Calendar differed after 1900 (but before 1950).
java.time and java.util.Calendar use the same tz data file, but they seem to parse it differently.

... I can only find differences as recent as the 1890s in multiple zones (e.g. Europe/Warsaw or Asia/Aden), so maybe my memory is wrong. Using 1900 seems to be safe.

java.time and Joda seem to behave the same as long as they have the same tz data. But tz data shouldn't change for past dates/times.

This is some variation of the test code I am using: https://github.com/findepi/urandom-bits/blob/master/src/main/java/findepi/time/JodaJdkZoneDrift.java
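
For readers who do not want to follow the link, a rough sketch of this kind of comparison is shown below. It is not the linked test code: it compares only `java.util.TimeZone` with `java.time`, leaves Joda out, and uses invented class and method names. Whether it prints any drift depends on the JDK and its tz data.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.TimeZone;

// Hypothetical helper; not the code from the linked repository.
class ZoneOffsetDrift
{
    // UTC offset in millis according to the legacy java.util.TimeZone API.
    static long legacyOffsetMillis(String zoneId, long epochMillis)
    {
        return TimeZone.getTimeZone(zoneId).getOffset(epochMillis);
    }

    // UTC offset in millis according to java.time zone rules.
    static long javaTimeOffsetMillis(String zoneId, long epochMillis)
    {
        return ZoneId.of(zoneId).getRules()
                .getOffset(Instant.ofEpochMilli(epochMillis))
                .getTotalSeconds() * 1000L;
    }

    // Difference between the two APIs for the same zone and instant.
    static long driftMillis(String zoneId, long epochMillis)
    {
        return legacyOffsetMillis(zoneId, epochMillis) - javaTimeOffsetMillis(zoneId, epochMillis);
    }

    public static void main(String[] args)
    {
        String[] zones = {"Europe/Warsaw", "Asia/Aden", "UTC"};
        for (String zone : zones) {
            // Sample the start of every tenth year between 1850 and 1950.
            for (int year = 1850; year <= 1950; year += 10) {
                long millis = LocalDate.of(year, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
                long drift = driftMillis(zone, millis);
                if (drift != 0) {
                    System.out.println(zone + " " + year + ": drift " + drift + " ms");
                }
            }
        }
    }
}
```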

Member Author

Nice, that's good info.

if (millis >= GREGORIAN_CALENDAR_INTRODUCTION) {
    return new Date(millis);
}
// the chronology used by default by Joda is not appropriate for dates preceding the introduction of the Gregorian calendar,
Member

It is "appropriate" in some way. I think it would be better to indicate the actual problem.

Suggested change
// the chronology used by default by Joda is not appropriate for dates preceding the introduction of the Gregorian calendar,
// The chronology used by Joda is not consistent with java.sql.Date
// for dates preceding the introduction of the Gregorian calendar.
// Same millisecond value represents a different year/month/day.

Member Author

The documentation says:

it is not historically accurate before 1583

Member

I like the "historically accurate" term

Member

Of course, it's not historically accurate after 1583 for some (many) countries as well.
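
The "same millisecond value represents a different year/month/day" problem from this thread can be reproduced without Joda. The sketch below is an illustration only: it uses `java.time` (proleptic ISO calendar) as a stand-in for Joda's default chronology and `GregorianCalendar` for the hybrid Julian/Gregorian calendar behind `java.sql.Date`. The class and method names are invented.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; java.time stands in for Joda's ISO chronology.
class SameMillisDifferentDate
{
    // Calendar date in UTC under the proleptic ISO calendar.
    static LocalDate isoDate(long epochMillis)
    {
        return Instant.ofEpochMilli(epochMillis).atOffset(ZoneOffset.UTC).toLocalDate();
    }

    // {year, month (1-based), day} in UTC under the hybrid Julian/Gregorian calendar.
    static int[] hybridFields(long epochMillis)
    {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        calendar.setTimeInMillis(epochMillis);
        return new int[] {
                calendar.get(Calendar.YEAR),
                calendar.get(Calendar.MONTH) + 1,
                calendar.get(Calendar.DAY_OF_MONTH)};
    }

    // True when both calendars agree on year, month and day for this instant.
    static boolean sameCalendarDate(long epochMillis)
    {
        LocalDate iso = isoDate(epochMillis);
        int[] hybrid = hybridFields(epochMillis);
        return iso.getYear() == hybrid[0] && iso.getMonthValue() == hybrid[1] && iso.getDayOfMonth() == hybrid[2];
    }

    public static void main(String[] args)
    {
        long ancient = LocalDate.of(1, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long modern = LocalDate.of(2000, 1, 1).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        System.out.println("year 1 agrees: " + sameCalendarDate(ancient));
        System.out.println("year 2000 agrees: " + sameCalendarDate(modern));
    }
}
```

For the instant that `java.time` labels 0001-01-01, the hybrid calendar should report a date two days later, because the Julian calendar runs two days ahead of the proleptic Gregorian one at that point.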

}
// the chronology used by default by Joda is not appropriate for dates preceding the introduction of the Gregorian calendar,
// so for such cases we are falling back to the more expensive GregorianCalendar; note that Joda also has a chronology that
// works for older dates, but it uses a slightly different algorithm, so we are sticking with the standard library
Member

What are the end-user-visible symptoms of these differences? Are they covered by tests?

Member Author

The assertions I have added for dates before the introduction of the Gregorian calendar were failing using Joda's chronology.
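
The fallback discussed in this thread can be sketched roughly as follows. This is not the PR's code: `java.time` stands in for Joda on the fast path, the 1900 cutoff is taken from the discussion above, and `OldDateFallback`, `toSqlDate` and `FAST_PATH_FIRST_YEAR` are hypothetical names.

```java
import java.sql.Date;
import java.time.LocalDate;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical sketch of "fast path for modern dates, GregorianCalendar otherwise".
class OldDateFallback
{
    // Assumed cutoff, following the "1900 seems to be safe" remark in the review.
    static final int FAST_PATH_FIRST_YEAR = 1900;

    // Converts year/month/day (month is 1-based, years before 1 are not handled)
    // to a java.sql.Date at local midnight in the given zone.
    static Date toSqlDate(int year, int month, int day, TimeZone zone)
    {
        if (year >= FAST_PATH_FIRST_YEAR) {
            // Fast path: proleptic ISO arithmetic (java.time here, Joda in the PR).
            long millis = LocalDate.of(year, month, day)
                    .atStartOfDay(zone.toZoneId())
                    .toInstant()
                    .toEpochMilli();
            return new Date(millis);
        }
        // Slow path: the hybrid Julian/Gregorian calendar of the standard library,
        // which is the calendar java.sql.Date itself uses to interpret millis.
        GregorianCalendar calendar = new GregorianCalendar(zone);
        calendar.clear();
        calendar.set(year, month - 1, day);
        return new Date(calendar.getTimeInMillis());
    }

    public static void main(String[] args)
    {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        System.out.println(toSqlDate(2020, 7, 30, utc).getTime());
        System.out.println(toSqlDate(1, 1, 1, utc).getTime());
    }
}
```

With this split, only pre-1900 values pay for the `GregorianCalendar` allocation, which matches the "more expensive" remark in the code comment under review.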

@aalbu aalbu force-pushed the jdbc-old-dates branch 2 times, most recently from 235660c to e9fcffe Compare July 28, 2020 15:58
@electrum electrum changed the title Handle dates/timestamps preceding the introduction of the Gregorian c… Handle dates preceding the introduction of the Gregorian calendar calendar Jul 29, 2020
@electrum
Member

Please update the commit description to be a single line, without an asterisk:

Fixes handling of dates and timestamps before the 20th century.

The body should be wrapped at 72 characters and provide explanatory text. It's fine to use a bulleted list if that's the best way to explain something, but using a single bullet with nothing else looks strange. See https://chris.beams.io/posts/git-commit/

@@ -193,6 +225,45 @@ public void testObjectTypes()
assertEquals(rs.getTimestamp(column), Timestamp.valueOf(LocalDateTime.of(2018, 2, 13, 13, 14, 15, 555_555_556)));
});

// distant past, but apparently not an uncommon value in practice
checkRepresentation("TIMESTAMP '0001-01-01 00:00:00'", Types.TIMESTAMP, (rs, column) -> {
assertEquals(rs.getObject(column), Timestamp.valueOf(LocalDateTime.of(1, 1, 1, 0, 0, 0)));
Member

I realize the others do this, but it'd be better to factor this out

Timestamp expected = Timestamp.valueOf(LocalDateTime.of(1, 1, 1, 0, 0, 0));

That way it's clear to the reader that these are the same value.

Member

Others do it to avoid sharing a mutable object as the expected state.
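
One possible middle ground, shown here only as a sketch and not taken from the PR, is to factor out a `Supplier` instead of a value: the expected value is named once, yet every assertion gets its own `Timestamp` instance. `FreshExpectedValue` and `EXPECTED` are hypothetical names.

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;
import java.util.function.Supplier;

// Hypothetical demo class; not part of the PR's test code.
class FreshExpectedValue
{
    // Each call builds a new Timestamp, so no caller can observe another caller's mutation.
    static final Supplier<Timestamp> EXPECTED =
            () -> Timestamp.valueOf(LocalDateTime.of(1, 1, 1, 0, 0, 0));

    public static void main(String[] args)
    {
        Timestamp first = EXPECTED.get();
        Timestamp second = EXPECTED.get();

        // java.sql.Timestamp is mutable: setNanos changes the object in place.
        first.setNanos(123);

        System.out.println("mutated copy equals untouched copy: " + first.equals(second));
        System.out.println("untouched copy equals fresh value: " + second.equals(EXPECTED.get()));
    }
}
```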

@findepi findepi merged commit 4efe924 into trinodb:master Jul 30, 2020
@findepi
Member

findepi commented Jul 30, 2020

Merged, thanks!

@findepi findepi changed the title Handle dates preceding the introduction of the Gregorian calendar calendar Fix handling of dates and timestamps before the 20th century Jul 30, 2020
@findepi findepi mentioned this pull request Jul 30, 2020
8 tasks
@findepi findepi added this to the 340 milestone Jul 30, 2020
4 participants