Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-13977: [Format] clarify leap seconds for interval type #11138

Closed
wants to merge 3 commits into from

Conversation

houqp
Copy link
Member

@houqp houqp commented Sep 11, 2021

Like other time units, we don't account for leap seconds for interval types. For MONTH_DAY_NANO, we do account for leap days.

… type

Like other time units, we don't account for leap seconds for interval
types. For MONTH_DAY_NANO, we do account for leap days.
@github-actions
Copy link

@github-actions
Copy link

⚠️ Ticket has no components in JIRA, make sure you assign one.

@github-actions
Copy link

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@houqp
Copy link
Member Author

houqp commented Sep 11, 2021

For the context, this came out of the exercise to define partial order for Interval Types in jorgecarleitao/arrow2#398.

@jorgecarleitao
Copy link
Member

If we do not account for leap seconds, isn't this type equivalent to Duration[ms]?

I.e. Given MS_IN_DAY = 60*60*24*1000, if Interval(1d,MS_IN_DAY) == Interval(2d,0), then Interval(a,b) == Duration[ms](a*MS_IN_DAY + b) which makes them semantically equivalent (and thus redundant?). This is how I would conclude that Interval(1d,MS_IN_DAY) != Interval(2d,0), i.e. to capture different semantics.

OTOH, many systems do not handle leap seconds and in 2012 there were no plans to support leap seconds in Postgres.

So, my conclusion atm is that the decision tree is

care about leap Type
days Interval MonthDaysNs
seconds Interval DaysMs
days and seconds Extension
None Duration

If we ignore leap seconds in DaysMs, the table would become

care about leap Type
days Interval MonthDaysNs
seconds Extension
days and seconds Extension
None Duration

and we may deprecate DaysMs without loss of generality?


Since DataFusion follows postgres, I shouldn't we stick to MONTHS_DAYS_NS so that we capture its behavior of storing months, days and (nano)seconds separately,

postgres=# SELECT  interval '11 months + 28 days + 86399 seconds' + interval '1 month + 1 day + 1 second';
        ?column?         
-------------------------
 1 year 29 days 24:00:00
(1 row)

@houqp
Copy link
Member Author

houqp commented Sep 12, 2021

@jorgecarleitao even if we don't account for leap seconds, Interval(1d,MS_IN_DAY) is still not equal to Interval(2d,0) due to daylight saving adjustments. Here is the quote from the comment:

// A "calendar" interval which models types that don't necessarily
// have a precise duration without the context of a base timestamp (e.g.
// days can differ in length during day light savings time transitions).

Since DataFusion follows postgres, I shouldn't we stick to MONTHS_DAYS_NS so that we capture its behavior of storing months, days and (nano)seconds separately,

Are you suggesting that we should use MONTHS_DAYS_NS everywhere in datafusion and drop day_ms entirely?

@jorgecarleitao
Copy link
Member

You are right 👍

I understood that comment to differentiate it from Duration, that denotes a physical time interval. Specifically, my understanding is that Duration[s](1) equals 1 second as per its definition of atomic clocks and stuff.

In opposition to an Interval, that corresponds time units that are constructs associated with a calendar (historically associated with the Earth and moon movements), such as "Month", "day", etc.

The main difference (in my understanding) is that add_duration(now,duration(MS_IN_DAY)) - now = duration(MS_IN_DAY) and add_interval(now,interval(1d)) - now = interval(1d), but duration(MS_IN_DAY) != interval(1d) because we may cross a (calendar) offset, either due to a leap second, leap hour (daylight, which I forgot about in the comment above), leap day (February), etc.

@houqp
Copy link
Member Author

houqp commented Sep 12, 2021

Good summary 👍 this aligns with what I have in mind as well.

@emkornfield
Copy link
Contributor

Could you send an e-mail out to dev@ to make sure this gets wide visibility

@jorgecarleitao
Copy link
Member

Updating the summary table above with the result of the follow-up discussion,

care about leap Type
No Duration
hours Interval DaysMs
days and hours Interval MonthsDaysNs
days, hours, and seconds Extension

format/Schema.fbs Outdated Show resolved Hide resolved
@houqp
Copy link
Member Author

houqp commented Sep 17, 2021

I removed sentences that mentioned ordering and leap days. now the only change is mention of leap seconds to be consistent with other temporal types.

@houqp houqp changed the title ARROW-13977: [Format] clarify leap seconds and leap days for interval type ARROW-13977: [Format] clarify leap seconds for interval type Sep 17, 2021
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. This is fine with me.

format/Schema.fbs Outdated Show resolved Hide resolved
@houqp
Copy link
Member Author

houqp commented Sep 21, 2021

@edponce pushed a new commit to apply your suggestions, PTAL.

Copy link
Member

@wesm wesm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@houqp houqp closed this in 7f7f72d Sep 26, 2021
@houqp houqp deleted the qp_interval branch September 26, 2021 05:19
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Oct 20, 2021
Like other time units, we don't account for leap seconds for interval types. For MONTH_DAY_NANO, we do account for leap days.

Closes apache#11138 from houqp/qp_interval

Authored-by: Qingping Hou <dave2008713@gmail.com>
Signed-off-by: Qingping Hou <dave2008713@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants