You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Is your feature request related to a problem or challenge?
Data of type DateTime::Interval (spans of time measured in calendar units like days and months) are tricky business because the absolute size (in number of seconds) is not a fixed quantity. For example the 1 month is 28 days for February but 1 month is 31 days in December.
This makes the seemingly simple operation of comparing two intervals quite complicated in practice. For example is 1 month more or less than 30 days? The answer depends on what month you are talking about.
Arrow also includes a type DateTime::Duration that is a fixed width of time and suffers from many fewer challenges, but may not be as intuitive for humans to use;
Currently DataFusion \ errors with "not supported" when trying to compare an interval. This is
good as it doesn't get inconsistent answers
bad as it might imply it is a feature gap to be filled rather than an intentional lack (which seems to have happened in Interval Comparison arrow-rs#5180)
For example you can't compare an interval of 1 month and 100 days:
❯ select interval '1 month' = interval '100 days';
This feature is not implemented: Unsupported interval operator: Eq
❯ select interval '1 month' < interval '100 days';;
This feature is not implemented: Unsupported interval operator: Lt
Nor can you compute things like min/max on an interval
❯ create table foo(i interval) as values (interval '1 month'), (interval '100 days');
0 rows in set. Query took 0.008 seconds.
❯ select max(i) from foo;
Internal error: Min/Max accumulator not implemented for type Interval(MonthDayNano).
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
❯ select min(i) from foo;
Internal error: Min/Max accumulator not implemented for type Interval(MonthDayNano).
(those should probably be unsupported errors rather than internal errors 🤔 )
select interval '100 days';
+---------------------------------------------------------+
| IntervalMonthDayNano("1844674407370955161600") |
+---------------------------------------------------------+
| 0 years 0 mons 100 days 0 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------+1 row inset. Query took 0.000 seconds.
❯ select interval '1 month';
+-------------------------------------------------------+
| IntervalMonthDayNano("79228162514264337593543950336") |
+-------------------------------------------------------+
| 0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs |
+-------------------------------------------------------+1 row inset. Query took 0.001 seconds.
Describe the solution you'd like
It would be nice to discuss and determine what behavior we would like. Saying "datafusion will not support this kind of comparison" would be fine as an outcome, for the reasons explained above
Describe alternatives you've considered
We could also adopt what postgres does which is to simply convert months into 30 days and treat them as a fixed quantity
We could also adopt what postgres does which is to simply convert months into 30 days and treat them as a fixed quantity
Coercing to a duration is an interesting option, although this would inherently be lossy and so could also be confusing...
Perhaps we could have a middle-ground where we add syntactic sugar to make coercing to a duration easy, and then direct users to this in the error message? Perhaps something like '1 month'::duration_nanos or something... 🤔
We fixed this in our fork with the semantics discussed in the linked arrow-rs issue. Happy to contribute it upstream if there is interest in it, just let us know.
Is your feature request related to a problem or challenge?
Data of type DateTime::Interval (spans of time measured in calendar units like days and months) are tricky business because the absolute size (in number of seconds) is not a fixed quantity. For example the
1 month
is 28 days for February but1 month
is 31 days in December.This makes the seemingly simple operation of comparing two intervals quite complicated in practice. For example is
1 month
more or less than30 days
? The answer depends on what month you are talking about.Arrow also includes a type DateTime::Duration that is a fixed width of time and suffers from many fewer challenges, but may not be as intuitive for humans to use;
Currently DataFusion \ errors with "not supported" when trying to compare an interval. This is
For example you can't compare an interval of
1 month
and100 days
:Nor can you compute things like min/max on an interval
(those should probably be unsupported errors rather than internal errors 🤔 )
Describe the solution you'd like
It would be nice to discuss and determine what behavior we would like. Saying "datafusion will not support this kind of comparison" would be fine as an outcome, for the reasons explained above
Describe alternatives you've considered
We could also adopt what postgres does which is to simply convert months into 30 days and treat them as a fixed quantity
Additional context
I made apache/arrow-rs#5192 to try and clarify the semantics in arrow
The text was updated successfully, but these errors were encountered: