-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support timestamps and steps of less than a day for range/generate_series #12400
Conversation
@@ -109,8 +121,11 @@ impl ScalarUDFImpl for Range { | |||
match args[0].data_type() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can probably factor our this block to avoid make a modifications twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is so much duplication in these UDF's. I think it's best to push the cleanup of that to a separate PR and keep this one as small as possible.
return exec_err!( | ||
"arguments length must be 3 for {}", | ||
if include_upper { | ||
"generate_series" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"generate_series" | |
"GENERATE_SERIES" |
if include_upper { | ||
"generate_series" | ||
} else { | ||
"range" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"range" | |
"RANGE" |
@@ -394,3 +412,136 @@ fn gen_range_date(args: &[ArrayRef], include_upper: bool) -> Result<ArrayRef> { | |||
|
|||
Ok(arr) | |||
} | |||
|
|||
fn gen_range_timestamp(args: &[ArrayRef], include_upper: bool) -> Result<ArrayRef> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fn gen_range_timestamp(args: &[ArrayRef], include_upper: bool) -> Result<ArrayRef> { | |
fn gen_range_timestamp(args: &[ArrayRef], include_upper_bound: bool) -> Result<ArrayRef> { |
|
||
// values are timestamps | ||
let values_builder = if let Some(start_tz_str) = start_tz_opt { | ||
TimestampNanosecondBuilder::new().with_timezone(&**start_tz_str) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that looks somewhat weird, do we really need so many indirections here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it was written, yes, it was required. I changed it up to use map_or_else to accomplish the same thing.
_ => { | ||
internal_err!( | ||
"Unexpected argument type for {} : {}", | ||
if include_upper { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this error block is third time showing up, perhaps we can factor it out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Omega359 great work
Please check some code comments
For tests its also good to have examples like
series of date2 less than date1 with positive interval
fractional intervals - which might be tricky
Thanks @comphead for the review, much appreciated. I'll go through the proposed changes and comments tonight or tomorrow and push updates. |
I think that may already be in there for range and generate_series
I added this one - I think this may be along the lines of what you are thinking of:
|
I plan to review this as well tomorrow -- thank you @Omega359 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Omega359 for this PR and @dmitrybugakov and @comphead for the reviews. I think this PR is good to go from my perspective (I left some optional nit picky comments but I don't think they are required)
I also ran the reproducer from @Abdullahsab3 on #11822 locally on this branch and it appears to work 👌
> select UNNEST(generate_series('2023-01-01 10:00:00'::timestamp, '2023-01-31 23:59:00'::timestamp, '1 hour'::interval)) as date_time;
+---------------------+
| date_time |
+---------------------+
| 2023-01-01T10:00:00 |
| 2023-01-01T11:00:00 |
| 2023-01-01T12:00:00 |
| 2023-01-01T13:00:00 |
| 2023-01-01T14:00:00 |
| 2023-01-01T15:00:00 |
| 2023-01-01T16:00:00 |
| 2023-01-01T17:00:00 |
| 2023-01-01T18:00:00 |
| 2023-01-01T19:00:00 |
| 2023-01-01T20:00:00 |
| 2023-01-01T21:00:00 |
| 2023-01-01T22:00:00 |
| 2023-01-01T23:00:00 |
| 2023-01-02T00:00:00 |
| 2023-01-02T01:00:00 |
| 2023-01-02T02:00:00 |
| 2023-01-02T03:00:00 |
| 2023-01-02T04:00:00 |
| 2023-01-02T05:00:00 |
| 2023-01-02T06:00:00 |
| 2023-01-02T07:00:00 |
| 2023-01-02T08:00:00 |
| 2023-01-02T09:00:00 |
| 2023-01-02T10:00:00 |
| 2023-01-02T11:00:00 |
| 2023-01-02T12:00:00 |
| 2023-01-02T13:00:00 |
| 2023-01-02T14:00:00 |
| 2023-01-02T15:00:00 |
| 2023-01-02T16:00:00 |
| 2023-01-02T17:00:00 |
| 2023-01-02T18:00:00 |
| 2023-01-02T19:00:00 |
| 2023-01-02T20:00:00 |
| 2023-01-02T21:00:00 |
| 2023-01-02T22:00:00 |
| 2023-01-02T23:00:00 |
| 2023-01-03T00:00:00 |
| 2023-01-03T01:00:00 |
| . |
| . |
| . |
+---------------------+
734 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.014 seconds.
make_scalar_function(|args| gen_range_timestamp(args, false))(args) | ||
} | ||
dt => { | ||
exec_err!("unsupported type for RANGE. Expected Int64, Date32 or Timestamp, got: {dt}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❤️ for a much better error
Ok(arr) | ||
} | ||
|
||
fn cast_timestamp_arg( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
You could potentially use as_timestamp_nanosecond_array
(the datafusion version) from: https://docs.rs/datafusion/latest/datafusion/common/cast/index.html
And then call timezone
to get the timezone from the array
I personally found the use of the word cast
somewhat confusing as function doesn't actually call the arrow cast
kernel, but instead does the rust as_any
stuff (which is like a cast in C or Java)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My programming roots are showing through :) I can take a look at that
[2021-01-01T00:00:00, 2021-01-01T01:00:00, 2021-01-01T02:00:00, 2021-01-01T03:00:00, 2021-01-01T04:00:00, 2021-01-01T05:00:00, 2021-01-01T06:00:00, 2021-01-01T07:00:00, 2021-01-01T08:00:00, 2021-01-01T09:00:00, 2021-01-01T10:00:00, 2021-01-01T11:00:00, 2021-01-01T12:00:00, 2021-01-01T13:00:00, 2021-01-01T14:00:00, 2021-01-01T15:00:00] | ||
|
||
query ? | ||
select generate_series('2021-01-01T00:00:00EST'::timestamp, '2021-01-01T15:00:00-12:00'::timestamp, INTERVAL '1' HOUR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it worth testing when the end timestamp is not an exact increment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a test for -5.5sec (in nanoseconds) somewhere in that file.
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
🚀 - thanks again @comphead @Omega359 @dmitrybugakov and @Abdullahsab3 |
Which issue does this PR close?
Closes #11822
Rationale for this change
Add support for timestamps and steps less than a day for timestamps.
What changes are included in this PR?
Code, tests, doc.
Are these changes tested?
Yes.
Are there any user-facing changes?
No.