Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support EXTRACT on intervals and durations #12514

Merged
merged 1 commit into from
Sep 20, 2024
Merged

Conversation

nrc
Copy link
Contributor

@nrc nrc commented Sep 17, 2024

Which issue does this PR close?

Closes #6327.
Closes #7097

What changes are included in this PR?

Support extract/date_part by passing through requests to Arrow (where functionality was added in apache/arrow-rs#6071 and apache/arrow-rs#6246). There were also two changes required to the seconds helper function:

  • If the number of nanoseconds overflows a 32 bit integer, (e.g., a DayTime interval of 5 seconds), then querying the nanoseconds returns null. However, it is reasonable to return a number of seconds in this case. However, if there is a non-round number of seconds, this will give an inaccurate answer, e.g., 5.1 seconds will get returned as 5, but I'm not sure how to solve this - returning null for round numbers of seconds seems bad. We could fall back to milliseconds, but that complicates the code, is still imperfect, and has less predictable results. I also had a question about how I did this (REVIEW comment in the code).
  • The number of seconds was previously counted twice in cases where seconds are reflected in the number of nanoseconds.

A change which would fix both these issues (and I assume is an invariant which previously held) would be to not include whole seconds when returning nanoseconds. However, I think that breaks compat with Postgres.

Are these changes tested?

Yes, sqllogictests. The tests are not exhaustive because these features are properly tested in Arrow.

Are there any user-facing changes?

Users can use extract or date_part with intervals and durations.

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) functions labels Sep 17, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nrc -- this is looking really close to me.

I think we should reasses the null handling (I left a suggestion) as well as some additional testing but otherwise this is ready to go 🚀

datafusion/functions/src/datetime/date_part.rs Outdated Show resolved Hide resolved
datafusion/sqllogictest/test_files/expr.slt Show resolved Hide resolved
datafusion/sqllogictest/test_files/expr.slt Show resolved Hide resolved
Signed-off-by: Nick Cameron <nrc@ncameron.org>
@nrc nrc force-pushed the interval-extract-2 branch from 2d4732d to 3f91868 Compare September 20, 2024 02:13
@nrc
Copy link
Contributor Author

nrc commented Sep 20, 2024

Null handling and tests addressed. It took a tiny bit more code to make epoch work.

@alamb ready for re-review, thanks!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great -- thank you @nrc and @samuelcolvin and @alexmojaki

})?;
Ok(Arc::new(r))
} else {
// Nulls in secs are preserved, nulls in subsecs are treated as zero to account for the case
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 21ec332 into apache:main Sep 20, 2024
24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

is there a way to get a result similar to datediff function Extract from interval type failed
4 participants