feat(api): add DateValue.epoch api for computing days since epoch #9856

Merged
merged 26 commits into from
Aug 26, 2024

Conversation

cpcloud
Member

@cpcloud cpcloud commented Aug 16, 2024

Description of changes

Adds DateValue.epoch method and underlying ops.UnixDate.

This was originally meant to help with window function queries with range
bounds where the backend doesn't support intervals in preceding/following, but
I think it has general utility and is much more clear IMO than casting a
date value to an integer.
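As a rough illustration of the semantics described above (plain Python, not the ibis implementation), the sketch below computes days since the Unix epoch and then uses that integer as the key for a trailing range window, which is the kind of bound a backend without interval support can still express. The helper names `days_since_epoch` and `trailing_range_sum` are hypothetical and exist only for this example.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def days_since_epoch(d: date) -> int:
    """Whole days between the Unix epoch and `d` (negative before 1970)."""
    return (d - EPOCH).days


def trailing_range_sum(rows, preceding_days):
    """For each (date, value) row, sum the values whose date lies within
    `preceding_days` days before that row's date, inclusive of the row."""
    # Hypothetical helper: the integer day number stands in for the date,
    # so the window bound is a plain integer rather than an interval.
    keyed = [(days_since_epoch(d), v) for d, v in rows]
    return [
        sum(v for k, v in keyed if key - preceding_days <= k <= key)
        for key, _ in keyed
    ]


rows = [
    (date(2024, 8, 16), 1),
    (date(2024, 8, 17), 2),
    (date(2024, 8, 26), 4),
]
print([days_since_epoch(d) for d, _ in rows])
print(trailing_range_sum(rows, preceding_days=2))
```

The first two rows fall within two days of each other, so the second row's window includes both; the third row is more than two days later and sums only itself.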

@cpcloud cpcloud added feature Features or general enhancements timestamps Issues related to the timestamp API labels Aug 16, 2024
@cpcloud cpcloud force-pushed the epoch-date branch 2 times, most recently from 4bec42e to 26b224f Compare August 16, 2024 22:17
@cpcloud cpcloud added the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Aug 17, 2024
@ibis-docs-bot ibis-docs-bot bot removed the ci-run-cloud Add this label to trigger a run of BigQuery, Snowflake, and Databricks backends in CI label Aug 17, 2024
@gforsyth
Member

Propose to codify the word uxfail as you've implicitly defined it above.

@cpcloud cpcloud force-pushed the epoch-date branch 2 times, most recently from 25e9f5f to 2c2215c Compare August 19, 2024 14:05
@jcrist
Member

jcrist commented Aug 19, 2024

Is this operation semantically the same as date_col.delta(ibis.date(1970, 1, 1), "d")? If so, is there a reason why that's insufficient? We might even add an ibis.epoch() function that's effectively an alias for ibis.date(1970, 1, 1) in this context. I find the delta spelling a bit easier to understand just from looking at it.

Alternatively, could we spell this as epoch_days() (to match the existing epoch_seconds)? I find the epoch() method to be a bit confusing, since it's not clear to me from the name alone what it's referencing (days_since_epoch would be even clearer, but 🤷).

@cpcloud
Member Author

cpcloud commented Aug 19, 2024

> Is this operation semantically the same as date_col.delta(ibis.date(1970, 1, 1), "d")? If so, is there a reason why that's insufficient? We might even add an ibis.epoch() function that's effectively an alias for ibis.date(1970, 1, 1) in this context. I find the delta spelling a bit easier to understand just from looking at it.

It's not insufficient, just more verbose than I'd like.

> Alternatively, could we spell this as epoch_days() (to match the existing epoch_seconds)? I find the epoch() method to be a bit confusing, since it's not clear to me from the name alone what it's referencing (days_since_epoch would be even clearer, but 🤷).

epoch_days seems reasonable and keeps consistency with epoch_seconds.

My original plan was to have epoch on DateValue and TimestampValue (with the latter having a unit parameter) but I think three separate methods epoch_secs/epoch_millis/epoch_micros is probably better (with an eventual epoch_seconds deprecation).
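To make the three-method idea concrete, here is a minimal plain-Python sketch of what per-unit epoch offsets would compute for a single timestamp. The function names simply mirror the spellings floated in this comment; they are assumptions for illustration, not the ibis API, and `epoch_units` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_units(ts: datetime, unit: timedelta) -> int:
    """Number of whole `unit`s elapsed between the Unix epoch and `ts`."""
    # timedelta // timedelta is integer floor division.
    return (ts - EPOCH) // unit


def epoch_secs(ts: datetime) -> int:
    return epoch_units(ts, timedelta(seconds=1))


def epoch_millis(ts: datetime) -> int:
    return epoch_units(ts, timedelta(milliseconds=1))


def epoch_micros(ts: datetime) -> int:
    return epoch_units(ts, timedelta(microseconds=1))


ts = datetime(1970, 1, 1, 0, 0, 1, 500_000, tzinfo=timezone.utc)
print(epoch_secs(ts), epoch_millis(ts), epoch_micros(ts))
```

Separate functions keep each return value's unit visible at the call site, which is the readability argument made above for avoiding a single method with a unit parameter.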

gforsyth pushed a commit that referenced this pull request Aug 22, 2024
@cpcloud cpcloud added this to the 9.4 milestone Aug 22, 2024
@cpcloud cpcloud reopened this Aug 22, 2024
@cpcloud cpcloud requested a review from jcrist August 23, 2024 14:45
Member

@jcrist jcrist left a comment


Overall happy with the user-facing API, just a few comments on the implementation.

ibis/expr/types/temporal.py (outdated, resolved)
@@ -27,6 +28,32 @@
sge.ArraySort: rename_func("arraySort"),
sge.LogicalAnd: rename_func("min"),
sge.LogicalOr: rename_func("max"),
sge.DateFromParts: lambda self, e: sg.func(
Member


I actually find these harder to understand in the sqlglot dialect than in the previous version in the compiler. These are pretty complicated implementations (length-wise); doing more in the dialect means we have additional places to check for complex logic when trying to understand how something is compiled (compiler, rewrite rules, our dialects.py file, sqlglot's dialects, sqlglot's rewrite rules). If possible, I'd find compilation easier to understand if we keep most of our implemented logic in the compiler itself rather than in the dialects here.

Just a -0.25 though, if you feel strongly about doing it this way then that's fine.

Member Author


Unfortunately, without these (or something like them), implementing this without a new operation is a non-starter. Happy to try the no-new-op route, but to do that we need sqlglot to compile DateFromParts correctly for all the backends it can, which requires porting the previous code to this.

Member Author


In general I agree with your point, but this happened to be an annoying case where it seemed better to centralize this logic in sqlglot rather than repeating in the compiler.

ibis/expr/types/temporal.py (outdated, resolved)
@cpcloud cpcloud force-pushed the epoch-date branch 2 times, most recently from 32cb541 to 52ddabf Compare August 23, 2024 22:56
@jcrist
Copy link
Member

jcrist commented Aug 26, 2024

LGTM! Leaving the merge to you - I assume you want to squash some/all of these commits.

@cpcloud cpcloud merged commit 8b0fb66 into ibis-project:main Aug 26, 2024
81 checks passed
@cpcloud cpcloud deleted the epoch-date branch August 26, 2024 14:02
Labels
feature Features or general enhancements timestamps Issues related to the timestamp API
3 participants