feat(rust, python): add dt.combine for combining date and time components #6121

Merged
merged 4 commits into from
Jan 18, 2023

Conversation

alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Jan 8, 2023

Background:

Saw two references to this not being straightforward recently, once in the shiny new Modern Polars guide...

"It turns out that combining pl.Date and pl.Time isn’t that straightforward: we have to convert them both to microseconds, add them and then cast as pl.Datetime."

...and again in the Discord:

"and how can I combine both date and time column so that I can group by datetime? I found the pl.datetime function but it takes y/m/d/h/m/s as individual args"

Naming:

Follows the equivalent method associated with python datetimes:

>>> from datetime import datetime, date, time
>>> datetime.combine(date(2020, 12, 31), time(10, 30, 45))
datetime.datetime(2020, 12, 31, 10, 30, 45)
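
For comparison, the manual route described in the Background quote (convert both parts to microseconds, add them, then turn the sum back into a datetime) can be sketched with the standard library alone. This is only an illustration of the arithmetic, not the Polars code path; the helper names `date_to_us`, `time_to_us` and `combine_via_us` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

EPOCH = datetime(1970, 1, 1)
US_PER_DAY = 86_400_000_000  # microseconds in one day


def date_to_us(d: date) -> int:
    """Microseconds from the Unix epoch to midnight on `d` (hypothetical helper)."""
    return (d - EPOCH.date()).days * US_PER_DAY


def time_to_us(t: time) -> int:
    """Microseconds elapsed since midnight for `t` (hypothetical helper)."""
    return ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond


def combine_via_us(d: date, t: time) -> datetime:
    """Add the two microsecond counts and convert the total back to a datetime."""
    total_us = date_to_us(d) + time_to_us(t)
    return EPOCH + timedelta(microseconds=total_us)


d, t = date(2022, 10, 10), time(1, 2, 3, 456000)
result = combine_via_us(d, t)
print(result)  # 2022-10-10 01:02:03.456000
print(result == datetime.combine(d, t))  # True
```

Both routes give the same value; the point of a dedicated method is that nobody has to write the first one by hand.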

Example:

If the underlying expression is a Datetime then its Time component is replaced, and if it is a Date then a new Datetime is created by combining the two components as-is.

from datetime import datetime, date, time
import polars as pl

df = pl.DataFrame({
    "dtm": [
        datetime(2022,12,31,10,30,45),
        datetime(2023,7,5,23,59,59),
    ],
    "dt": [
        date(2022,10,10),
        date(2022,7,5),
    ],
    "tm": [
        time(1,2,3,456000),
        time(7,8,9,101000),
    ],
})
# shape: (2, 3)
# ┌─────────────────────┬────────────┬──────────────┐
# │ dtm                 ┆ dt         ┆ tm           │
# │ ---                 ┆ ---        ┆ ---          │
# │ datetime[μs]        ┆ date       ┆ time         │
# ╞═════════════════════╪════════════╪══════════════╡
# │ 2022-12-31 10:30:45 ┆ 2022-10-10 ┆ 01:02:03.456 │
# │ 2023-07-05 23:59:59 ┆ 2022-07-05 ┆ 07:08:09.101 │
# └─────────────────────┴────────────┴──────────────┘
df.select(
    [
        pl.col("dtm").dt.combine( pl.col("tm") ).alias("d1"),
        pl.col("dt").dt.combine( pl.col("tm") ).alias("d2"),
        pl.col("dt").dt.combine( time(4,5,6) ).alias("d3"),
    ]
)
# shape: (2, 3)
# ┌─────────────────────────┬─────────────────────────┬─────────────────────┐
# │ d1                      ┆ d2                      ┆ d3                  │
# │ ---                     ┆ ---                     ┆ ---                 │
# │ datetime[μs]            ┆ datetime[μs]            ┆ datetime[μs]        │
# ╞═════════════════════════╪═════════════════════════╪═════════════════════╡
# │ 2022-12-31 01:02:03.456 ┆ 2022-10-10 01:02:03.456 ┆ 2022-10-10 04:05:06 │
# │ 2023-07-05 07:08:09.101 ┆ 2022-07-05 07:08:09.101 ┆ 2022-07-05 04:05:06 │
# └─────────────────────────┴─────────────────────────┴─────────────────────┘
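
The two rules stated above (a Datetime has its time part replaced, a Date is combined as-is) can be mirrored row by row in plain Python. This is a rough sketch of the semantics only, assuming nothing about how the expression is implemented; the function name `combine` is just an illustrative stand-in.

```python
from datetime import date, datetime, time


def combine(value, tm: time) -> datetime:
    """Row-wise stand-in for the behaviour described above (illustrative only)."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        # Datetime input: keep the date part, replace the time part.
        return datetime.combine(value.date(), tm)
    # Date input: build a new datetime from the two components.
    return datetime.combine(value, tm)


dtm = [datetime(2022, 12, 31, 10, 30, 45), datetime(2023, 7, 5, 23, 59, 59)]
dt = [date(2022, 10, 10), date(2022, 7, 5)]
tm = [time(1, 2, 3, 456000), time(7, 8, 9, 101000)]

d1 = [combine(a, b) for a, b in zip(dtm, tm)]   # time part replaced
d2 = [combine(a, b) for a, b in zip(dt, tm)]    # date + time column
d3 = [combine(a, time(4, 5, 6)) for a in dt]    # date + literal time

for row in zip(d1, d2, d3):
    print(*row, sep=" | ")
```

The values should line up with the d1/d2/d3 columns in the output table.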

Next:

This is only exposed via Python at the moment, but I'd definitely like to move it down into Rust; I had a quick look at doing so, but there was rather more indirection than I recall, so I thought I'd get it working here first and then move it down.

@ritchie46 - if you can suggest looking at an existing function that's at least vaguely similar in approach to this one, I'm sure I can work it out...🤣

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Jan 8, 2023
@ritchie46
Member

What if we just make date + time arithmetic work? That's most intuitive to me.

@alexander-beedie
Collaborator Author

alexander-beedie commented Jan 8, 2023

> What if we just make date + time arithmetic work? That's most intuitive to me.

Truthfully I don't like that; it has a large conceptual overhang with "datetime + duration", but it really isn't the same.

For instance, it assumes that "date + time" is equivalent to "datetime at 00:00:00 + time.cast(duration)", but "date + (duration of less than 1 day)" is actually a no-op (because that's how duration offsets on the date type work: the minimum change has to be +/- 1 day), e.g.:

>>> from datetime import date, timedelta
>>> import polars as pl
>>> df = pl.DataFrame({"x": [date(2020, 1, 2)]})
>>> df.select(pl.col("x") + timedelta(hours=23)).item()
datetime.date(2020, 1, 2)  # << no change

So it's a syntax that looks simple at first glance but it opens you up to ambiguity and "well *this* behaves one way, but *that* behaves another way, but they both sort of look the same?" scenarios. Having the functionality sit behind a dedicated / discoverable method isolates you from all of that (including any potential for extra contextual spaghetti lower down in the codebase). Plus... it's explicit rather than being a bit "magic" :)

(Also the "datetime.combine(time)" behaviour whereby the new time part replaces the existing time part wouldn't be possible with a "+" syntax; there may also be some scope for additional params later).

Not sure how chrono handles combining date/time parts, but in Python, date + time gets you a TypeError. So if you're coming from Python and using Polars, you probably aren't going to try this, and it won't show up in the API docs (as an implicit operation mediated by an operator), so it's going to be somewhat undiscoverable.
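
Both Python behaviours referred to in this comment can be checked with the standard library alone. A minimal sketch, not tied to Polars:

```python
from datetime import date, time, timedelta

d = date(2020, 1, 2)

# date + time is not defined for Python's stdlib types.
raised = False
try:
    d + time(10, 30)
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# date + timedelta only honours whole days; the sub-day part is ignored.
unchanged = d + timedelta(hours=23)
moved = d + timedelta(hours=25)

print(unchanged)  # 2020-01-02
print(moved)      # 2020-01-03
```

So a "+" operator on dates already has a meaning in which anything under a day is dropped, which is the ambiguity a dedicated combine method sidesteps.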

@ritchie46
Member

Alright. Shall I set up the skeleton for the Rust expression so we can move it down?

@alexander-beedie
Collaborator Author

> Alright. Shall I set up the skeleton for the Rust expression so we can move it down?

Sounds like a plan :)

@alexander-beedie
Collaborator Author

alexander-beedie commented Jan 9, 2023

Had another crack at it after a night's sleep; feels like I'm about 90% of the way there, with (I think) only one last issue blocking me from actually testing it.

Specifically, when registering the function in dsl/function_expr/datetime.rs...

#[derive(Clone, PartialEq, Debug, Eq, Hash)]
pub enum TemporalFunction {
    // ...
    Combine(Series, TimeUnit),
    // ...
}

...given the following function signature:

pub(super) fn combine(s: &Series, tm: &Series, tu: TimeUnit) -> PolarsResult<Series> {

(The new tu param is there because I realised that you should ideally be able to control the time unit of the final/output Datetime.)
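
To make the effect of an output time unit concrete, here is a rough stdlib sketch of how the same combined instant maps to different epoch integers depending on the unit. The function `combine_as_epoch` and its `time_unit` argument are hypothetical and are not the signature from this PR; only the "ms"/"us"/"ns" labels are borrowed from Polars' time-unit names.

```python
from datetime import date, time

# Ticks per second for each supported unit.
UNITS_PER_SECOND = {"ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def combine_as_epoch(d: date, t: time, time_unit: str = "us") -> int:
    """Epoch offset of `d` at `t`, counted in `time_unit` (hypothetical helper)."""
    per_second = UNITS_PER_SECOND[time_unit]
    days = (d - date(1970, 1, 1)).days
    seconds = days * 86_400 + t.hour * 3_600 + t.minute * 60 + t.second
    # Integer arithmetic only: precision finer than the unit is truncated.
    return seconds * per_second + t.microsecond * per_second // 1_000_000


d, t = date(1970, 1, 2), time(0, 0, 1, 456789)
print(combine_as_epoch(d, t, "ms"))  # 86401456
print(combine_as_epoch(d, t, "us"))  # 86401456789
print(combine_as_epoch(d, t, "ns"))  # 86401456789000
```

Note that the "ms" result drops the trailing 789 microseconds, which is the kind of trade-off an explicit unit parameter lets the caller choose.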

I've got all the rest of the plumbing in place, and an implementation, but it doesn't like Series here because of the macro expansion (Hash and Eq not satisfied).

What do you think might be the best way to handle this? (Assuming I didn't already jump the shark by having Series there at all :)

@stinodego
Member

I just ran into a case where I'd need this. Hope this will be ready soon-ish 😸

@alexander-beedie
Collaborator Author

alexander-beedie commented Jan 17, 2023

> I just ran into a case where I'd need this. Hope this will be ready soon-ish 😸

I think it's close, if I can understand how best to handle the macro expansion issue with Series (the Python version works, but really it should be moved down). @ritchie46, I think I need a few gems of your Rust wisdom here! 😅

@ritchie46
Member

Ok, I will take a look at this one tomorrow (if the bugs allow me).

@alexander-beedie
Collaborator Author

> Ok, I will take a look at this one tomorrow (if the bugs allow me).

Bugs come first! Whenever you've actually got a spare moment is fine... ;)

@stinodego
Member

Exactly, I wasn't trying to rush you! I am just using the underlying logic implemented by @alexander-beedie in this PR and it works fine.

I just thought it was interesting that I bumped into this while there's a PR open 😄

@ritchie46 ritchie46 merged commit cf2c583 into pola-rs:master Jan 18, 2023
@stinodego stinodego changed the title feat(python): add dt.combine for combining date and time components feat(rust, python): add dt.combine for combining date and time components Jan 19, 2023
@github-actions github-actions bot added the rust Related to Rust Polars label Jan 19, 2023
@alexander-beedie alexander-beedie deleted the combine-date-and-time branch January 26, 2023 08:58
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars
3 participants