feat(python): Warn on inefficient use of map_elements for temporal attributes/methods #14529

Merged


alexander-beedie
Collaborator

@alexander-beedie alexander-beedie commented Feb 16, 2024

Ref: #9968.
(Implements the ".dt.month and other stdlib datetime functions" suggestion).

Saw somebody do this in our own codebase today (shame on that person 🤣), so have added new bytecode detection logic for temporal attrs/methods, along with the appropriate native expression mapping.
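For anyone curious what "bytecode detection" can look like, here is a minimal standalone sketch using only the stdlib dis module. It is not the code in this PR: the helper name temporal_names_in and the TEMPORAL_NAMES set are made up for illustration, and it only checks whether a callable loads one of the temporal names the PR covers.

```python
import dis

# Illustrative set of names (hypothetical; mirrors the lists in this PR description).
TEMPORAL_NAMES = {
    "date", "day", "hour", "microsecond", "minute",
    "month", "second", "year", "isoweekday", "time",
}


def temporal_names_in(func):
    """Return the temporal attribute/method names that `func` loads, in order."""
    return [
        ins.argval
        for ins in dis.get_instructions(func)
        # Older CPython versions emit LOAD_METHOD for method calls;
        # 3.12+ uses LOAD_ATTR for both attributes and methods.
        if ins.opname in ("LOAD_ATTR", "LOAD_METHOD")
        and ins.argval in TEMPORAL_NAMES
    ]


print(temporal_names_in(lambda d: d.time()))
print(temporal_names_in(lambda d: d.year))
```

Matching on the opcode name rather than a fixed instruction sequence keeps the sketch tolerant of bytecode differences between Python versions.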

Covers the following temporal attributes...

date
day
hour
microsecond
minute
month
second
year

...and temporal methods:

isoweekday()
date()
time()

Before

from datetime import datetime

import polars as pl

df = pl.DataFrame({"dtm": [datetime(2024, 2, 16, 10, 30, 45)]})
df.with_columns(
    # shameful (and wildly inefficient) use of a lambda :)
    time = pl.col("dtm").map_elements(lambda d: d.time())
)
# shape: (1, 2)
# ┌─────────────────────┬──────────┐
# │ dtm                 ┆ time     │
# │ ---                 ┆ ---      │
# │ datetime[μs]        ┆ time     │
# ╞═════════════════════╪══════════╡
# │ 2024-02-16 10:30:45 ┆ 10:30:45 │
# └─────────────────────┴──────────┘

After

Can still do it, but now triggers a PolarsInefficientMapWarning:

# PolarsInefficientMapWarning: 
# Expr.map_elements is significantly slower than the native expressions API.
# Only use if you absolutely CANNOT implement your logic otherwise.
# Replace this expression...
#   - pl.col("dtm").map_elements(lambda d: ...)
# with this one instead:
#   + pl.col("dtm").dt.time()

Also

Now optimises out unnecessary implicit existence checks from the final suggestion, e.g.:

lambda x: x and x and x and x.date()
...becomes:
pl.col("d").dt.date()
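To illustrate the idea of dropping redundant existence checks, here is a small sketch that works on source text with the stdlib ast module (Python 3.9+ for ast.unparse). Note this swaps in a different technique: the PR operates on bytecode, not on the AST, and the function name strip_existence_checks is made up for this example.

```python
import ast


def strip_existence_checks(src: str) -> str:
    """Drop leading `x and ...` guards from a single-argument lambda's body."""
    lam = ast.parse(src, mode="eval").body
    if not isinstance(lam, ast.Lambda):
        raise ValueError("expected a lambda expression")
    arg = lam.args.args[0].arg
    body = lam.body
    # `x and x and x.date()` parses as one BoolOp with several values.
    if isinstance(body, ast.BoolOp) and isinstance(body.op, ast.And):
        *checks, last = body.values
        # Only collapse when every guard is the bare lambda argument itself.
        if all(isinstance(v, ast.Name) and v.id == arg for v in checks):
            body = last
    return ast.unparse(body)


print(strip_existence_checks("lambda x: x and x and x and x.date()"))
```

Dropping the guard is reasonable for a suggestion because native temporal expressions pass nulls through rather than raising on them.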

@github-actions github-actions bot added the enhancement and python labels Feb 16, 2024
@alexander-beedie alexander-beedie marked this pull request as draft February 16, 2024 06:51
@alexander-beedie alexander-beedie marked this pull request as ready for review February 16, 2024 08:04
@alexander-beedie alexander-beedie changed the title from "feat(python): warn on inefficient use of map_elements for temporal attribute access" to "feat(python): warn on inefficient use of map_elements for temporal attributes/methods" Feb 16, 2024
@alexander-beedie
Collaborator Author

alexander-beedie commented Feb 16, 2024

Looks like we should better-handle the form lambda x: x and x.time(); will get on it ;)

Update: done 👌

@MarcoGorelli
Collaborator

thanks for doing this

looks like 3.12 is failing?

@alexander-beedie
Collaborator Author

alexander-beedie commented Feb 16, 2024

> looks like 3.12 is failing?

Indeed, and yet it isn't failing locally here on an M2 running 3.12, hmm. Will dig in at home later; I've got access to an Ubuntu build running on my NAS, so hopefully it replicates!

Ah! Tweaked my env and was able to replicate locally; should be a quick fix after all 🤞

@nameexhaustion
Collaborator

This is some nice stuff. I took this for a spin and came up with a more refined evil 😄

df.with_columns(time=pl.col("dtm").map_elements(datetime.time))

Collaborator

@MarcoGorelli MarcoGorelli left a comment


nice work!

just out of interest, why would someone write lambda x: x and x and x.date()?

py-polars/polars/utils/udfs.py
@alexander-beedie
Collaborator Author

> just out of interest, why would someone write lambda x: x and x and x.date()?

They likely wouldn't, but x and x.date() was spotted in our unit tests, so it primarily handles that. I made it handle the general case on the off-chance that some compound statement may one day collapse into more than one "x and" in a row. The test is written that way just to prove that it handles the edge case.

@alexander-beedie
Collaborator Author

> df.with_columns(time=pl.col("dtm").map_elements(datetime.time))

Devious; I can tackle that in a follow-up ;)
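A side note on why this form is harder to catch: on CPython, datetime.time (with datetime being the class) is a C-level method descriptor, so it has no __code__ and there is no Python bytecode to inspect. The sketch below shows one way such a callable could be recognised. It is purely illustrative, assumes CPython with the C datetime implementation, and says nothing about how the follow-up is actually implemented; describe_temporal_callable is a made-up name.

```python
import types
from datetime import date, datetime


def describe_temporal_callable(func):
    """Return (class name, method name) for a C-level date/datetime method, else None."""
    if (
        isinstance(func, types.MethodDescriptorType)
        and func.__objclass__ in (date, datetime)
    ):
        return func.__objclass__.__name__, func.__name__
    return None


# No bytecode is available for the unbound C method.
print(hasattr(datetime.time, "__code__"))
print(describe_temporal_callable(datetime.time))
```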

Collaborator

@MarcoGorelli MarcoGorelli left a comment


nice, always happy to do more "butt-kicking" if people use map_elements unnecessarily :)

@alexander-beedie alexander-beedie merged commit 815732b into pola-rs:main Feb 16, 2024
13 checks passed
@alexander-beedie alexander-beedie deleted the temporal-bytecode-attrs branch February 16, 2024 20:50
@stinodego stinodego changed the title from "feat(python): warn on inefficient use of map_elements for temporal attributes/methods" to "feat(python): Warn on inefficient use of map_elements for temporal attributes/methods" Feb 19, 2024