Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

depr(python): Rename map_dict to replace and change default behavior #12599

Merged
merged 9 commits into from
Nov 22, 2023

Conversation

stinodego
Copy link
Member

@stinodego stinodego commented Nov 21, 2023

Closes #10755

This has come up many times (e.g. #10744, #12537, #12533, ...) and I've been putting off making a decision on this. I think this is the way to go.
See #10755 (comment) for justification.

Changes

  • Rename map_dict to replace
  • Change the default behavior to keep any values not mentioned in the mapping unchanged. To preserve existing behavior, pass default=None.

@stinodego stinodego marked this pull request as draft November 21, 2023 08:46
@github-actions github-actions bot added deprecation Add a deprecation warning to outdated functionality python Related to Python Polars labels Nov 21, 2023
@stinodego stinodego marked this pull request as ready for review November 21, 2023 09:08
@ritchie46
Copy link
Member

Nice, can we take this opportunity to sometimes decide to go to a when -> then -> otherwise?

@stinodego
Copy link
Member Author

Nice, can we take this opportunity to sometimes decide to go to a when -> then -> otherwise?

When would it be preferable, though? I guess for a single value you'd generally want to use when/then/otherwise. But I'm not sure where the threshold is at which replace becomes faster.

I'd like to just re-implement this whole function on the Rust side regardless. We can pick that up in a separate PR and include performance improvements then?

@ritchie46
Copy link
Member

For a single value indeed.

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Nov 21, 2023

When would it be preferable, though? I guess for a single value you'd generally want to use when/then/otherwise. But I'm not sure where the threshold is at which replace becomes faster.

Essentially immediately... I was timing this at work the other day and map_dict is significantly faster than when/then as soon as you go beyond ONE value (when/then is ~20% faster in that case though). With two values map_dict is ~2x faster, and the difference just gets more pronounced from there 😅 (which makes sense: O(1) vs O(n))

With a release build from current HEAD:

from codetiming import Timer
import polars as pl

df = pl.DataFrame({"x": range(10_000_000)})

with Timer("when/then"):
    res = df.with_columns(
        y = pl.when(pl.col("x") == 3).then("!!").when(pl.col("x") == 9_999_998).then("##")
    )
    # Elapsed time: 0.1122 seconds

with Timer("map_dict"):
    res = df.with_columns(
        y = pl.col("x").map_dict({3:"!!", 9_999_998:"##"})
    )
    # Elapsed time: 0.0581 seconds
shape: (10_000_000, 2)
┌─────────┬──────┐
│ x       ┆ y    │
│ ---     ┆ ---  │
│ i64     ┆ str  │
╞═════════╪══════╡
│ 0       ┆ null │
│ 1       ┆ null │
│ 2       ┆ null │
│ 3       ┆ !!   │
│ …       ┆ …    │
│ 9999996 ┆ null │
│ 9999997 ┆ null │
│ 9999998 ┆ ##   │
│ 9999999 ┆ null │
└─────────┴──────┘

We'd probably get more mileage from an optimisation that goes the other way, detecting patterns like this in when/then and converting to map_dict (or, as it will soon be known, replace) under the hood.

@ritchie46
Copy link
Member

Yeap, that's what I expect. A single value to when then otherwise, the rest via map_dict.

@stinodego stinodego added this to the 0.19.16 milestone Nov 22, 2023
@ritchie46 ritchie46 merged commit 74c6ba2 into main Nov 22, 2023
14 checks passed
@ritchie46 ritchie46 deleted the depr-map-dict branch November 22, 2023 16:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deprecation Add a deprecation warning to outdated functionality python Related to Python Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change default behavior map_dict
3 participants