Dict/Hashmap lookup expression #3789
Comments
We have this functionality. This is a join.
A join is unwieldy for this operation, since it can't be expressed inline in a select/with_column. It's possible to perform this as an expression, but since …

grades.with_column(pl.col("class").map(lambda series: series.apply(lambda x: class_subject.get(x))))

EDIT: It turns out that you can use:

csv_data = grades.with_column(pl.col("class").apply(class_subject.get))
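`apply(class_subject.get)` works because the bound `dict.get` is called once per element, and a key missing from the dict comes back as `None`, which Polars stores as a null. A plain-Python sketch of that per-element behavior (the small mapping below is hypothetical, not the one from the issue):

```python
# Hypothetical mapping and column values, for illustration only.
class_subject = {"MAT-150": "Mathematics", "COM-200": "Programming"}
classes = ["MAT-150", "COM-200", "MAT-600"]

# dict.get returns None for an absent key, mirroring what a per-element
# apply(class_subject.get) hands back for unmatched values.
subjects = [class_subject.get(c) for c in classes]
print(subjects)  # ['Mathematics', 'Programming', None]
```

Because this goes through a Python callable for every row, it stays in Python rather than running as a native Polars operation.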
Yep, apply is elementwise (in the select context).
> Yep, apply is elementwise (in the select context).

What???
This section from the User Guide might help:
@sm-Fifteen

In [6]: grades.with_columns(pl.col("class").map_dict(class_subject, default="No Known Class").alias("class_code"))
Out[6]:
shape: (8, 5)
┌─────────┬─────────┬────────────┬──────────┬────────────────┐
│ student ┆ class ┆ test_score ┆ test_max ┆ class_code │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ i64 ┆ i64 ┆ str │
╞═════════╪═════════╪════════════╪══════════╪════════════════╡
│ bas ┆ MAT-150 ┆ 10 ┆ 10 ┆ Mathematics │
│ laura ┆ MAT-150 ┆ 5 ┆ 10 ┆ Mathematics │
│ tim ┆ MAT-210 ┆ 6 ┆ 12 ┆ Mathematics │
│ jenny ┆ MAT-600 ┆ 8 ┆ 10 ┆ No Known Class │
│ bas ┆ COM-200 ┆ 7 ┆ 12 ┆ Programming │
│ laura ┆ COM-205 ┆ 6 ┆ 10 ┆ Programming │
│ tim ┆ COM-430 ┆ 10 ┆ 15 ┆ No Known Class │
│ jenny ┆ COM-200 ┆ 5 ┆ 12 ┆ Programming │
└─────────┴─────────┴────────────┴──────────┴────────────────┘

Closed by #5899.
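Per value, `map_dict(class_subject, default="No Known Class")` behaves like `dict.get(key, default)`. A plain-Python sketch that reproduces the class_code column of the table above, assuming a mapping that contains only the class/subject pairs visible in that output (the full dict from the thread is not shown):

```python
# The class column as printed in the map_dict output.
classes = ["MAT-150", "MAT-150", "MAT-210", "MAT-600",
           "COM-200", "COM-205", "COM-430", "COM-200"]

# Assumed mapping: only the pairs that appear in that output are known.
class_subject = {
    "MAT-150": "Mathematics",
    "MAT-210": "Mathematics",
    "COM-200": "Programming",
    "COM-205": "Programming",
}

# Use the mapped subject, or fall back to the default when the key is absent.
class_code = [class_subject.get(c, "No Known Class") for c in classes]
print(class_code)
```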
@ghuls: Oh, wow, thanks, that's great! I'd actually given up on this, but for my use cases it's a huge improvement in ergonomics and readability.
Since I found this feature via this thread, I'd like to mention that from version 0.19.16 on, this method is called "replace".
Describe your feature request
Let's say I have a dataset like this:
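The original dataset is not shown here. A reconstruction from the map_dict output table in this thread, stored column-wise as plain Python lists (the same shape `pl.DataFrame(grades)` accepts); treat the values as an approximation of the original:

```python
# Reconstructed from the map_dict output in this thread; column-wise layout.
grades = {
    "student": ["bas", "laura", "tim", "jenny", "bas", "laura", "tim", "jenny"],
    "class": ["MAT-150", "MAT-150", "MAT-210", "MAT-600",
              "COM-200", "COM-205", "COM-430", "COM-200"],
    "test_score": [10, 5, 6, 8, 7, 6, 10, 5],
    "test_max": [10, 10, 12, 10, 12, 10, 15, 12],
}

# Every column must have the same number of rows to form a table.
n_rows = {name: len(col) for name, col in grades.items()}
assert len(set(n_rows.values())) == 1, n_rows
print(len(grades["student"]), "rows")  # 8 rows
```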
And that I want to map class to its respective subject matter, so I can compare grades per subject instead of per class:

With Pandas, I can use Series.map to create a series that looks up each value of the initial column as a key in a Python dictionary and contains the corresponding value.
Using Polars, that's doable but a fair amount more involved, because I need to cast both columns as Categorical and perform the join inside the same string cache context manager:
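The Polars snippet for this step is not shown here. To illustrate why the join route takes more steps than a single lookup, here is a plain-Python sketch of a hash left join (not the Polars code; the Categorical cast and string cache are Polars-specific and not modeled). The dict first has to become a two-column table, then every row probes it; `left_join` and `lookup_table` are hypothetical names:

```python
from collections import defaultdict


def left_join(left, right, on):
    """Hash left join of two lists of row dicts on the column `on`."""
    # Right-hand columns that unmatched left rows receive as None (nulls).
    right_cols = [c for c in dict.fromkeys(k for row in right for k in row) if c != on]
    # Build phase: bucket the right-hand rows by join key.
    buckets = defaultdict(list)
    for row in right:
        buckets[row[on]].append(row)
    # Probe phase: every left row is kept; matching right rows are merged in.
    joined = []
    for row in left:
        matches = buckets.get(row[on], [])
        if not matches:
            joined.append({**row, **{c: None for c in right_cols}})
        for match in matches:
            joined.append({**row, **match})
    return joined


grades = [
    {"student": "bas", "class": "MAT-150"},
    {"student": "jenny", "class": "MAT-600"},
]
class_subject = {"MAT-150": "Mathematics"}

# Step 1: the dict has to be turned into a table before it can be joined.
lookup_table = [{"class": k, "subject": v} for k, v in class_subject.items()]
# Step 2: the join itself; MAT-600 has no match and gets subject=None.
rows = left_join(grades, lookup_table, on="class")
print(rows)
```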
A new expression method, maybe something like Expr.lookup(map: dict[str | int, ...]), would make this sort of operation doable in a single step. An extra argument, like lookup(map, on_missing: Literal['omit','null','error']), could also be useful to specify the behavior when the hashmap does not contain a given key. Pandas instead relies on the use of defaultdict, or on the user running a second pass to filter out the NaNs that were inserted for missing entries.

If this is restricted to dicts and not lambda functions, it should be possible to copy the dict into a Rust HashMap and perform the operation without needing Python-owned resources.
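A minimal plain-Python sketch of the proposed on_missing semantics. `lookup` here is a hypothetical function, not a Polars API, and reading "omit" as "drop the element" is an assumption:

```python
from typing import Hashable, Iterable, Literal, Mapping

OnMissing = Literal["omit", "null", "error"]


def lookup(values: Iterable[Hashable], mapping: Mapping,
           on_missing: OnMissing = "null") -> list:
    """Map each value through `mapping`, handling absent keys per `on_missing`.

    Hypothetical sketch of the proposed Expr.lookup, not a Polars API.
    """
    if on_missing not in ("omit", "null", "error"):
        raise ValueError(f"unknown on_missing: {on_missing!r}")
    out = []
    for v in values:
        if v in mapping:
            out.append(mapping[v])
        elif on_missing == "null":
            out.append(None)  # stands in for a null in the result column
        elif on_missing == "error":
            raise KeyError(v)
        # on_missing == "omit": assumed to drop the element entirely
    return out


class_subject = {"MAT-150": "Mathematics", "COM-200": "Programming"}
classes = ["MAT-150", "MAT-600", "COM-200"]

print(lookup(classes, class_subject))                     # ['Mathematics', None, 'Programming']
print(lookup(classes, class_subject, on_missing="omit"))  # ['Mathematics', 'Programming']
```

With "omit" the output is shorter than the input, so inside a with_column it would likely not line up with the other columns. That suggests "null" or "error" fit an expression more naturally, with dropping left to a separate filter.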