feat(python): Add map_dict expression. #5899
WIP. Not sure about the name TODO:
#2377 would have been useful to have.
.with_column(pli.lit(True).alias(is_remapped_column)),
...
pli.when(pli.col(is_remapped_column).is_not_null())
|
I think
The nicest feature, though probably much more work, would be getting this as an expression as well. Currently you have four parameters; only two would be needed for that:
pl.col("input_col").map_dict({"hello": 1, "world": 2}, default="default text here").alias("output_col")
As for the naming:
|
I also think
Instead of
I like the following syntax more, but as we need a join, I don't know how to get the whole dataframe from the column expression. I assume
pl.col("input_col").map_dict({"hello": 1, "world": 2}, default="default text here").alias("output_col") |
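The join-based idea discussed above (a lookup table, a left join, and a marker column to detect unmatched rows) can be sketched in plain Python. This is only an illustration of the semantics, not the polars implementation; `remap_via_join` and its parameters are hypothetical names, and the reason given for the marker column is an assumption.

```python
def remap_via_join(values, mapping, default=None):
    """Emulate a left join against a lookup table that carries a marker column."""
    # Lookup "table": one row per mapping entry, with a marker that is always True.
    lookup = {key: (new_value, True) for key, new_value in mapping.items()}
    result = []
    for value in values:
        # Left join: rows without a match get null (None) in both joined columns.
        new_value, is_remapped = lookup.get(value, (None, None))
        # Decide on the marker rather than on the joined value, so a key that
        # maps to None is not mistaken for a missing key (assumed motivation).
        result.append(new_value if is_remapped is not None else default)
    return result


print(remap_via_join(["hello", "world", "other"], {"hello": 1, "world": 2}, default=0))
```

Only two inputs are needed besides the column itself: the mapping and the default.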
Is it reasonable to say that |
First of all, thanks for working on this <3
Naming:
Implementation:
What do you think? |
So we have various
|
@ghuls could we do this with a The function that gets the |
I forgot about the possibility of making a DataFrame from a Series. It should indeed work. |
Definitely feels like an expression; also, rather than continuing to overload the name
This could be my C++ past coming back to haunt me, but (structurally) it follows exactly the same pattern, so there is a history/precedent for the name...
#include <iostream>
using namespace std;
// trivial example
int main () {
int country_code = 'FR';  // multi-character literals have type int (value is implementation-defined)
switch( country_code ) {
case 'FR' :
cout << "France" << endl;
break;
case 'ES' :
cout << "Spain" << endl;
break;
default :
cout << "Unknown" << endl;
}
return 0;
}
...and as a polars expression:
pl.col('country_code').switch(
    {'FR':'France', 'ES':'Spain'}, default='Unknown'
)
If not, I think |
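For comparison, the same switch/default pattern can be written with a plain dictionary lookup in Python. This is a rough sketch of the semantics only; `switch_values` is a hypothetical helper, not a polars function.

```python
# Plain-Python analogue of the switch/default pattern; not polars code.
COUNTRY_NAMES = {"FR": "France", "ES": "Spain"}


def switch_values(codes, cases, default=None):
    """Return cases[code] for each code, or `default` when no case matches."""
    return [cases.get(code, default) for code in codes]


names = switch_values(["FR", "ES", "DE"], COUNTRY_NAMES, default="Unknown")
print(names)
```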
Yes, it should be an expression. At the time I couldn't figure out how to do it due to the use of join (I forgot that you can transform a series to a dataframe). I just looked for some synonyms:
|
I agree that |
@ghuls any updates on this one? I can take it from here if you like. |
Looking at it again. |
@ritchie46 I was looking back in Discord to check how to implement remap as an expression instead of on the dataframe, but it looks like currently it is not possible according to your comment at the time:
Or is there another way? (I tried |
02fb3c1 to ab99bc6
Create a |
11c65fa to f6eadec
Thanks @ghuls. I took your code and converted it to an expression. I don't think it should be on the frames. This also removes the need for an output column name. |
Ah, thanks. I was just working on the same. I also managed to get it to work if you select another column as default. |
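As a rough illustration of what "another column as default" means, here is a plain-Python sketch in which each unmatched row falls back to the value in the same row of a second column. The helper name and its arguments are hypothetical, and this is not how polars implements it.

```python
def map_with_column_default(values, mapping, defaults):
    """Map each value through `mapping`; unmatched rows take the value
    from the same position in `defaults` (a second column)."""
    if len(values) != len(defaults):
        raise ValueError("values and defaults must have the same length")
    return [mapping.get(value, fallback) for value, fallback in zip(values, defaults)]


codes = ["FR", "ES", "DE"]
fallback_names = ["n/a-1", "n/a-2", "Germany (raw)"]
print(map_with_column_default(codes, {"FR": "France", "ES": "Spain"}, fallback_names))
```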
@ritchie46: Can we change the name before release? I think |
I think |
Fair enough; I do not feel strongly about this, I'm just happy the feature exists :) (Thx again @ghuls!) |
Using a column expression other than the map_dict-ed column as the default value is possible with: |
Regarding the name, I had suggested
Being able to throw on a missing value instead of defaulting could be practical in some cases, but this is already merged, so I guess it would have to be a separate feature PR. |
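The "throw on missing value" behaviour could look roughly like this in plain Python. This is a sketch of the idea only; `map_dict_strict` is a hypothetical name, and nothing here reflects an existing polars option.

```python
def map_dict_strict(values, mapping):
    """Map every value through `mapping`, raising if any value has no entry."""
    # Collect all unmapped values first so the error reports them together.
    missing = sorted({value for value in values if value not in mapping}, key=repr)
    if missing:
        raise KeyError(f"no mapping for values: {missing}")
    return [mapping[value] for value in values]


print(map_dict_strict(["hello", "world"], {"hello": 1, "world": 2}))
```

Raising instead of defaulting makes an incomplete mapping visible immediately, at the cost of failing the whole operation.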