Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename all apply functions to map_* #10744

Closed
9 tasks done
stinodego opened this issue Aug 26, 2023 · 21 comments
Closed
9 tasks done

Rename all apply functions to map_* #10744

stinodego opened this issue Aug 26, 2023 · 21 comments
Assignees
Labels
accepted Ready for implementation deprecation Add a deprecation warning to outdated functionality enhancement New feature or an improvement of an existing feature

Comments

@stinodego
Copy link
Contributor

stinodego commented Aug 26, 2023

In #10678, Ritchie mentions the reason for moving away from the apply naming:

Pandas apply will have opposite behavior of polars. We chose apply because people were so familiar with it in pandas.

Now that influence stops and we try to keep true to our own naming strategy. I favour map as functional name for mapping a function and if I ask what we map, the best name I could come with was elements.

We map scalar elements during projections and group elements during a groupby agg.

Ideally I also want to clarify what we map in map. That function get the whole column / state / batch, but we cannot guarantee that a column is complete, so I am still thinking about that one.

We will rename Series/Expr.apply to map_elements. Here's a list of other apply-related functions and proposed naming. Please use this topic to discuss the naming of these functions.

apply

The following functions use apply and should be renamed:

  • Series/Expr.apply -> map_elements
  • Series/Expr.rolling_apply -> rolling_map
  • DataFrame.apply -> map_rows
  • (Dynamic/Rolling)GroupBy.apply -> map_groups
  • pl.apply -> map_groups

map

The following functions use map and must be considered for a rename to avoid confusion:

  • Series/Expr.map -> map_batches
  • DataFrame/LazyFrame.map -> map_batches
  • Series/Expr.map_dict -> replace
  • Series/Expr.map_alias -> name.map
@stinodego stinodego added enhancement New feature or an improvement of an existing feature deprecation Add a deprecation warning to outdated functionality labels Aug 26, 2023
@ritchie46
Copy link
Member

ritchie46 commented Aug 26, 2023

Series/Expr.apply -> map_elements 👍
Series/Expr.rolling_apply -> map_rolling (?) 👍
DataFrame/LazyFrame.apply -> map_rows 👍
(Dynamic/Rolling)GroupBy.apply -> map_groups 👍
pl.apply -> map_groups (?) 👍

  • Series/Expr.map should become map_batch.
  • map_dict seems fine to me.
  • LazyFrame.map to map_frame?

@alexander-beedie
Copy link
Collaborator

alexander-beedie commented Aug 26, 2023

Looks good; out of all of them, map_batch is probably the only one that doesn't seem intuitive, as it's not that obvious what "batch" means here. Throwing a few more ideas out there: map_column, map_array, or map_series instead? 🤔

@ion-elgreco
Copy link
Contributor

Looks good; out of all of them, map_batch is probably the only one that doesn't seem intuitive, as it's not that obvious what "batch" means here. Throwing a few more ideas out there: map_column, map_array, or map_series instead? 🤔

map_series seems the best out of those 3 to me

@ritchie46
Copy link
Member

Looks good; out of all of them, map_batch is probably the only one that doesn't seem intuitive, as it's not that obvious what "batch" means here. Throwing a few more ideas out there: map_column, map_array, or map_series instead? 🤔

Yes, that is also because it is the hardest one to name. I can be any batch polars decides to give the function. Whereas map_elements gives you the whole group during aggregation. batch is allowed to give you a partial group.

That's why map_column is incorrect. Because we cannot guarantee you get a column.

@stinodego
Copy link
Contributor Author

stinodego commented Aug 27, 2023

The way I see it, the current map methods are fine as they are.

Methods are applied to the object they are called on. So if I call map on a Series, I expect to map the whole Series. Same with Expr/DataFrame/LazyFrame. Exceptions to the rule are marked explicitly, e.g. map_elements, map_rows, etc.

I mention map_dict here because map now triggers me to think "UDF". This function has nothing to do with UDFs.

I never liked the naming of map_dict. Am I mapping a dictionary? No. I am replacing specific values in my data according to some mapping. A better name would be replace or replace_values. The fact that the mapping is specified as a dictionary is an implementation detail that should not be part of the method name.

map_alias is also mildly confusing to me. I am not mapping an alias, I am mapping names to create an aliases. So map_names makes more sense.

@cmdlineluser
Copy link
Contributor

Am I mapping a dictionary? No.

Yeah, I suppose technically you are mapping keys, perhaps .map_keys could fit better?

It's a bit of an odd one as it can be defined in different ways, it's essentially an optimized/special-cased version of: .map_elements(my_dict.get)

But as you say (and the docstring), the emphasis could also be on the "replace" part instead of the "mapping".

Expr.map_dict

Replace values in column according to remapping dictionary.

@ritchie46
Copy link
Member

Methods are applied to the object they are called on. So if I call map on a Series, I expect to map the whole Series.

This is not guaranteed. Expr.map may get an incomplete batch. This happens for instance in the streaming engine.

@stinodego
Copy link
Contributor Author

Methods are applied to the object they are called on. So if I call map on a Series, I expect to map the whole Series.

This is not guaranteed. Expr.map may get an incomplete batch. This happens for instance in the streaming engine.

We can mention this with a note in the docstring / streaming docs, and it'll be fine I think.

@ritchie46
Copy link
Member

We can mention this with a note in the docstring / streaming docs, and it'll be fine I think.

Ok!

@stinodego stinodego self-assigned this Aug 27, 2023
@stinodego stinodego added the accepted Ready for implementation label Aug 27, 2023
@github-project-automation github-project-automation bot moved this to Ready in Backlog Aug 27, 2023
@alexander-beedie
Copy link
Collaborator

I never liked the naming of map_dict. Am I mapping a dictionary? No.

I always liked switch, but perhaps that's too much of a C++ism to catch on more broadly 😅

@orlp
Copy link
Collaborator

orlp commented Aug 28, 2023

Just fueling the bikeshed, what about transform?

@stinodego
Copy link
Contributor Author

I would say replace_values is much clearer than either of those.

@ion-elgreco
Copy link
Contributor

ion-elgreco commented Aug 28, 2023

I would say replace_values is much clearer than either of those.

Perhaps the default behavior should also be to use default=pl.first(). To me replace implies that instead of resulting to None where it doesn't match.

@stinodego
Copy link
Contributor Author

I would say replace_values is much clearer than either of those.

Perhaps the default behavior should also be to use default=pl.first(). To me replace implies that instead of resulting to None where it doesn't match.

Right - I did not realize that was the default behavior 🤔 let's do a separate issue to discuss map_dict and possible improvements. Apparently it's not as simple as I thought.

@ion-elgreco
Copy link
Contributor

I would say replace_values is much clearer than either of those.

Perhaps the default behavior should also be to use default=pl.first(). To me replace implies that instead of resulting to None where it doesn't match.

Right - I did not realize that was the default behavior 🤔 let's do a separate issue to discuss map_dict and possible improvements. Apparently it's not as simple as I thought.

Done: #10755

@orlp
Copy link
Collaborator

orlp commented Aug 28, 2023

@stinodego I meant transform instead of map_elements. Either way I don't feel strongly about it.

@baggiponte
Copy link
Contributor

baggiponte commented Aug 30, 2023

Series/Expr.rolling_apply -> map_rolling (?) 👍

I like this (I also dare to suggest map_window?)

@stinodego
Copy link
Contributor Author

Series/Expr.rolling_apply -> map_rolling (?) 👍

I like this (I also dare to suggest map_window?)

I ended up going with rolling_map to align it with rolling_sum, rolling_std, etc.

@sm-Fifteen
Copy link

I never liked the naming of map_dict. Am I mapping a dictionary? No. I am replacing specific values in my data according to some mapping. A better name would be replace or replace_values. The fact that the mapping is specified as a dictionary is an implementation detail that should not be part of the method name.

@stinodego: The initial feature request suggested lookup() or dict_lookup(). That might still be a suitable replacement if you're looking to free up the "map" term.

@stinodego
Copy link
Contributor Author

Closing in favor of #10755 as map_dict is the only part of this issue still relevant. It will be renamed to replace.

@alexanderfifefd
Copy link

I'm just commenting here so this thread may potentially be indexed by Google for the following error messages as I spent 20 minutes too long figuring out why my code stopped working after re-installing Polars.

The error messages I was getting was:
Polars AttributeError: 'DataFrame' object has no attribute 'apply'.
Polars AttributeError: 'Series' object has no attribute 'apply'.

As is clear in this thread apply is deprecated in Polars. Renaming apply to map_elements solved applying to a series.

The highest ranking documentation in the SERP for the apply function was version v0.18 which naturally doesn't mention that it's deprecated.

Hopefully this helps others out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Ready for implementation deprecation Add a deprecation warning to outdated functionality enhancement New feature or an improvement of an existing feature
Projects
Archived in project
Development

No branches or pull requests

9 participants