Allow dictionary in .str.replace() and .str.replace_all() #11418

Open
lmocsi opened this issue Sep 29, 2023 · 3 comments
Labels
enhancement New feature or an improvement of an existing feature

Comments

@lmocsi

lmocsi commented Sep 29, 2023

Description

It would be nice to be able to replace chained replace_all calls like this:

```python
import polars as pl

df = df.with_columns(
    pl.col('field1').str.replace_all('aa', '1').str.replace_all('bb', '2'),
    pl.col('field2').str.replace_all('aa', '1').str.replace_all('bb', '2'),
)
```

using a dict, something like:

```python
# proposed: let replace_all accept a {pattern: replacement} dict
conv = {'aa': '1', 'bb': '2'}
df = df.with_columns(
    pl.col('field1').str.replace_all(conv),
    pl.col('field2').str.replace_all(conv),
)
```
lmocsi added the enhancement label on Sep 29, 2023
@Julian-J-S
Contributor

Hmm, interesting idea, but I can also see how this could be confusing.
For example: should the replacements build on top of each other?
Like:

  • text = "ABC"
  • replacements = {'AB': '1', 'BC': '2'}
  • then the result would be 1C, because once AB has been replaced, BC no longer exists in the string
  • and if the order of the dict changes to {'BC': '2', 'AB': '1'}, the result would be A2
  • with chained replace_all calls, it is very clear to the reader what happens and in what order
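To make the order dependence concrete, here is a minimal pure-Python sketch that applies the dict entries one after another, the way chained calls would. It uses plain `str.replace` as a stand-in for polars' `str.replace_all`, and the helper name `apply_in_order` is hypothetical:

```python
from functools import reduce


def apply_in_order(text: str, replacements: dict) -> str:
    """Apply each (pattern, replacement) pair sequentially, like chained replace_all calls.

    Each step sees the output of the previous step, so the dict order matters.
    """
    return reduce(lambda acc, kv: acc.replace(kv[0], kv[1]), replacements.items(), text)


print(apply_in_order("ABC", {"AB": "1", "BC": "2"}))  # 1C
print(apply_in_order("ABC", {"BC": "2", "AB": "1"}))  # A2
```

Same input, same pairs, different results: that is exactly the ambiguity a dict argument would have to resolve.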

If you are trying to replace whole values (in your example 'aa' and 'bb'), you can look into map_dict: https://pola-rs.github.io/polars/py-polars/html/reference/expressions/api/polars.Expr.map_dict.html#polars-expr-map-dict
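The difference between whole-value mapping and substring replacement can be sketched in plain Python. `map_whole_values` and `replace_substrings` are hypothetical helpers for illustration, not polars functions. This sketch keeps unmatched values unchanged. The map_dict documentation describes how map_dict itself treats values without a match.

```python
def map_whole_values(values, mapping):
    # Whole-value lookup: a value changes only if it equals a key exactly.
    return [mapping.get(v, v) for v in values]


def replace_substrings(values, mapping):
    # Substring replacement: every occurrence of each key inside a value is
    # replaced, one pattern after another (like chained replace_all calls).
    out = []
    for v in values:
        for pattern, repl in mapping.items():
            v = v.replace(pattern, repl)
        out.append(v)
    return out


conv = {"aa": "1", "bb": "2"}
values = ["aa", "bb", "aabb", "cc"]
print(map_whole_values(values, conv))    # ['1', '2', 'aabb', 'cc']
print(replace_substrings(values, conv))  # ['1', '2', '12', 'cc']
```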

@cmdlineluser
Contributor

As @JulianCologne points out, they are slightly different operations.

Passing multiple patterns/replacements at once usually implies that the replacements happen in isolation, independently of each other.

With chaining, on the other hand, each call processes the output of the previous call.
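One way to get the "independent" semantics is a single left-to-right pass with a regex alternation. Every match is found in the original string, and replacement text is never re-scanned. Here is a minimal plain-Python sketch using `re`. The helper name `replace_independently` is hypothetical, and this is not how polars implements anything. Longer patterns are tried first, so the result for this example does not depend on dict order.

```python
import re


def replace_independently(text: str, replacements: dict) -> str:
    """Replace all patterns in one left-to-right pass over the original text.

    Replacement output is never re-scanned, so one replacement cannot feed into
    another. Overlapping matches are still resolved leftmost-first.
    """
    if not replacements:
        return text  # an empty alternation would match the empty string everywhere
    # Longest patterns first, so e.g. 'AB' wins over 'A' at the same position.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: replacements[m.group(0)], text)


print(replace_independently("ABC", {"AB": "1", "BC": "2"}))  # 1C
print(replace_independently("ABC", {"BC": "2", "AB": "1"}))  # 1C
print(replace_independently("a-b", {"a": "b", "b": "a"}))    # b-a (a swap)
```

The swap case shows the contrast most clearly. Chaining `replace('a', 'b')` and then `replace('b', 'a')` on "a-b" gives "a-a", because the second call also rewrites the output of the first.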

As for chaining, I was trying to figure out whether it was possible to build a helper function.

i.e. turn

```python
replacements = {'one': 'two', 'three': 'four'}

expr = pl.col('field1', 'field2').str.replace_all
```

into:

```python
pl.col('field1', 'field2').str.replace_all('one', 'two').str.replace_all('three', 'four')
```

But I'm not sure whether that is possible.

With the expression and the method name as a string, we could split the string on `.` and loop through getattr():

```python
expr, function = pl.col('field1', 'field2'), 'str.replace_all'
```

The actual str.replace_all function seems to live in pl.expr.string.ExprStringNameSpace, which we could use with functools.reduce:

```python
expr, function = pl.col('field1', 'field2'), pl.expr.string.ExprStringNameSpace.replace_all
```

But that's somewhat awkward.
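A minimal sketch of the getattr-plus-reduce idea follows. The helper name `chain_method` is hypothetical. It resolves a dotted method path such as "str.replace_all" with getattr and feeds each call's result into the next. It is demonstrated with plain Python strings so it runs without polars. The polars usage is shown in comments and assumes `import polars as pl`.

```python
from functools import reduce


def chain_method(obj, method_path, arg_pairs):
    """Call obj.<method_path>(*args) once per args tuple, chaining the results.

    method_path may be dotted (e.g. "str.replace_all") to reach a namespace first.
    """
    *namespaces, name = method_path.split(".")

    def step(acc, args):
        target = acc
        for ns in namespaces:
            target = getattr(target, ns)  # e.g. expr -> expr.str
        return getattr(target, name)(*args)

    return reduce(step, arg_pairs, obj)


replacements = {"one": "two", "three": "four"}
print(chain_method("one three", "replace", replacements.items()))  # two four

# With polars (not run here), this should build the chained expression:
#   chain_method(pl.col("field1", "field2"), "str.replace_all", replacements.items())
# A shorter, polars-specific equivalent is a plain lambda:
#   reduce(lambda e, kv: e.str.replace_all(*kv), replacements.items(), pl.col("field1", "field2"))
```

The lambda form avoids the getattr lookup entirely and is probably the more readable choice if only `str.replace_all` is needed. Keep in mind that `str.replace_all` treats the pattern as a regular expression unless `literal=True` is passed.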

@lmocsi
Author

lmocsi commented Jul 8, 2024

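It seems that the regular (non-string) replace already accepts a dict:

```python
import polars as pl

dff = pl.DataFrame({'a': ['a', 'b', 'c']})
mapping = {'a': 'apple', 'b': 'banana'}

# whole values are looked up in `mapping`; unmatched values ('c') get the default '***'
dff.with_columns(pl.col('a').replace(mapping, default=pl.lit('***')).alias('aa'))
```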
