feat(python): Warn on inefficient use of map_elements for additional string functions #14565

Merged


Collaborator

@alexander-beedie alexander-beedie commented Feb 17, 2024

Ref: #9968.

The BytecodeParser can now also detect/translate Python string ops such as...

  • s.strip()
  • s.lstrip()
  • s.rstrip()
  • s.endswith('x')
  • s.endswith(('x','y'))
  • s.startswith('x')
  • s.startswith(('x','y'))
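As a rough illustration of how such calls can be spotted in a lambda's bytecode, here is a minimal sketch using only the standard-library `dis` module. It is not the actual BytecodeParser implementation; the helper `string_methods_called` and the `STRING_METHODS` set are hypothetical names introduced for this example.

```python
import dis

# Hypothetical set of method names to look for (taken from the list above).
STRING_METHODS = {"strip", "lstrip", "rstrip", "startswith", "endswith"}


def string_methods_called(func):
    """Return the names from STRING_METHODS that `func`'s bytecode looks up."""
    names = set()
    for instr in dis.get_instructions(func):
        # Method lookups show up as LOAD_METHOD on older CPython versions
        # and as LOAD_ATTR on newer ones; in both cases argval is the name.
        if instr.opname in ("LOAD_METHOD", "LOAD_ATTR") and instr.argval in STRING_METHODS:
            names.add(instr.argval)
    return names


print(string_methods_called(lambda s: s.strip()))
print(string_methods_called(lambda s: s.endswith(("x", "y"))))
```

A real translator also has to inspect the arguments and the surrounding instructions before it can suggest a replacement expression; this sketch only covers the detection step.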

Note that because we don't (currently ;) support passing multiple strings to starts_with and ends_with, such expressions are translated to a suitable contains regex instead.
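The tuple-to-regex translation described in the note can be sketched in plain Python as follows. This is an assumption about the general approach, not the code from this PR; `suffixes_to_regex` and `prefixes_to_regex` are hypothetical helpers, and the check uses the standard-library `re` module rather than Polars.

```python
import re


def suffixes_to_regex(suffixes):
    """Build an end-anchored alternation from a tuple of literal suffixes (hypothetical helper)."""
    alternatives = "|".join(re.escape(s) for s in suffixes)
    return f"({alternatives})$"


def prefixes_to_regex(prefixes):
    """Build a start-anchored alternation from a tuple of literal prefixes (hypothetical helper)."""
    alternatives = "|".join(re.escape(p) for p in prefixes)
    return f"^({alternatives})"


suffixes = ("?", "!")
pattern = suffixes_to_regex(suffixes)

# The regex should agree with str.endswith on every value.
# (Values here have no trailing newline; Python's `$` also matches just before one.)
for value in ["xxyx", "123xx", "45yx?"]:
    assert bool(re.search(pattern, value)) == value.endswith(suffixes)
```

Escaping each literal matters because characters such as `?` are regex metacharacters and would otherwise change the meaning of the pattern.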

Example

```python
import polars as pl

df = pl.DataFrame({"s": ["xxyx", "123xx", "45yx?"]})
df.with_columns(
    match=pl.col("s").map_elements(lambda s: s.endswith(("?", "!"))),
)

# PolarsInefficientMapWarning:
# Expr.map_elements is significantly slower than the native expressions API.
# Only use if you absolutely CANNOT implement your logic otherwise.
# Replace this expression...
#   - pl.col("s").map_elements(lambda s: ...)
# with this one instead:
#   + pl.col("s").str.contains(r'(\?|!)$')
```

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Feb 17, 2024

codecov bot commented Feb 17, 2024

Codecov Report

Attention: 18 lines in your changes are missing coverage. Please review.

Comparison is base (815732b) 80.76% compared to head (66fd1a1) 77.95%.
Report is 5 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| py-polars/polars/utils/udfs.py | 20.00% | 15 Missing and 1 partial ⚠️ |
| py-polars/polars/utils/various.py | 33.33% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14565      +/-   ##
==========================================
- Coverage   80.76%   77.95%   -2.81%     
==========================================
  Files        1326     1326              
  Lines      173035   173058      +23     
  Branches     2439     2446       +7     
==========================================
- Hits       139758   134916    -4842     
- Misses      32805    37698    +4893     
+ Partials      472      444      -28     


Member

@stinodego stinodego left a comment


Looks good to me, thanks!

@stinodego stinodego changed the title feat(python): warn on inefficient use of map_elements for additional string functions feat(python): Warn on inefficient use of map_elements for additional string functions Feb 18, 2024
@stinodego stinodego merged commit 7698c31 into pola-rs:main Feb 18, 2024
13 checks passed
@alexander-beedie alexander-beedie deleted the additional-bytecode-detection branch February 19, 2024 03:24