"Inefficient map_*" warnings (tracking issue) #9968

MarcoGorelli · 2023-07-19T11:05:35Z

lucazanna · 2023-07-19T11:31:55Z

Can we also include uppercase, lowercase and title case ? I have seen cases of using the Python string methods for those with apply instead of the Polars expressions

alexander-beedie · 2023-07-20T09:04:13Z

Aha! Got a super-clean approach for handling the numpy, string, and json identification/mapping now, though may take a day or two to get around to it... :)

lucazanna · 2023-07-23T19:16:18Z

I just read an article with a code snippet using apply for a simple if condition. I wonder if ternary operators (value_if_true if condition else value_if_false) could also be included ?

alexander-beedie · 2023-07-24T07:53:42Z

I just read an article with a code snippet using apply for a simple if condition. I wonder if ternary operators (value_if_true if condition else value_if_false) could also be included ?

Got a sample? I need to rework/extend the current handling of and/or logic (which is represented by various *JUMP* control flow ops in the bytecode) so it can also handle if/else, which is represented similarly...
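To see the similarity being described, here is a small sketch using only the standard-library `dis` module. The helper name `jump_ops` is made up for this example, and the exact opcode names differ between CPython versions, so the code only filters on the substring "JUMP":

```python
import dis


def jump_ops(fn):
    """Return the names of all jump-style opcodes in fn's bytecode."""
    return [ins.opname for ins in dis.get_instructions(fn) if "JUMP" in ins.opname]


# Boolean and/or logic and a ternary both compile down to jump instructions.
and_ops = jump_ops(lambda x: x > 0 and x < 10)
ternary_ops = jump_ops(lambda x: x * 2 if x >= 5 else x)

print("and:    ", and_ops)
print("ternary:", ternary_ops)
```

Both lists are non-empty, which is why a parser working from bytecode has to disambiguate the two cases.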

lucazanna · 2023-07-24T08:14:51Z

Got a sample? I need to rework/extend the current handling of and/or logic (which is represented by various *JUMP* control flow ops in the bytecode) so it can also handle if/else, which is represented similarly...

Yes, here is an example:

df.select(
    pl.col('a').apply(lambda x: x*2 if x>=5 else x)
)

https://towardsdatascience.com/manipulating-values-in-polars-dataframes-1087d88dd436

Unrelated to this - will the recommendation engine also work on small non-lambda Python functions?

MarcoGorelli · 2023-07-24T08:20:52Z

will the recommendation engine also work on small non-lambda Python functions?

you mean like this?

In [2]: def func(value):
   ...:     return value **2
   ...:

In [3]: df.select(pl.col('a').apply(func))
<ipython-input-3-53ff52cbccd5>:1: PolarsInefficientApplyWarning:
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with an expression:
-  pl.col("a").apply(func)
+  pl.col("a") ** 2

  df.select(pl.col('a').apply(func))

If so, yup!

lucazanna · 2023-07-24T09:48:57Z

@MarcoGorelli nice!

what about:

def func(value):
    if value > 10:
        return "a"
    elif value > 0:
        return "b"
    else:
        return "c"

I imagine it will also work when if and else are added?

MarcoGorelli · 2023-07-24T09:57:50Z

I expect that if the respective lambda function were to work, then that one should work as well - but we'll make sure to test for it explicitly, thanks!

EDIT: this would actually require a little extra work, as the corresponding lambda would be equivalent to

def func(x):
    return 'a' if x>10 else 'b' if x>0 else 'c'

so thanks for having brought it up

ritchie46 · 2023-07-24T10:09:08Z

Note that polars speculatively evaluates branches in when -> then -> otherwise, so if one of the branches can fail, the apply is the correct way to deal with that.

alexander-beedie · 2023-07-24T21:29:20Z

will the recommendation engine also work on small non-lambda Python functions?

Yes; it doesn't matter if you pass a lambda, function, or method on a class - they will all be disassembled down into the same primitive ops. However, I'm only considering single-return functions/lambdas, so multiple-return functions won't work (as they aren't quite the same thing as lambdas).
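A quick way to check this claim with the standard-library `dis` module. The helper `opnames` and the class `Squarer` are made up for this sketch; it only compares opcode names, not arguments or line numbers:

```python
import dis


def func(value):
    return value ** 2


square = lambda value: value ** 2


class Squarer:
    def square(self, value):
        return value ** 2


def opnames(fn):
    """List the opcode names a callable's bytecode is made of."""
    return [ins.opname for ins in dis.get_instructions(fn)]


# Function, lambda, and bound method all reduce to the same sequence of ops.
print(opnames(func))
print(opnames(func) == opnames(square) == opnames(Squarer().square))
```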

when if and else are added?

I'm liking the vote of confidence here 🤣 Control flow from bytecode can be tricky (and/or and if/else look quite similar since they both use flavours of *JUMP* ops and have to be disambiguated; starting to look into that, though will need some care to get it right) ;)

lucazanna · 2023-07-25T16:22:02Z

I'm liking the vote of confidence here 🤣

if there is one person who can do it, I know it's you @alexander-beedie

henryharbeck · 2023-07-26T13:39:13Z

Not sure if this is pushing it too far / asking for too much (apologies in advance if it is), but I did think of some potential niceties around conditionals. Will leave it to you all in terms of whether you think it is reasonable and/or feasible. Just putting thoughts out there.

If a function only has checks for equality with an optional else clause, then that could be translated to map_dict with a default argument (and leave off the default if there is no else)
E.g.,

df = pl.DataFrame({"gender": ["M", "F", "M", "X"]})

def long_gender(row):
    if row == "M":
        rv = "Male"
    elif row == "F":
        rv = "Female"
    else:
        rv = "Unknown"
    return rv

df.with_columns(
    pl.col("gender").apply(long_gender).alias("bad_way"),
    pl.col("gender").map_dict({"M": "Male", "F": "Female"}, default="Unknown").alias("good_way")
)

If a function only has a single numerical comparison operator (i.e. only one of <, <=, >, >=) with an else clause, then that could be translated to cut. Some processing of the operators and inputs would probably be required to
E.g.,

df = pl.DataFrame({"score": range(1, 11)})

def grade_score_le(row):
    if row <= 5:
        rv = "Fail"
    elif row <= 7:
        rv = "Pass"
    else:
        rv = "Distinction"
    return rv

def grade_score_ge(row):
    if row >= 8:
        rv = "Distinction"
    elif row >= 6:
        rv = "Pass"
    else:
        rv = "Fail"
    return rv

# other operators omitted for brevity

df.with_columns(
    # both functions return the same thing, so can be translated into the same `cut`
    *(pl.col("score").apply(fn).alias(f"bad_way_{fn.__name__}") for fn in [grade_score_le, grade_score_ge]),
    (
        pl.when(pl.col("score") <= 5).then("Fail")
        .when(pl.col("score") <= 7).then("Pass")
        .otherwise("Distinction")
    ).alias("good_way"),
    pl.col("score").cut([5, 7], ["Fail", "Pass", "Distinction"]).alias("better_way")
)

Unsure of any other optimisations, but I'm guessing the general rule would be that conditionals would be translated to when/then/otherwise?

lucazanna · 2023-07-31T16:39:47Z

Added an issue here: #10210 for the if/else recommendation

cmdlineluser · 2023-08-02T21:48:34Z

https://stackoverflow.com/questions/76822683/polars-apply-lambda-alternative

Could be an example/test-case for list lookups.

henryharbeck · 2023-08-05T12:53:30Z

Hi @alexander-beedie, @MarcoGorelli,

I notice that the issue description mentions (and has ticked) both bare numpy functions and those used with a lambda

numpy functions which have expr equivalents (e.g. lambda x: np.sin(x) or bare np.sin)

At the moment, the lambda does not seem to warn, but the bare call does.

Example:

df = pl.DataFrame({"a": [1, 4]})
df.select(pl.col("a").apply(lambda x: np.sin(x))) # no warning raised
df.select(pl.col("a").apply(np.sin)) # warning is raised

Flagging here as I'm unsure if this is an issue, or just hasn't been implemented yet.

MarcoGorelli · 2023-08-05T12:59:25Z

thanks for the report - this warns for me:

In [4]: df = pl.DataFrame({"a": [1, 4]})
   ...: df.select(pl.col("a").apply(lambda x: np.sin(x))) # no warning raised
<ipython-input-4-e464a21bac84>:2: PolarsInefficientApplyWarning:
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with the following:
  - pl.col("a").apply(lambda x: ...)
  + pl.col("a").sin()

  df.select(pl.col("a").apply(lambda x: np.sin(x))) # no warning raised
Out[4]:
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ 0.841471  │
│ -0.756802 │
└───────────┘

In [5]: pl.__version__
Out[5]: '0.18.11'

could you give your polars and python versions please?

henryharbeck · 2023-08-05T13:10:46Z

Thanks for the quick response
Python: 3.11.4
Polars: 0.18.12
and if it makes any difference at all
numpy: 1.24.3

MarcoGorelli · 2023-08-05T13:27:11Z

thanks! can reproduce, fix (and failing test) incoming!

MarcoGorelli · 2023-08-05T13:47:05Z

@henryharbeck are you running this in IPython / Jupyter?

I think it's that they apply some modifications and end up producing slightly different bytecode

If you make a Python script with just the following:

import numpy as np
import polars as pl

df = pl.DataFrame({"a": [1, 4]})
df.select(pl.col("a").apply(lambda x: np.sin(x)))

, do you get the warning?

I do, but don't when running via IPython (in Python 3.11)

henryharbeck · 2023-08-05T13:55:16Z

@MarcoGorelli, I was running it in Jupyter. Great stuff on figuring that out!

As a python script, the warning is produced. When running it as the first cell in a Jupyter notebook, no warning is produced.
Both are using the same venv with python 3.11

MarcoGorelli added the python Related to Python Polars label Jul 19, 2023

stinodego added the enhancement New feature or an improvement of an existing feature label Jul 19, 2023

alexander-beedie assigned alexander-beedie and MarcoGorelli Jul 19, 2023

alexander-beedie changed the title ~~Inefficient apply warnings: tracker~~ "Inefficient apply" warnings (tracking issue) Jul 19, 2023

stinodego added the accepted Ready for implementation label Jul 20, 2023

alexander-beedie mentioned this issue Jul 26, 2023

feat(python): BytecodeParser can now handle mixed/nested and/or control flow #10085

Merged

alexander-beedie mentioned this issue Jul 26, 2023

feat(python): enable "inefficient apply" warnings from Series #10104

Merged

MarcoGorelli mentioned this issue Aug 5, 2023

fix(python): show inefficient apply warning in ipython #10312

Merged

MarcoGorelli changed the title ~~"Inefficient apply" warnings (tracking issue)~~ "Inefficient map_*" warnings (tracking issue) Sep 30, 2023

MarcoGorelli mentioned this issue Sep 30, 2023

polars vet? astral-sh/ruff#7721

Open

MarcoGorelli mentioned this issue Dec 31, 2023

feat(python): emit suggestion for how to replace map_elements sigmoid function with expressions #13347

Merged

This was referenced Feb 16, 2024

feat(python): Warn on inefficient use of map_elements for temporal attributes/methods #14529

Merged

feat(python): Warn on inefficient use of map_elements for additional string functions #14565

Merged

alexander-beedie mentioned this issue Nov 6, 2024

feat(python): Identify inefficient use of Python string replace in map_elements #19668

Merged
