"Inefficient map_*" warnings (tracking issue) #9968
Comments
Can we also include uppercase, lowercase and title case? I have seen cases of using the Python string methods for those with apply instead of the Polars expressions |
Aha! Got a super-clean approach for handling the |
I just read an article with a code snippet using apply for a simple if condition. I wonder if ternary operators (value_if_true if condition else value_if_false) could also be included? |
Got a sample? I need to rework/extend the current handling of |
Yes, here is an example:
https://towardsdatascience.com/manipulating-values-in-polars-dataframes-1087d88dd436
Unrelated to this - will the recommendation engine also work on small non-lambda Python functions? |
you mean like this?
In [2]: def func(value):
   ...:     return value ** 2
   ...:
In [3]: df.select(pl.col('a').apply(func))
<ipython-input-3-53ff52cbccd5>:1: PolarsInefficientApplyWarning:
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with an expression:
  - pl.col("a").apply(func)
  + pl.col("a") ** 2
  df.select(pl.col('a').apply(func))
If so, yup! |
@MarcoGorelli nice! what about:
def func(value):
    if value > 10:
        return "a"
    elif value > 0:
        return "b"
    else:
        return "c"
I imagine it will also work when if and else are added? |
I expect that if the respective lambda function were to work, then that one should work as well - but we'll make sure to test for it explicitly, thanks!
EDIT: this would actually require a little extra work, as the corresponding lambda would be equivalent to
def func(x):
    return 'a' if x > 10 else 'b' if x > 0 else 'c'
so thanks for having brought it up |
Note that polars speculatively evaluates branches in |
Yes; it doesn't matter if you pass a lambda, function, or method on a class - they will all be disassembled down into the same primitive ops. However, I'm only considering single-return functions/lambdas, so multiple-return functions won't work (as they aren't quite the same thing as lambdas).
I'm liking the vote of confidence here 🤣 Control flow from bytecode can be tricky ( |
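The claim above, that a lambda, a plain function, and a method on a class are all disassembled into the same primitive ops, can be checked with the standard-library `dis` module. This is a minimal sketch, not the detection code Polars uses; the names `opnames`, `func`, `Squarer`, and `square_lambda` are made up for illustration.

```python
import dis


def opnames(fn):
    """Return the bytecode operation names of a callable, in order."""
    return [instr.opname for instr in dis.get_instructions(fn)]


def func(value):
    return value ** 2


class Squarer:
    def square(self, value):
        return value ** 2


square_lambda = lambda value: value ** 2

# All three callables should compile to the same sequence of ops.
# The exact op names differ between Python versions, so they are
# printed here rather than hard-coded.
print(opnames(square_lambda))
print(opnames(func) == opnames(square_lambda))
print(opnames(Squarer.square) == opnames(square_lambda))
```

Only the op names are compared here; argument names and constants would need a separate comparison.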
If there is one person who can do it, I know it's you @alexander-beedie |
Not sure if this is pushing it too far / asking for too much (apologies in advance if it is), but I did think of some potential niceties around conditionals. Will leave it to you all in terms of whether you think it is reasonable and/or feasible. Just putting thoughts out there.
If a function only has checks for equality with an optional else clause, then that could be translated to
df = pl.DataFrame({"gender": ["M", "F", "M", "X"]})
def long_gender(row):
    if row == "M":
        rv = "Male"
    elif row == "F":
        rv = "Female"
    else:
        rv = "Unknown"
    return rv
df.with_columns(
    pl.col("gender").apply(long_gender).alias("bad_way"),
    pl.col("gender").map_dict({"M": "Male", "F": "Female"}, default="Unknown").alias("good_way")
)
If a function only has a single numerical comparison operator (i.e. only one of
df = pl.DataFrame({"score": range(1, 11)})
def grade_score_le(row):
    if row <= 5:
        rv = "Fail"
    elif row <= 7:
        rv = "Pass"
    else:
        rv = "Distinction"
    return rv
def grade_score_ge(row):
    if row >= 8:
        rv = "Distinction"
    elif row >= 6:
        rv = "Pass"
    else:
        rv = "Fail"
    return rv
# other operators omitted for brevity
df.with_columns(
    # both functions return the same thing, so can be translated into the same `cut`
    *(pl.col("score").apply(fn).alias(f"bad_way_{fn.__name__}") for fn in [grade_score_le, grade_score_ge]),
    (
        pl.when(pl.col("score") <= 5).then("Fail")
        .when(pl.col("score") <= 7).then("Pass")
        .otherwise("Distinction")
    ).alias("good_way"),
    pl.col("score").cut([5, 7], ["Fail", "Pass", "Distinction"]).alias("better_way")
)
Unsure of any other optimisations, but I'm guessing the general rule would be that conditionals would be translated to when/then/otherwise? |
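The two translations proposed above can be sanity-checked in plain Python before any Polars rewrite is attempted. The sketch below is only an analogy and makes no Polars calls: a dict lookup with a default stands in for `map_dict`, and `bisect.bisect_left` over the break points stands in for `cut`, assuming right-closed bins. `GENDER_MAP`, `BREAKS`, `LABELS`, and the two `*_lookup` helpers are hypothetical names.

```python
from bisect import bisect_left

GENDER_MAP = {"M": "Male", "F": "Female"}
BREAKS = [5, 7]
LABELS = ["Fail", "Pass", "Distinction"]


def long_gender(row):
    if row == "M":
        rv = "Male"
    elif row == "F":
        rv = "Female"
    else:
        rv = "Unknown"
    return rv


def grade_score_le(row):
    if row <= 5:
        rv = "Fail"
    elif row <= 7:
        rv = "Pass"
    else:
        rv = "Distinction"
    return rv


def long_gender_lookup(row):
    # An equality-only chain with an else clause is a dict lookup with a default.
    return GENDER_MAP.get(row, "Unknown")


def grade_score_lookup(row):
    # bisect_left counts the break points strictly below `row`, so a value
    # equal to a break point stays in the lower bin (right-closed bins).
    return LABELS[bisect_left(BREAKS, row)]


for gender in ["M", "F", "M", "X"]:
    print(gender, long_gender(gender), long_gender_lookup(gender))

for score in range(1, 11):
    print(score, grade_score_le(score), grade_score_lookup(score))
```

If the branching function and its lookup counterpart agree on every input of interest, the conditional is a candidate for a lookup-style or binning-style expression rather than a generic when/then/otherwise chain.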
Added an issue here: #10210 for the if/else recommendation |
https://stackoverflow.com/questions/76822683/polars-apply-lambda-alternative Could be an example/test-case for list lookups. |
Hi @alexander-beedie, @MarcoGorelli, I notice that the issue description mentions (and has ticked) both bare numpy functions and those used with a
At the moment, the
Example:
df = pl.DataFrame({"a": [1, 4]})
df.select(pl.col("a").apply(lambda x: np.sin(x)))  # no warning raised
df.select(pl.col("a").apply(np.sin))  # warning is raised
Flagging here as I'm unsure if this is an issue, or just hasn't been implemented yet. |
thanks for the report - this warns for me:
In [4]: df = pl.DataFrame({"a": [1, 4]})
   ...: df.select(pl.col("a").apply(lambda x: np.sin(x))) # no warning raised
<ipython-input-4-e464a21bac84>:2: PolarsInefficientApplyWarning:
Expr.apply is significantly slower than the native expressions API.
Only use if you absolutely CANNOT implement your logic otherwise.
In this case, you can replace your `apply` with the following:
  - pl.col("a").apply(lambda x: ...)
  + pl.col("a").sin()
  df.select(pl.col("a").apply(lambda x: np.sin(x))) # no warning raised
Out[4]:
shape: (2, 1)
┌───────────┐
│ a         │
│ ---       │
│ f64       │
╞═══════════╡
│ 0.841471  │
│ -0.756802 │
└───────────┘
In [5]: pl.__version__
Out[5]: '0.18.11'
could you give your polars and python versions please? |
Thanks for the quick response |
thanks! can reproduce, fix (and failing test) incoming! |
@henryharbeck are you running this in IPython / Jupyter? I think it's that they apply some modifications and end up producing slightly different bytecode.
If you make a Python script with just the following:
import numpy as np
import polars as pl
df = pl.DataFrame({"a": [1, 4]})
df.select(pl.col("a").apply(lambda x: np.sin(x)))
do you get the warning? I do, but don't when running via IPython (in Python 3.11) |
@MarcoGorelli, I was running it in Jupyter. Great stuff on figuring that out! As a python script, the warning is produced. When running it as the first cell in a Jupyter notebook, no warning is produced. |
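One way to see whether a script and a notebook compile the lambda differently is to dump its bytecode in both environments and compare the output. A minimal sketch using only the standard-library `dis` module; `bytecode_summary` is a hypothetical helper, and `np` does not need to be imported because the lambda is only disassembled, never called.

```python
import dis


def bytecode_summary(fn):
    """Return (opname, argval) pairs for a callable's bytecode."""
    return [(instr.opname, instr.argval) for instr in dis.get_instructions(fn)]


# `np` is a global name that is only looked up when the lambda is called,
# so this line works even without numpy installed.
fn = lambda x: np.sin(x)  # noqa: F821

for opname, argval in bytecode_summary(fn):
    print(f"{opname:<20} {argval!r}")
```

Running this once as a script and once in a notebook cell, then diffing the printed lines, should show where the two environments diverge, if they do.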
@alexander-beedie let's start a new issue to keep track of ideas in this space
lambda x: x+1
and/or chains
lambda x: my_dict[x]
lambda x: json.loads(x) or bare json.loads
lambda x: np.sin(x) or bare np.sin
if/else
lambda lst: ' '.join([str(x) for x in lst])
to_datetime: lambda x: dt.datetime.strptime(x, fmt), if possible (see this issue. Perhaps the warning could even suggest the pyarrow workaround - better than nothing, and much better than apply)
lambda x: my_list[x] -> .map_dict({idx: val for idx, val in enumerate(my_list)})
lambda a: 1 / (1 + np.exp(-a))