I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.
Issue description
When a numpy array or numpy float is used in a polars expression to create a new column, there is a quirk if the numpy value is the numerator of a quotient and a polars column is the denominator: the reciprocal of the correct answer is returned. Wrapping the numerator in pl.lit or casting it to a Python float avoids the issue that the naive approach has. There is no equivalent problem with multiplication (which is commutative) or with divisions in which the polars column is the numerator. This suggests that the quirk has something to do with how numpy floats are treated as numerators in polars expression division.
Reproducible example
import polars as pl
import numpy as np

data = pl.DataFrame({
    "a": [0.5, 1.0, 2.0]
})

# examples of failures (compared to similar
# approaches that yield the desired result)
data.with_columns(
    [
        (np.float64(2.0) / pl.col("a")).alias(
            "Fails with float, yields reciprocal"),
        (np.array([2, 2, 2]) / pl.col("a")).alias(
            "Fails with array of same size "
            "as polars column"),
        (2.0 / pl.col("a")).alias(
            "works with regular float"),
        (float(np.float64(2.0)) / pl.col("a")).alias(
            "works if cast numpy to float"),
        (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
            "works with polars literal")
    ])

# numpy floats work as expected in multiplication
data.with_columns(
    [
        (np.float64(2.0) * pl.col("a")).alias(
            "Works with multiplication"),
    ])

# numpy floats work as expected as denominators
data.with_columns(
    [
        (pl.col("a") / np.float64(2.0)).alias(
            "Works with division by "
            "numpy float"),
    ])

# numpy int in numerator throws error
data.with_columns(
    [
        (np.int64(2) / pl.col("a")).alias(
            "This throws an error"),
    ])

# python int in numerator behaves as expected
data.with_columns(
    [
        (int(2) / pl.col("a")).alias(
            "works with python int in numerator"),
    ])

# numpy array of ints gets coerced to float,
# without error, and then has same reciprocal
# issue
data.with_columns(
    [
        (np.array([2, 2, 2]).astype('int') / pl.col("a")).alias(
            "same float division quirk"),
    ])
Output:
>>> # examples of failures (compared to similar
>>> # approaches that yield the desired result)
>>> data.with_columns(
... [
... (np.float64(2.0) / pl.col("a")).alias(
... "Fails with float, yields reciprocal"),
... (np.array([2, 2, 2]) / pl.col("a")).alias(
... "Fails with array of same size "
... "as polars column"),
... (2.0 / pl.col("a")).alias(
... "works with regular float"),
... (float(np.float64(2.0)) / pl.col("a")).alias(
... "works if cast numpy to float"),
... (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
... "works with polars literal")
... ])
shape: (3, 6)
┌─────┬──────────────────────┬──────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
│ a ┆ Fails with float, ┆ Fails with array of ┆ works with regular ┆ works if cast numpy ┆ works with polars │
│ --- ┆ yields recipro... ┆ same size as... ┆ float ┆ to float ┆ literal │
│ f64 ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════╪══════════════════════╪══════════════════════╪═════════════════════╪═════════════════════╪═════════════════════╡
│ 0.5 ┆ 0.25 ┆ 0.25 ┆ 4.0 ┆ 4.0 ┆ 4.0 │
│ 1.0 ┆ 0.5 ┆ 0.5 ┆ 2.0 ┆ 2.0 ┆ 2.0 │
│ 2.0 ┆ 1.0 ┆ 1.0 ┆ 1.0 ┆ 1.0 ┆ 1.0 │
└─────┴──────────────────────┴──────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘
>>>
>>> # numpy floats work as expected in multiplication
>>> data.with_columns(
... [
... (np.float64(2.0) * pl.col("a")).alias(
... "Works with multiplication"),
... ])
shape: (3, 2)
┌─────┬───────────────────────────┐
│ a ┆ Works with multiplication │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═══════════════════════════╡
│ 0.5 ┆ 1.0 │
│ 1.0 ┆ 2.0 │
│ 2.0 ┆ 4.0 │
└─────┴───────────────────────────┘
>>>
>>> # numpy floats work as expected as denominators
>>> data.with_columns(
... [
... (pl.col("a") / np.float64(2.0)).alias(
... "Works with division by "
... "numpy float"),
... ])
shape: (3, 2)
┌─────┬─────────────────────────────────────┐
│ a ┆ Works with division by numpy flo... │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════════════════════════════════════╡
│ 0.5 ┆ 0.25 │
│ 1.0 ┆ 0.5 │
│ 2.0 ┆ 1.0 │
└─────┴─────────────────────────────────────┘
>>>
>>>
>>> # numpy int in numerator throws error
>>> data.with_columns(
... [
... (np.int64(2) / pl.col("a")).alias(
... "This throws an error"),
... ])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "***/frame.py", line 5792, in with_columns
self.lazy().with_columns(exprs, **named_exprs).collect(no_optimization=True)
File "***/frame.py", line 1146, in collect
return pli.wrap_df(ldf.collect())
exceptions.ComputeError: ValueError: Unsupported type <class 'numpy.int64'> for 2.
>>>
>>>
>>> # python int in numerator behaves as expected
>>> data.with_columns(
... [
... (int(2) / pl.col("a")).alias(
... "works with python int in numerator"),
... ])
shape: (3, 2)
┌─────┬─────────────────────────────────────┐
│ a ┆ works with python int in numerat... │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═════════════════════════════════════╡
│ 0.5 ┆ 4.0 │
│ 1.0 ┆ 2.0 │
│ 2.0 ┆ 1.0 │
└─────┴─────────────────────────────────────┘
>>>
>>>
>>> # numpy array of ints gets coerced to float,
>>> # without error, and then has same reciprocal
>>> # issue
>>> data.with_columns(
... [
... (np.array([2, 2, 2]).astype('int') / pl.col("a")).alias(
... "same float division quirk"),
... ])
shape: (3, 2)
┌─────┬───────────────────────────┐
│ a ┆ same float division quirk │
│ --- ┆ --- │
│ f64 ┆ f64 │
╞═════╪═══════════════════════════╡
│ 0.5 ┆ 0.25 │
│ 1.0 ┆ 0.5 │
│ 2.0 ┆ 1.0 │
└─────┴───────────────────────────┘
Expected behavior
These alternatives all yield the expected behavior:
data.with_columns(
    [
        (2.0 / pl.col("a")).alias(
            "works with regular float"),
        (float(np.float64(2.0)) / pl.col("a")).alias(
            "works if cast numpy to float"),
        (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
            "works with polars literal")
    ])
shape: (3, 4)
┌─────┬──────────────────────────┬──────────────────────────────┬───────────────────────────┐
│ a ┆ works with regular float ┆ works if cast numpy to float ┆ works with polars literal │
│ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════╪══════════════════════════╪══════════════════════════════╪═══════════════════════════╡
│ 0.5 ┆ 4.0 ┆ 4.0 ┆ 4.0 │
│ 1.0 ┆ 2.0 ┆ 2.0 ┆ 2.0 │
│ 2.0 ┆ 1.0 ┆ 1.0 ┆ 1.0 │
└─────┴──────────────────────────┴──────────────────────────────┴───────────────────────────┘
dylanhmorris changed the title from "Quirk in polars expression division when numpy scalar in numerator" to "Quirk in polars expression division when numpy float in numerator" on Feb 3, 2023
i.e. the other way around. The ufunc takes priority over Expr.__rtruediv__; calling that method directly works but is obviously not ideal.
It also explains why wrapping in a literal works: that avoids the ufunc.
I will think of a way to fix this. It seems we could do so by not ignoring the position of the expression in the argument list.
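For anyone following along, the mechanism described above can be sketched with numpy alone. This is only an illustration under stated assumptions: ToyExpr and NaiveExpr are hypothetical stand-in classes, not Polars code, and NaiveExpr assumes the problem amounts to always placing the expression first among the ufunc inputs. When the right-hand operand defines __array_ufunc__, numpy does not defer to its __rtruediv__; it calls np.true_divide and hands both operands to __array_ufunc__ in their original order.

```python
import numpy as np


class ToyExpr:
    """Hypothetical stand-in for an expression type (not Polars code)."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Position-aware: unwrap the expression wherever it appears,
        # so the original operand order is preserved.
        args = [x.values if isinstance(x, ToyExpr) else x for x in inputs]
        return ToyExpr(getattr(ufunc, method)(*args, **kwargs))


class NaiveExpr(ToyExpr):
    """Hypothetical variant that ignores where it sits in the inputs."""

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Position-ignoring (assumed shape of the bug): always put our
        # own values first, followed by every other input.
        others = [x for x in inputs if not isinstance(x, NaiveExpr)]
        return NaiveExpr(getattr(ufunc, method)(self.values, *others, **kwargs))


col = [0.5, 1.0, 2.0]

# numpy scalar in the numerator: the ufunc path is taken in both cases.
correct = np.float64(2.0) / ToyExpr(col)
naive = np.float64(2.0) / NaiveExpr(col)

print(correct.values.tolist())  # [4.0, 2.0, 1.0]
print(naive.values.tolist())    # [0.25, 0.5, 1.0]  (the reciprocal)

# Multiplication is commutative, so the naive version still looks right.
product = np.float64(2.0) * NaiveExpr(col)
print(product.values.tolist())  # [1.0, 2.0, 4.0]
```

The naive variant reproduces the numbers in the report (0.25, 0.5, 1.0 instead of 4.0, 2.0, 1.0), while the position-aware variant does not need __rtruediv__ at all. A plain Python float in the numerator never reaches the ufunc, which is consistent with the workarounds listed in the issue.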