Quirk in polars expression division when numpy float in numerator #6666

dylanhmorris · 2023-02-03T21:56:33Z

Polars version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of Polars.

Issue description

When a numpy array or numpy float is used in a polars expression to create a new column, division misbehaves if the numpy value is the numerator and a polars column is the denominator: the reciprocal of the correct answer is returned. Wrapping the numerator in pl.lit or casting it to a Python float avoids the problem. Multiplication (which is commutative) is unaffected, as is division with the polars column in the numerator. This suggests that the quirk lies in how numpy floats are handled as numerators in polars expression division.

Reproducible example

import polars as pl
import numpy as np

data = pl.DataFrame({
    "a": [0.5, 1.0, 2.0]
})

# examples of failures (compared to similar
# approaches that yield the desired result)
data.with_columns(
    [
        (np.float64(2.0) / pl.col("a")).alias(
            "Fails with float, yields reciprocal"),
        (np.array([2, 2, 2]) / pl.col("a")).alias(
            "Fails with array of same size "
            "as polars column"),
        (2.0 / pl.col("a")).alias(
            "works with regular float"),
        (float(np.float64(2.0)) / pl.col("a")).alias(
            "works if cast numpy to float"),
        (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
            "works with polars literal")
    ])

# numpy floats work as expected in multiplication
data.with_columns(
    [
        (np.float64(2.0) * pl.col("a")).alias(
            "Works with multiplication"),
    ])

# numpy floats work as expected as denominators
data.with_columns(
    [
        (pl.col("a") / np.float64(2.0)).alias(
            "Works with division by "
            "numpy float"),
    ])


# numpy int in numerator throws error
data.with_columns(
    [
        (np.int64(2) / pl.col("a")).alias(
            "This throws an error"),
    ])


# python int in numerator behaves as expected
data.with_columns(
    [
        (int(2) / pl.col("a")).alias(
            "works with python int in numerator"),
    ])


# numpy array of ints gets coerced to float,
# without error, and then has same reciprocal
# issue
data.with_columns(
    [
        (np.array([2, 2, 2]).astype('int') / pl.col("a")).alias(
            "same float division quirk"),
    ])

Output:

>>> # examples of failures (compared to similar
>>> # approaches that yield the desired result)
>>> data.with_columns(
...     [
...         (np.float64(2.0) / pl.col("a")).alias(
...             "Fails with float, yields reciprocal"),
...         (np.array([2, 2, 2]) / pl.col("a")).alias(
...             "Fails with array of same size "
...             "as polars column"),
...         (2.0 / pl.col("a")).alias(
...             "works with regular float"),
...         (float(np.float64(2.0)) / pl.col("a")).alias(
...             "works if cast numpy to float"),
...         (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
...             "works with polars literal")
...     ])
shape: (3, 6)
┌─────┬──────────────────────┬──────────────────────┬─────────────────────┬─────────────────────┬─────────────────────┐
│ a   ┆ Fails with float,    ┆ Fails with array of  ┆ works with regular  ┆ works if cast numpy ┆ works with polars   │
│ --- ┆ yields recipro...    ┆ same size as...      ┆ float               ┆ to float            ┆ literal             │
│ f64 ┆ ---                  ┆ ---                  ┆ ---                 ┆ ---                 ┆ ---                 │
│     ┆ f64                  ┆ f64                  ┆ f64                 ┆ f64                 ┆ f64                 │
╞═════╪══════════════════════╪══════════════════════╪═════════════════════╪═════════════════════╪═════════════════════╡
│ 0.5 ┆ 0.25                 ┆ 0.25                 ┆ 4.0                 ┆ 4.0                 ┆ 4.0                 │
│ 1.0 ┆ 0.5                  ┆ 0.5                  ┆ 2.0                 ┆ 2.0                 ┆ 2.0                 │
│ 2.0 ┆ 1.0                  ┆ 1.0                  ┆ 1.0                 ┆ 1.0                 ┆ 1.0                 │
└─────┴──────────────────────┴──────────────────────┴─────────────────────┴─────────────────────┴─────────────────────┘
>>> 
>>> # numpy floats work as expected in multiplication
>>> data.with_columns(
...     [
...         (np.float64(2.0) * pl.col("a")).alias(
...             "Works with multiplication"),
...     ])
shape: (3, 2)
┌─────┬───────────────────────────┐
│ a   ┆ Works with multiplication │
│ --- ┆ ---                       │
│ f64 ┆ f64                       │
╞═════╪═══════════════════════════╡
│ 0.5 ┆ 1.0                       │
│ 1.0 ┆ 2.0                       │
│ 2.0 ┆ 4.0                       │
└─────┴───────────────────────────┘
>>> 
>>> # numpy floats work as expected as denominators
>>> data.with_columns(
...     [
...         (pl.col("a") / np.float64(2.0)).alias(
...             "Works with division by "
...             "numpy float"),
...     ])
shape: (3, 2)
┌─────┬─────────────────────────────────────┐
│ a   ┆ Works with division by numpy flo... │
│ --- ┆ ---                                 │
│ f64 ┆ f64                                 │
╞═════╪═════════════════════════════════════╡
│ 0.5 ┆ 0.25                                │
│ 1.0 ┆ 0.5                                 │
│ 2.0 ┆ 1.0                                 │
└─────┴─────────────────────────────────────┘
>>> 
>>> 
>>> # numpy int in numerator throws error
>>> data.with_columns(
...     [
...         (np.int64(2) / pl.col("a")).alias(
...             "This throws an error"),
...     ])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "***/frame.py", line 5792, in with_columns
    self.lazy().with_columns(exprs, **named_exprs).collect(no_optimization=True)
  File "***/frame.py", line 1146, in collect
    return pli.wrap_df(ldf.collect())
exceptions.ComputeError: ValueError: Unsupported type <class 'numpy.int64'> for 2.
>>> 
>>> 
>>> # python int in numerator behaves as expected
>>> data.with_columns(
...     [
...         (int(2) / pl.col("a")).alias(
...             "works with python int in numerator"),
...     ])
shape: (3, 2)
┌─────┬─────────────────────────────────────┐
│ a   ┆ works with python int in numerat... │
│ --- ┆ ---                                 │
│ f64 ┆ f64                                 │
╞═════╪═════════════════════════════════════╡
│ 0.5 ┆ 4.0                                 │
│ 1.0 ┆ 2.0                                 │
│ 2.0 ┆ 1.0                                 │
└─────┴─────────────────────────────────────┘
>>> 
>>> 
>>> # numpy array of ints gets coerced to float,
>>> # without error, and then has same reciprocal
>>> # issue
>>> data.with_columns(
...     [
...         (np.array([2, 2, 2]).astype('int') / pl.col("a")).alias(
...             "same float division quirk"),
...     ])
shape: (3, 2)
┌─────┬───────────────────────────┐
│ a   ┆ same float division quirk │
│ --- ┆ ---                       │
│ f64 ┆ f64                       │
╞═════╪═══════════════════════════╡
│ 0.5 ┆ 0.25                      │
│ 1.0 ┆ 0.5                       │
│ 2.0 ┆ 1.0                       │
└─────┴───────────────────────────┘

Expected behavior

These alternatives all yield the expected behavior:

data.with_columns(
    [
        (2.0 / pl.col("a")).alias(
            "works with regular float"),
        (float(np.float64(2.0)) / pl.col("a")).alias(
            "works if cast numpy to float"),
        (pl.lit(np.float64(2.0)) / pl.col("a")).alias(
            "works with polars literal")
    ])

shape: (3, 4)
┌─────┬──────────────────────────┬──────────────────────────────┬───────────────────────────┐
│ a   ┆ works with regular float ┆ works if cast numpy to float ┆ works with polars literal │
│ --- ┆ ---                      ┆ ---                          ┆ ---                       │
│ f64 ┆ f64                      ┆ f64                          ┆ f64                       │
╞═════╪══════════════════════════╪══════════════════════════════╪═══════════════════════════╡
│ 0.5 ┆ 4.0                      ┆ 4.0                          ┆ 4.0                       │
│ 1.0 ┆ 2.0                      ┆ 2.0                          ┆ 2.0                       │
│ 2.0 ┆ 1.0                      ┆ 1.0                          ┆ 1.0                       │
└─────┴──────────────────────────┴──────────────────────────────┴───────────────────────────┘
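Until the numerator handling is fixed, one way to apply the "cast to a Python number" workaround uniformly is a small helper like the one below. This is only a sketch: as_python_scalar is a hypothetical name, not part of polars or numpy, and it relies on the observation above that builtin Python floats and ints behave as expected in the numerator.

```python
import numpy as np


def as_python_scalar(x):
    """Hypothetical helper: convert a numpy scalar to the matching builtin number.

    Anything that is not a numpy scalar is returned unchanged.
    """
    # np.generic is the base class of all numpy scalar types;
    # .item() returns the equivalent builtin Python object.
    return x.item() if isinstance(x, np.generic) else x


numerator = as_python_scalar(np.float64(2.0))
print(type(numerator).__name__, numerator)  # float 2.0
```

The result can then be used as, for example, as_python_scalar(np.float64(2.0)) / pl.col("a"). Numpy arrays are not covered by this helper.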

Installed versions

---Version info---
Polars: 0.16.2
Index type: UInt32
Platform: macOS-12.5.1-x86_64-i386-64bit
Python: 3.10.9 (main, Dec  7 2022, 02:03:23) [Clang 13.0.0 (clang-1300.0.29.30)]
---Optional dependencies---
pyarrow: 9.0.0
pandas: 1.4.4
numpy: 1.24.1
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: 0.8
deltalake: <not installed>
matplotlib: 3.6.1

zundertj · 2023-02-04T18:45:20Z

This behaviour occurs because the expression ends up calling Expr.__array_ufunc__. Taking the first example,

data.with_columns(np.float64(2.) / pl.col("a"))

this unrolls into:

# s is the column that gets passed to the mapped function
s = pl.Series("a", [0.5, 1.0, 2.0])
pl.col("a").map(lambda s: np.divide(s, 2.0))

i.e. the operands end up the other way around. The ufunc takes priority over Expr.__rtruediv__; calling Expr.__rtruediv__ directly works but is obviously not ideal.
It also explains why wrapping in a literal works: that avoids the ufunc.

I will think of a way to fix this; it seems we could do so by not ignoring the position of the expression in the argument list.
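To make the mechanism concrete, here is a minimal sketch that reproduces the effect with numpy alone. ToyExpr, PositionBlindExpr and PositionAwareExpr are hypothetical stand-ins, not polars classes, and the position-aware variant only illustrates the idea of respecting argument order; it is not necessarily how the actual fix is implemented.

```python
import numpy as np


class ToyExpr:
    """Hypothetical stand-in for an expression wrapping a column of floats."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype=float)

    def __rtruediv__(self, other):
        # Not reached when `other` is a numpy scalar or array: numpy handles
        # the division itself and dispatches to __array_ufunc__ instead.
        return ToyExpr(other / self.values)


class PositionBlindExpr(ToyExpr):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Mimics the reported behaviour: the expression is dropped from the
        # inputs and its values are always passed as the first argument.
        others = [x for x in inputs if not isinstance(x, ToyExpr)]
        return ToyExpr(ufunc(self.values, *others, **kwargs))


class PositionAwareExpr(ToyExpr):
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Substitutes the values at the position where the expression appeared.
        args = [x.values if isinstance(x, ToyExpr) else x for x in inputs]
        return ToyExpr(ufunc(*args, **kwargs))


blind = np.float64(2.0) / PositionBlindExpr([0.5, 1.0, 2.0])
aware = np.float64(2.0) / PositionAwareExpr([0.5, 1.0, 2.0])
print(blind.values.tolist())  # [0.25, 0.5, 1.0], the reciprocal of the intended result
print(aware.values.tolist())  # [4.0, 2.0, 1.0]
```

Multiplication gives the same result in both variants because the operand order does not matter, which matches the observation in the report.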

dylanhmorris added the bug (Something isn't working) and python (Related to Python Polars) labels Feb 3, 2023

dylanhmorris changed the title ~~Quirk in polars expression division when numpy scalar in numerator~~ Quirk in polars expression division when numpy float in numerator Feb 3, 2023

zundertj mentioned this issue Feb 4, 2023

fix(python): Support numpy ufunc when expression not first arg #6675

Merged

ritchie46 closed this as completed in #6675 Feb 5, 2023
