Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

diff() does not cast unsigned int to int #5923

Closed
2 tasks done
erinov1 opened this issue Dec 28, 2022 · 2 comments · Fixed by #6033
Closed
2 tasks done

diff() does not cast unsigned int to int #5923

erinov1 opened this issue Dec 28, 2022 · 2 comments · Fixed by #6033
Assignees
Labels
bug Something isn't working python Related to Python Polars

Comments

@erinov1
Copy link

erinov1 commented Dec 28, 2022

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

Using diff() on a column of unsigned ints causes wrap-around when the difference between two rows is negative. I guess there are cases where this is what you would want to compute, but it is not intuitive. For example:

df = pl.DataFrame(
    {
        "a": [20, 10, 30],
    }
)

df.select(pl.col("a").cast(pl.UInt64).diff())

shape: (3, 1)
┌──────────────────────┐
│ a                    │
│ ---                  │
│ u64                  │
╞══════════════════════╡
│ null                 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 18446744073709551606 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 20                   │
└──────────────────────┘

This is also confusing because pct_change correctly compute a negative percent change on this example:

df.select(pl.col("a").cast(pl.UInt64).pct_change())

shape: (3, 1)
┌──────┐
│ a    │
│ ---  │
│ f64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ -0.5 │
├╌╌╌╌╌╌┤
│ 2.0  │
└──────┘

This seems dangerous, since it is easy to be unknowingly handed data that stores an integer column as unsigned.

Reproducible example

df = pl.DataFrame(
    {
        "a": [20, 10, 30],
    }
)

df.select(pl.col("a").cast(pl.UInt64).diff())

Expected behavior

I think that the result should be the intuitive one, namely

shape: (3, 1)
┌──────┐
│ a    │
│ ---  │
│ i64  │
╞══════╡
│ null │
├╌╌╌╌╌╌┤
│ -10  │
├╌╌╌╌╌╌┤
│ 20   │
└──────┘

Installed versions

---Version info---
Polars: 0.15.8
Index type: UInt32
Platform: macOS-10.15.7-x86_64-i386-64bit
Python: 3.8.12 (default, Oct 18 2021, 20:31:01) 
[Clang 10.0.1 (clang-1001.0.46.4)]
---Optional dependencies---
pyarrow: 10.0.0
pandas: 1.4.4
numpy: 1.21.5
fsspec: 2022.01.0
connectorx: 0.3.0
xlsx2csv: <not installed>
matplotlib: 3.6.2

@erinov1 erinov1 added bug Something isn't working python Related to Python Polars labels Dec 28, 2022
@zundertj
Copy link
Collaborator

Closing in favor of the more generic discussion about using unsigned integers in the Polars api in #4990. Please comment and/or upvote there.

@ritchie46
Copy link
Member

I think for diff it make sense to change output type to signed. I will take this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working python Related to Python Polars
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants