Confusing (& wrong) behavior when using with_columns incorrectly #6486

Closed
2 tasks done
mkleinbort-ic opened this issue Jan 27, 2023 · 3 comments · Fixed by #6497
Labels: bug (Something isn't working), python (Related to Python Polars)


mkleinbort-ic commented Jan 27, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

I accidentally wrote this code:

import polars as pl

df = pl.DataFrame({
    'x1': [1,2,4,8,16,32],
    'x2': [1,2,3,4,5,6]
})

df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())

>>>
shape: (6, 3)
┌─────┬─────┬───────────┐
│ x1  ┆ x2  ┆ pctChange │
│ --- ┆ --- ┆ ---       │
│ i64 ┆ i64 ┆ f64       │
╞═════╪═════╪═══════════╡
│ 1   ┆ 1   ┆ null      │
│ 2   ┆ 2   ┆ 1.0       │
│ 4   ┆ 3   ┆ 0.5       │
│ 8   ┆ 4   ┆ 0.333333  │
│ 16  ┆ 5   ┆ 0.25      │
│ 32  ┆ 6   ┆ 0.2       │
└─────┴─────┴───────────┘

This is the result I'd expect if I were taking the pct_change of the x2 column, but it quietly ignores x1.
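
For reference, the per-column values can be checked by hand, since the percentage change is (current - previous) / previous. The plain-Python sketch below is only an illustration of that arithmetic; the helper name `pct_change` is hypothetical and this is not how Polars implements it.

```python
def pct_change(values):
    """Fractional change from the previous element; None for the first element."""
    if not values:
        return []
    out = [None]  # the first element has no predecessor
    for prev, cur in zip(values, values[1:]):
        out.append((cur - prev) / prev)
    return out


x1 = [1, 2, 4, 8, 16, 32]
x2 = [1, 2, 3, 4, 5, 6]

x1_change = pct_change(x1)  # x1 doubles at every step
x2_change = pct_change(x2)  # should match the pctChange column shown above

print(x1_change)
print(x2_change)
```

The x2 values line up with the `pctChange` column in the output above, while the x1 values (a constant 1.0 after the first row) do not appear anywhere in the result.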

Two behaviours seem appropriate to me:

  1. Raise an error when a multi-column expression is assigned to a single column name
  2. Create a struct type column.
# Should behave like df.with_columns(pctChange = pl.struct(pl.col(['x1', 'x2']).pct_change()))
df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())
>>>
shape: (6, 3)
┌─────┬─────┬────────────────┐
│ x1  ┆ x2  ┆ pctChange      │
│ --- ┆ --- ┆ ---            │
│ i64 ┆ i64 ┆ struct[2]      │
╞═════╪═════╪════════════════╡
│ 1   ┆ 1   ┆ {null,null}    │
│ 2   ┆ 2   ┆ {1.0,1.0}      │
│ 4   ┆ 3   ┆ {1.0,0.5}      │
│ 8   ┆ 4   ┆ {1.0,0.333333} │
│ 16  ┆ 5   ┆ {1.0,0.25}     │
│ 32  ┆ 6   ┆ {1.0,0.2}      │
└─────┴─────┴────────────────┘

In either case, the current behavior definitely violates the "don't surprise programmers" mantra.

Reproducible example

import polars as pl

df = pl.DataFrame({
    'x1': [1,2,4,8,16,32],
    'x2': [1,2,3,4,5,6]
})

df.with_columns(pctChange = pl.col(['x1', 'x2']).pct_change())

Expected behavior

Should return the same as:

import polars as pl

df = pl.DataFrame({
    'x1': [1,2,4,8,16,32],
    'x2': [1,2,3,4,5,6]
})

df.with_columns(pctChange = pl.struct(pl.col(['x1', 'x2']).pct_change()))

Or raise an error.

Installed versions

---Version info---
Polars: 0.15.15
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
---Optional dependencies---
pyarrow: 8.0.0
pandas: 1.5.2
numpy: 1.22.4
fsspec: 2022.8.2
connectorx: 0.3.1
xlsx2csv: <not installed>
matplotlib: 3.6.2
mkleinbort-ic added the bug and python labels Jan 27, 2023
ritchie46 (Member) commented

@alexander-beedie could you take this one? This is related to the keyword argument assignment.

@mkleinbort-ic You can use the explicit alias() until this is fixed.

alexander-beedie self-assigned this Jan 27, 2023
mkleinbort-ic (Author) commented

I'm happy on my end, it's just a sharp corner I thought I'd raise

alexander-beedie (Collaborator) commented Jan 27, 2023

> I'm happy on my end, it's just a sharp corner I thought I'd raise

@mkleinbort-ic: and many thanks for that - I've found a way to automatically structify this type of call (which does look like the right way to handle things), so the hoped-for behaviour should work by default in an upcoming release.

Update:

  • Note that the auto-structify behaviour is considered experimental, and requires opt-in via...

    pl.Config.set_auto_structify(True)
