Groupby/agg on pl.LazyFrame fails when a column is added using .with_column() #6054

Closed
2 tasks done
jkc1 opened this issue Jan 5, 2023 · 1 comment · Fixed by #6058
Labels
bug Something isn't working python Related to Python Polars

jkc1 commented Jan 5, 2023

Polars version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of Polars.

Issue description

I've got some code that essentially does the following:

  • Reads some data into a pl.LazyFrame
  • Appends a pl.Series to the pl.LazyFrame using pl.LazyFrame.with_column()
  • Does a .groupby().agg().collect()

This code worked fine until I updated my dependencies from 0.14.30 to 0.15. The documentation appears to indicate that this behavior should still work, so I believe this is a bug.

A minimal working example and the traceback are below.

Reproducible example

import polars as pl

df = pl.DataFrame({"col_1": [0] * 5 + [1] * 5})
ser = pl.Series("col_2", list(range(10)))

# these three lines below work as expected
#df.lazy().groupby("col_1").agg(pl.col("*").count()).collect()
#df.lazy().with_column(ser).collect()
#df.with_column(ser).groupby("col_1").agg(pl.col("*").count())

# this works in <0.15 but raises an exception in >=0.15
df.lazy().with_column(ser).groupby("col_1").agg(pl.col("*").count()).collect()

Expected behavior

Actual output:

Traceback (most recent call last):
  File ".../bug_example.py", line 15, in <module>
    df.lazy().with_column(ser).groupby("col_1").agg(pl.col("*").count()).collect()
  File ".../.venv/lib/python3.9/site-packages/polars/utils.py", line 337, in wrapper
    return fn(*args, **kwargs)
  File ".../.venv/lib/python3.9/site-packages/polars/internals/lazyframe/frame.py", line 1154, in collect
    return pli.wrap_df(ldf.collect())
exceptions.ShapeError: Could not add column. The Series length 10 differs from the DataFrame height: 1

Expected output: nothing (the script should run to completion without raising an exception)
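
For reference, here is a minimal plain-Python sketch of the result the failing aggregation is meant to compute, assuming `count()` simply counts the rows in each `col_1` group. It does not use Polars and says nothing about the cause of the error; the `counts` name is only illustrative.

```python
from collections import Counter

# Same data as the reproducible example above.
col_1 = [0] * 5 + [1] * 5
col_2 = list(range(10))

# Pair each col_2 value with its col_1 key, then count rows per key.
counts = Counter(key for key, _ in zip(col_1, col_2))

print(dict(counts))  # {0: 5, 1: 5}
```

Each group holds five rows, so the lazy query should return two rows with a count of 5 for `col_2`.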

Installed versions

---Version info---
Polars: 0.15.11
Index type: UInt32
Platform: macOS-12.6.2-x86_64-i386-64bit
Python: 3.9.16 (main, Dec  8 2022, 10:02:15) 
[Clang 14.0.0 (clang-1400.0.29.202)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: <not installed>
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: <not installed>
None

jkc1 commented Jan 5, 2023

Thanks for the extremely quick fix!
