concat_list raises an error or returns an empty list if one of the filtered cols inside is empty #15208

Open
2 tasks done
avlonder opened this issue Mar 21, 2024 · 3 comments
Labels
bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), python (Related to Python Polars)

Comments

avlonder commented Mar 21, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

df = pl.LazyFrame({'a': [1, 2], 'b': [3, 4], 'c': [0, 0]})

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()  # df is a LazyFrame, so collect() is needed to materialize the result

works as expected:
shape: (1, 2)
┌─────┬─────────────┐
│ c   ┆ a           │
│ --- ┆ ---         │
│ i64 ┆ list[i64]   │
╞═════╪═════════════╡
│ 0   ┆ [1, 3, … 4] │
└─────┴─────────────┘

However, if either the first filter or the second filter evaluates to nothing, the behavior is unexpected:

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),  # matches no rows
        pl.col('b').filter(pl.col('b').ge(0))
    ).flatten()
]).collect()

raises polars.exceptions.ShapeError: series length 2 does not match expected length of 0

If you add an extra first() call, no error is raised; instead the query silently returns an incorrect empty list:

df.group_by('c').agg([
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(5)),  # matches no rows
        pl.col('b').filter(pl.col('b').ge(0)).first()
    ).flatten()
]).collect()

shape: (1, 2)
┌─────┬───────────┐
│ c   ┆ a         │
│ --- ┆ ---       │
│ i64 ┆ list[i64] │
╞═════╪═══════════╡
│ 0   ┆ []        │
└─────┴───────────┘

Log output

No response

Issue description

...

Expected behavior

...

Installed versions

--------Version info---------
Polars:               0.20.14
Index type:           UInt32
Platform:             Linux-6.1.0-1035-oem-x86_64-with-glibc2.35
Python:               3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           <not installed>
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           <not installed>
numpy:                1.26.4
openpyxl:             <not installed>

avlonder added the bug, needs triage, and python labels Mar 21, 2024
avlonder changed the title from "concat_list returns an empty list if one of the filtered cols inside is empty" to "concat_list raises an error or returns an empty list if one of the filtered cols inside is empty" Mar 21, 2024
@cmdlineluser
Contributor

The working version also appears to be broken.

The 1 from a ends up in b

df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)),
    pl.col('b').filter(pl.col('b').ge(0))
).collect()
# shape: (1, 3)
# ┌─────┬───────────┬───────────┐
# │ c   ┆ a         ┆ b         │
# │ --- ┆ ---       ┆ ---       │
# │ i64 ┆ list[i64] ┆ list[i64] │
# ╞═════╪═══════════╪═══════════╡
# │ 0   ┆ [1]       ┆ [3, 4]    │
# └─────┴───────────┴───────────┘
df.group_by('c').agg(
    pl.concat_list(
        pl.col('a').filter(pl.col('a').eq(1)),
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()
# shape: (1, 2)
# ┌─────┬──────────────────┐
# │ c   ┆ a                │
# │ --- ┆ ---              │
# │ i64 ┆ list[list[i64]]  │
# ╞═════╪══════════════════╡
# │ 0   ┆ [[1, 3], [1, 4]] │ # <- ???
# └─────┴──────────────────┘

.append() may be a possible workaround.

df.group_by('c').agg(
    pl.col('a').filter(pl.col('a').eq(1)).append(
        pl.col('b').filter(pl.col('b').ge(0))
    )
).collect()

# shape: (1, 2)
# ┌─────┬───────────┐
# │ c   ┆ a         │
# │ --- ┆ ---       │
# │ i64 ┆ list[i64] │
# ╞═════╪═══════════╡
# │ 0   ┆ [1, 3, 4] │
# └─────┴───────────┘
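
For anyone trying to reconcile the outputs reported so far, here is a toy pure-Python model. It is not Polars' implementation. It assumes that concat_list combines its inputs row by row and broadcasts a length-1 input to the length of the other one; that rule is a guess, chosen only because it reproduces the three results in this thread. The function names are made up for illustration.

```python
def concat_rowwise(a, b):
    """Toy model: row-wise concat with length-1 broadcasting (an assumption)."""
    if len(a) == 1 and len(b) != 1:
        a = a * len(b)  # repeat the single value once per row of b
    elif len(b) == 1 and len(a) != 1:
        b = b * len(a)  # repeat the single value once per row of a
    if len(a) != len(b):
        raise ValueError(
            f"length {len(b)} does not match expected length of {len(a)}"
        )
    return [[x, y] for x, y in zip(a, b)]


def flatten(rows):
    """Collapse a list of rows into one flat list."""
    return [value for row in rows for value in row]


def append(a, b):
    """Toy model of a vertical concat: all of a, then all of b."""
    return a + b


# a filtered to [1], b filtered to [3, 4]: the 1 is repeated for every row.
print(concat_rowwise([1], [3, 4]))           # [[1, 3], [1, 4]]
print(flatten(concat_rowwise([1], [3, 4])))  # [1, 3, 1, 4]

# a filtered to nothing, b reduced to one value with first():
# the single value is broadcast to zero rows, so the result is empty.
print(concat_rowwise([], [3]))               # []

# a filtered to nothing, b filtered to [3, 4]: lengths 0 and 2 cannot be paired.
try:
    concat_rowwise([], [3, 4])
except ValueError as exc:
    print("error:", exc)

# The vertical model has no length constraint.
print(append([1], [3, 4]))                   # [1, 3, 4]
print(append([], [3, 4]))                    # [3, 4]
```

Under this reading, append avoids the problem because a vertical concat never has to pair the two inputs element by element.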

@avlonder
Author

Thanks @cmdlineluser, append works for my use case.

@NickCrews

IDK if this is the same issue, but in Ibis we need to construct a Polars list from a Python list literal of length 0 to N. For length >= 1 we can use concat_list, but for length 0 we have to use pl.lit. It would be great if there was one API that could do both. Could we add a keyword-only type argument to concat_list()? I can move this to a separate issue if you want. Thanks!


3 participants