min/max operations on list with empty and/or None elements #13978
Labels: A-dtype-list/array (Area: list/array data type), bug (Something isn't working), P-medium (Priority: medium), python (Related to Python Polars)
FBruzzesi added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels on Jan 25, 2024.
FBruzzesi changed the title from "min/max operations on list empty and none elements" to "min/max operations on list with empty and/or None elements" on Jan 25, 2024.
Can reproduce. Interesting and weird bug 😮

import polars as pl

pl.DataFrame(
    {"lst": [[1, 2, 3], []]},
).with_columns(max=pl.col("lst").list.max())
# shape: (2, 2)
# ┌───────────┬──────────────────────┐
# │ lst       ┆ max                  │
# │ ---       ┆ ---                  │
# │ list[i64] ┆ i64                  │
# ╞═══════════╪══════════════════════╡
# │ [1, 2, 3] ┆ 3                    │
# │ []        ┆ -9223372036854775808 │ <<<<<<<<<<<< whoooooopsi 😱
# └───────────┴──────────────────────┘

pl.DataFrame(
    {"lst": [[], [None, 1], [1, 2, 3]]},
).with_columns(max=pl.col("lst").list.max())
# shape: (3, 2)
# ┌───────────┬──────┐
# │ lst       ┆ max  │
# │ ---       ┆ ---  │
# │ list[i64] ┆ i64  │
# ╞═══════════╪══════╡
# │ []        ┆ null │ <<<<<<<<<<<<<<<<<<<<<< same value but correct? 🤔
# │ [null, 1] ┆ 1    │
# │ [1, 2, 3] ┆ 3    │
# └───────────┴──────┘
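The sentinel in the first output is not arbitrary: -9223372036854775808 is -2^63, the smallest signed 64-bit integer (Rust's `i64::MIN`), which is the identity element for `max`. A minimal pure-Python sketch, assuming (not confirmed from the Polars source) that the kernel folds each list starting from that identity, shows how an empty list would leak the seed instead of producing null. `seeded_list_max` is a hypothetical name, not a Polars function.

```python
from functools import reduce

# Assumption: the sentinel is Rust's i64::MIN, the identity element for max.
I64_MIN = -(2**63)


def seeded_list_max(values):
    """Hypothetical kernel: fold `values` with max, starting from I64_MIN.

    With no elements to fold, the seed itself is returned unchanged.
    """
    return reduce(max, values, I64_MIN)


print(I64_MIN)                     # -9223372036854775808
print(seeded_list_max([1, 2, 3]))  # 3
print(seeded_list_max([]))         # -9223372036854775808
```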
I can reproduce this in Python, but not in Rust:

use std::fs;

// Assumes the polars crate with its `lazy` and `parquet` features enabled.
use polars::prelude::*;

fn main() {
    let file = fs::File::open("list.parquet").unwrap();
    let df = ParquetReader::new(file).finish().unwrap();
    println!("{df}");
    // ==>
    // shape: (2, 1)
    // ┌───────────┐
    // │ c         │
    // │ ---       │
    // │ list[i64] │
    // ╞═══════════╡
    // │ [1, 2, 3] │
    // │ []        │
    // └───────────┘
    // <==
    let df = df.lazy().select([col("c").list().min()]).collect().unwrap();
    println!("{df}");
    // ==>
    // shape: (2, 1)
    // ┌──────┐
    // │ c    │
    // │ ---  │
    // │ i64  │
    // ╞══════╡
    // │ 1    │
    // │ null │
    // └──────┘
    // <==
}
There seem to be different code paths depending on whether there are nulls: polars/crates/polars-ops/src/chunked_array/list/min_max.rs, lines 206 to 212 at commit 9381381.
>>> pl.Series([[]], dtype=pl.List(pl.Int64)).list.max()
shape: (1,)
Series: '' [i64]
[
-9223372036854775808
]
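A hypothetical model of the two code paths, assuming the choice between them is made once per column based on whether any inner value is null (the names `no_null_path`, `null_aware_path`, and `list_max` are illustrative, not Polars functions). Under that assumption it reproduces the observations in this thread: `[[]]` and `[[1, 2, 3], []]` yield the seed for the empty list, while the column that contains `[null, 1]` yields null for `[]`.

```python
from functools import reduce
from typing import List, Optional

I64_MIN = -(2**63)  # assumed identity seed, Rust's i64::MIN


def no_null_path(lst: List[int]) -> int:
    # Hypothetical fast path: fold from the seed, so [] yields the seed.
    return reduce(max, lst, I64_MIN)


def null_aware_path(lst: List[Optional[int]]) -> Optional[int]:
    # Hypothetical validity-aware path: skip nulls, None if nothing remains.
    present = [v for v in lst if v is not None]
    return max(present) if present else None


def list_max(column: List[List[Optional[int]]]) -> List[Optional[int]]:
    # The path is picked once for the whole column, so whether [] maps to
    # the seed or to None depends on nulls in *other* rows.
    has_nulls = any(v is None for lst in column for v in lst)
    path = null_aware_path if has_nulls else no_null_path
    return [path(lst) for lst in column]


print(list_max([[1, 2, 3], []]))             # [3, -9223372036854775808]
print(list_max([[], [None, 1], [1, 2, 3]]))  # [None, 1, 3]
```

If this model is right, the fix is to make the no-null path return null for empty lists as well, so both paths agree.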
reswqa added the P-medium (Priority: medium) and A-dtype-list/array (Area: list/array data type) labels and removed the needs triage (Awaiting prioritization by a maintainer) label on Jan 26, 2024.
Thanks for reporting this, will take a look.
Issue description
Breaking down to two cases:
Expected behavior
I would expect the first case to behave like the second, namely to output null for empty lists.