
when/then/otherwise silently converts values to nulls for Enum series #14953

Closed
2 tasks done
mcrumiller opened this issue Mar 9, 2024 · 4 comments · Fixed by #15052
Labels
A-dtype-categorical (Area: categorical data type), accepted (Ready for implementation), bug (Something isn't working), python (Related to Python Polars)

Comments

@mcrumiller
Contributor

mcrumiller commented Mar 9, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Issue description

When an Enum column is used in a when/then/otherwise chain and a branch produces a value that is not one of the Enum's categories, that value is silently converted to null instead of raising an exception.

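import polars as pl
from polars import col, when

# Column "a" is an Enum whose only allowed categories are "a" and "b"
df = pl.DataFrame({
    "a": pl.Series(["a", "b"], dtype=pl.Enum(["a", "b"]))
})
# "c" is not a category of the Enum, so this should raise,
# but instead the second row comes back as null
df.with_columns(
    when(col("a") == "a").then(col("a"))
    .otherwise(pl.lit("c"))
)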
shape: (2, 1)
┌──────┐
│ a    │
│ ---  │
│ enum │
╞══════╡
│ a    │
│ null │
└──────┘

Installed versions

--------Version info---------
Polars:               0.20.14
Index type:           UInt32
Platform:             Windows-10-10.0.19045-SP0
Python:               3.11.7 (tags/v3.11.7:fa7a6f2, Dec  4 2023, 19:24:49) [MSC v.1937 64 bit (AMD64)]

----Optional dependencies----
adbc_driver_manager:  <not installed>
cloudpickle:          <not installed>
connectorx:           0.3.2
deltalake:            <not installed>
fastexcel:            <not installed>
fsspec:               <not installed>
gevent:               <not installed>
hvplot:               <not installed>
matplotlib:           3.8.2
numpy:                1.26.2
openpyxl:             3.1.2
pandas:               2.1.4
pyarrow:              14.0.1
pydantic:             <not installed>
pyiceberg:            <not installed>
pyxlsb:               <not installed>
sqlalchemy:           2.0.23
xlsx2csv:             0.8.2
xlsxwriter:           3.1.9
mcrumiller added the bug (Something isn't working), needs triage (Awaiting prioritization by a maintainer), and python (Related to Python Polars) labels on Mar 9, 2024
@mcrumiller
Contributor Author

Here is an open question: should the supertype of an enum be a categorical?

@c-peters
Collaborator

No, I would not be in favor of casting Enums to categoricals; that would defeat the purpose of the validation.
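The rule stated here (an Enum must not widen to a Categorical) can be modeled as a small type-resolution function. This is a hypothetical sketch, not Polars' supertype logic: `EnumType`, `NoSupertypeError`, and `enum_supertype` are illustrative names, and treating two Enums with different categories as incompatible is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class EnumType:
    """Hypothetical stand-in for an Enum dtype with a fixed, ordered category list."""
    categories: Tuple[str, ...]


class NoSupertypeError(TypeError):
    """Hypothetical error: the two operands have no common Enum type."""


def enum_supertype(left: EnumType, right: Union[EnumType, str]) -> EnumType:
    """Resolve the result type of combining an Enum with another Enum or a string literal.

    Deliberately never falls back to a Categorical, which would drop validation.
    """
    if isinstance(right, EnumType):
        # Assumption of this sketch: only identical category lists are compatible.
        if right.categories == left.categories:
            return left
        raise NoSupertypeError("Enums with different categories have no supertype")
    # A string literal fits the Enum only if it is one of the categories.
    if right in left.categories:
        return left
    raise NoSupertypeError(
        f"{right!r} is not one of the categories {list(left.categories)}"
    )


ab = EnumType(("a", "b"))
print(enum_supertype(ab, "a"))  # EnumType(categories=('a', 'b'))
```

Raising instead of returning a Categorical keeps the category check in force, so the literal `"c"` from the report would be rejected at type resolution rather than turned into null.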

The goal of the Enum datatype is to provide validation and to allow fast operations on other Enum columns with the same encoding. This should raise an error, since "c" is not one of the categories. I haven't had time to look, but I think our recent change in non-strict casting might have caused this.
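A minimal sketch of the two properties described here, assuming an Enum column is stored as integer codes into a fixed category list: construction validates every value, and two columns that share the same categories can be compared code by code without looking at the strings. `EnumColumn` and its methods are hypothetical and only a simplified stand-in for Polars' internals.

```python
from typing import List, Optional, Sequence


class EnumColumn:
    """Hypothetical model of an Enum column: fixed categories, values stored as integer codes."""

    def __init__(self, values: Sequence[Optional[str]], categories: Sequence[str]) -> None:
        self.categories = tuple(categories)
        lookup = {cat: code for code, cat in enumerate(self.categories)}
        codes: List[Optional[int]] = []
        for value in values:
            if value is None:
                codes.append(None)  # a null input stays null
            elif value in lookup:
                codes.append(lookup[value])
            else:
                # Validation: an unknown value is rejected, not turned into null.
                raise ValueError(
                    f"{value!r} is not one of the categories {list(self.categories)}"
                )
        self.codes = codes

    def equals(self, other: "EnumColumn") -> List[Optional[bool]]:
        """Elementwise equality; with identical encodings only the integer codes are compared."""
        if other.categories != self.categories:
            raise TypeError("fast comparison requires both columns to share the same categories")
        return [
            None if a is None or b is None else a == b
            for a, b in zip(self.codes, other.codes)
        ]


left = EnumColumn(["a", "b", None], ["a", "b"])
right = EnumColumn(["a", "a", "b"], ["a", "b"])
print(left.codes)          # [0, 1, None]
print(left.equals(right))  # [True, False, None]
```

Because both columns use the same code for each category, equality reduces to comparing small integers, which is the kind of fast operation a shared encoding makes possible.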

I will have a look later this week.

c-peters self-assigned this on Mar 11, 2024
c-peters added the A-dtype-categorical (Area: categorical data type) and accepted (Ready for implementation) labels and removed the needs triage (Awaiting prioritization by a maintainer) label on Mar 11, 2024
@mcrumiller
Contributor Author

> No, I would not be in favor of casting Enums to categoricals; that would defeat the purpose of the validation.

I agree, was just raising the question.

> I haven't had time to look, but I think our recent change in non-strict casting might have caused this.

Are you referring to #14910? That went into 0.20.15; this issue already existed in 0.20.14, before that PR.

@c-peters
Collaborator

I'm referring to #14728. I'm assuming there is a non-strict cast somewhere that leads to the null.
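The suspected mechanism can be illustrated with a hypothetical plain-Python model: a strict cast rejects values outside the categories, while a non-strict cast maps them to null (`None` here), which reproduces the output in the report. The `cast_to_enum` name and its `strict` flag are illustrative only and are not Polars' internal cast code; where such a cast happens inside when/then/otherwise is exactly the open question in this thread.

```python
from typing import List, Optional, Sequence


def cast_to_enum(
    values: Sequence[Optional[str]],
    categories: Sequence[str],
    strict: bool = True,
) -> List[Optional[str]]:
    """Hypothetical model of casting strings to an Enum with fixed categories."""
    allowed = set(categories)
    result: List[Optional[str]] = []
    for value in values:
        if value is None or value in allowed:
            result.append(value)
        elif strict:
            # Strict: an out-of-category value is an error.
            raise ValueError(f"{value!r} is not one of the categories {list(categories)}")
        else:
            # Non-strict: the failed conversion silently becomes null.
            result.append(None)
    return result


# Assumed branch output of the chain from the report, before any cast:
# row 1 takes col("a") = "a", row 2 takes the literal "c".
branch_output = ["a", "c"]
print(cast_to_enum(branch_output, ["a", "b"], strict=False))  # ['a', None]

try:
    cast_to_enum(branch_output, ["a", "b"], strict=True)
except ValueError as exc:
    print(f"strict cast failed: {exc}")
```

The non-strict result matches the `a` / `null` table in the report, while the strict path produces the error the reporter expected.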
