Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rust, python): allow nonstrict cast of categorical/enum to enum #14910

Merged
merged 5 commits into from
Mar 10, 2024

Conversation

mcrumiller
Copy link
Contributor

@mcrumiller mcrumiller commented Mar 7, 2024

Resolves #14900.

@c-peters the downside of this PR is that it no longer prints your nice error messages when a value falls outside of the target Enum's categories, and instead assigns it a null value. Strict (normal) casts will still properly fail the casts because of the new null values introduced, but non-strict casts now convert values outside to null. On the plus side, this reduces the code a bit and removes two levels of option/result indirection. Here is the example from the issue after this PR:

>>> import polars as pl
>>> pl.Series(["a", "b"]).cast(pl.Enum(["a"]), strict=False)
shape: (2,)
Series: '' [enum]
[
        "a"
        null
]
>>> pl.Series(["a", "b"], dtype=pl.Categorical).cast(pl.Enum(["a"]), strict=False)
shape: (2,)
Series: '' [enum]
[
        "a"
        null
]
>>> pl.Series(["a", "b"], dtype=pl.Enum(["a", "b"])).cast(pl.Enum(["a"]), strict=False)
shape: (2,)
Series: '' [enum]
[
        "a"
        null
]

strict (regular) casting still fails:

>>> import polars as pl
>>> pl.Series(["a", "b"], dtype=pl.Categorical).cast(pl.Enum(["a"]))
polars.exceptions.ComputeError: conversion from `cat` to `enum` failed in column '' for 1 out of 2 values: ["b"]

@github-actions github-actions bot added fix Bug fix rust Related to Rust Polars labels Mar 7, 2024
@mcrumiller mcrumiller changed the title fix(rust): allow nonstrict of categorical/enum to enum fix(rust, python): allow nonstrict cast of categorical/enum to enum Mar 7, 2024
@github-actions github-actions bot added the python Related to Python Polars label Mar 7, 2024
@deanm0000
Copy link
Collaborator

I'm not sure how strictness works. It seems like typically during a cast you just return None when it fails to cast for any particular value but then I'm guessing (b/c I don't see it here) that elsewhere it compares the input and output for nulls and if there are nulls in the output that weren't null in the input then that's when the strictness flag comes into play. In this case the error was baked into the casting so that's why it didn't respect strict?

Copy link

codecov bot commented Mar 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.01%. Comparing base (d9dead6) to head (239e184).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #14910      +/-   ##
==========================================
- Coverage   81.01%   81.01%   -0.01%     
==========================================
  Files        1332     1332              
  Lines      172870   172843      -27     
  Branches     2458     2458              
==========================================
- Hits       140054   140031      -23     
+ Misses      32348    32344       -4     
  Partials      468      468              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mcrumiller
Copy link
Contributor Author

In this case the error was baked into the casting so that's why it didn't respect strict?

Yes. strict=False means if the value cannot be casted, a null value is returned:

>>> s = pl.Series(["a"])
>>> s.cast(pl.Date, strict=False)
shape: (1,)
Series: '' [date]
[
        null
]
>>> s.cast(pl.Int32, strict=False)
shape: (1,)
Series: '' [i32]
[
        null
]

There are some situations where strict=False doesn't return null:

>>> pl.Series([[1]]).cast(pl.Utf8, strict=False)
polars.exceptions.ComputeError: cannot cast List type (inner: 'Int64', to: 'String')

However, I feel it's reasonable to expecte that a non-strict cast from one enum to another should be treated the same way as a cast of string to enum--values outside of the categories are converted to null.

@deanm0000
Copy link
Collaborator

Sorry I know what strict is supposed to do. I meant I don't know about the implementation of it in rust.

@mcrumiller
Copy link
Contributor Author

Ahh, sorry. Yes, here is the Series.strict_cast, which calls first calls regular cast (which is fallible and returns a Result, but usually can succeed). Then it checks if strict is True and, if there are more nulls after the cast, it means something failed, and it errors.

@mcrumiller mcrumiller marked this pull request as draft March 8, 2024 14:44
@mcrumiller mcrumiller marked this pull request as ready for review March 8, 2024 14:54
@mcrumiller
Copy link
Contributor Author

Failures are related chrono update and should be fixed by #14928.

Copy link
Member

@ritchie46 ritchie46 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, good one. Thank you @mcrumiller .

@ritchie46 ritchie46 merged commit 419b891 into pola-rs:main Mar 10, 2024
23 checks passed
@mcrumiller mcrumiller deleted the enum-nonstrict-cast branch March 12, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix Bug fix python Related to Python Polars rust Related to Rust Polars
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cast(pl.Enum, strict=False) fails when casting from another enum/categorical
3 participants