assert_series_equal with categorical_as_str does not work on nested Categoricals #16196

Closed
2 tasks done
stinodego opened this issue May 13, 2024 · 0 comments · Fixed by #16700
Labels
A-dtype-categorical Area: categorical data type A-dtype-struct Area: struct data type accepted Ready for implementation bug Something isn't working P-low Priority: low python Related to Python Polars

stinodego commented May 13, 2024

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl
from polars.testing import assert_series_equal

# Global
with pl.StringCache():
    s1 = pl.Series(["c0"], dtype=pl.Categorical)
    s2 = pl.Series(["c1"], dtype=pl.Categorical)

    s_global = pl.DataFrame([s1, s2]).to_struct("col0")

# Local
s1 = pl.Series(["c0"], dtype=pl.Categorical)
s2 = pl.Series(["c1"], dtype=pl.Categorical)
s_local = pl.DataFrame([s1, s2]).to_struct("col0")

assert_series_equal(s_global, s_local, categorical_as_str=True)

Log output

Traceback (most recent call last):
  File "/home/stijn/code/polars/py-polars/repro.py", line 16, in <module>
    assert_series_equal(s_global, s_local, categorical_as_str=True)
  File "/home/stijn/code/polars/py-polars/polars/testing/asserts/series.py", line 105, in assert_series_equal
    _assert_series_values_equal(
  File "/home/stijn/code/polars/py-polars/polars/testing/asserts/series.py", line 174, in _assert_series_values_equal
    raise_assertion_error(
  File "/home/stijn/code/polars/py-polars/polars/testing/asserts/utils.py", line 12, in raise_assertion_error
    raise AssertionError(msg) from cause
AssertionError: Series are different (exact value mismatch)
[left]:  [{'column_0': 'c0', 'column_1': 'c1'}]
[right]: [{'column_0': 'c0', 'column_1': 'c1'}]

Issue description

With categorical_as_str=True, categoricals should be treated as their string representation. This is currently not the case for nested types.

The offending code is here:

# Handle categoricals
if categorical_as_str:
    if left.dtype == Categorical:
        left = left.cast(String)
    if right.dtype == Categorical:
        right = right.cast(String)

This only checks the top-level dtype, so it does not work for nested types: a Struct (or List) that contains Categoricals is never cast.

Expected behavior

The assertion should pass.

Installed versions

main

@stinodego stinodego added bug Something isn't working python Related to Python Polars A-dtype-categorical Area: categorical data type P-low Priority: low A-dtype-struct Area: struct data type labels May 13, 2024
@github-project-automation github-project-automation bot moved this to Ready in Backlog May 13, 2024
@github-project-automation github-project-automation bot moved this from Ready to Done in Backlog Jun 4, 2024
@c-peters c-peters added the accepted Ready for implementation label Jun 9, 2024