Handle Categorical Boolean values #3960

bchen1116 · 2023-01-26T17:10:01Z

Previously, if we passed in a categorical column (e.g. 'yes', 'no') that gets inferred as boolean through WW's new boolean typing, the `infer_feature_types` call in `LabelEncoder.inverse_transform` would transform the mapped column back into boolean.

This PR puts up a fix for that.
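To make the failure mode concrete, here is a minimal, standard-library-only sketch of the idea. It is not EvalML's or Woodwork's code: `infer_type`, `ToyLabelEncoder`, the `declared_type` argument, and the `respect_original_typing` flag are all hypothetical stand-ins, and the only thing borrowed from this PR is the idea of recording an `original_typing` at fit time.

```python
# Toy stand-ins only: none of these names are EvalML or Woodwork APIs.
BOOLEAN_LIKE = {"yes": True, "no": False}


def infer_type(values):
    """Hypothetical inference step: yes/no strings are treated as booleans."""
    if all(v in BOOLEAN_LIKE for v in values):
        return "Boolean", [BOOLEAN_LIKE[v] for v in values]
    return "Categorical", list(values)


class ToyLabelEncoder:
    def fit(self, y, declared_type=None):
        inferred_type, _ = infer_type(y)
        # Remember the typing seen at fit time (assumed analogue of `original_typing`).
        self.original_typing = declared_type or inferred_type
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        return [self.classes_.index(v) for v in y]

    def inverse_transform(self, encoded, respect_original_typing=True):
        labels = [self.classes_[i] for i in encoded]
        if respect_original_typing and self.original_typing == "Categorical":
            # Fixed path: hand back the labels in their original form.
            return labels
        # Old path: re-inferring the decoded labels turns yes/no into booleans.
        _, re_inferred = infer_type(labels)
        return re_inferred


y = ["yes", "yes", "no", "yes"]
encoder = ToyLabelEncoder().fit(y, declared_type="Categorical")
encoded = encoder.transform(y)
old_output = encoder.inverse_transform(encoded, respect_original_typing=False)
new_output = encoder.inverse_transform(encoded)
```

In this sketch `old_output` comes back as booleans while `new_output` matches the original `y`, which is the behavior difference the description above refers to.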

codecov · 2023-01-26T17:20:23Z

Codecov Report

Merging #3960 (b8e2497) into main (10a4980) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #3960     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        347     347             
  Lines      36776   36790     +14     
=======================================
+ Hits       36656   36670     +14     
  Misses       120     120

Impacted Files	Coverage Δ
.../components/transformers/encoders/label_encoder.py	`100.0% <100.0%> (ø)`
evalml/tests/component_tests/test_label_encoder.py	`100.0% <100.0%> (ø)`

Cmancuso · 2023-01-26T18:12:41Z

evalml/pipelines/components/transformers/encoders/label_encoder.py

@@ -46,6 +47,7 @@ def fit(self, X, y):
        if y is None:
            raise ValueError("y cannot be None!")
        y_ww = infer_feature_types(y)
+        self.original_typing = str(y_ww.ww.logical_type)


I don't know if we want to just put a note into some places to remove this once we fully deprecate typelib.

@Cmancuso I think we need to keep this functionality as long as Woodwork transforms 1/0, yes/no, etc. to True/False, unless that change was made for typelib.

I thought we weren't seeing this issue in schemaUpdate?

On the EvalML side, we'd still want to output the original form of the target instead of True/False if the target is boolean or boolean-inferable.

Yeah, this will only be an issue for OS users when they use it in this specific scenario. If users pass in a yes/no dataset, they should still receive yes/no predictions, so I think this is needed.

jeremyliweishih

LGTM

jeremyliweishih · 2023-01-26T18:15:50Z

evalml/tests/component_tests/test_label_encoder.py

+    X = pd.DataFrame({})
+    # binary
+    y = pd.Series(["yes", "yes", "no", "yes"])
+    y = ww.init_series(y, logical_type="Categorical")


does it matter if we init this series as Boolean or Categorical?

The behavior is the same, but if the type is passed as categorical, we would expect it to remain categorical rather than bool

I agree that we should probably parametrize this over both `Boolean` and `Categorical`.

chukarsten · 2023-01-26T18:28:00Z

evalml/tests/component_tests/test_label_encoder.py

@@ -221,3 +221,17 @@ def test_label_encoder_with_positive_label_with_custom_indices():
    y_with_custom_indices = pd.Series(["b", "a", "a"], index=[5, 6, 7])
    _, y_transformed = encoder.transform(None, y_with_custom_indices)
    assert_index_equal(y_with_custom_indices.index, y_transformed.index)
+
+
+def test_label_encoder_categorical_boolean_values():


I would repeat the same advice for this test as for Becca's: what motivated adding this test case? I am not sure the test name as-is reflects why this test exists. I think it's worthwhile to add a little context via a docstring to the test function, perhaps linking it to a Woodwork version upgrade.

chukarsten · 2023-01-26T18:28:17Z

evalml/tests/component_tests/test_label_encoder.py

+    X = pd.DataFrame({})
+    # binary
+    y = pd.Series(["yes", "yes", "no", "yes"])
+    y = ww.init_series(y, logical_type="Categorical")


I agree that we should probably parametrize this over both `Boolean` and `Categorical`.
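As a rough illustration of the two review suggestions (a docstring that explains why the test exists, and coverage of both typings), here is a standard-library sketch. `StubEncoder` is a hypothetical stand-in and not EvalML's `LabelEncoder`; the expected outputs encode the assumption stated in the thread that a target declared categorical stays categorical while a boolean one stays boolean. In the real suite this would more likely be a pytest parametrization against the actual component.

```python
import unittest


class StubEncoder:
    """Hypothetical stand-in for the encoder under test (not EvalML's class)."""

    def fit(self, y, logical_type):
        self.original_typing = logical_type
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        return [self.classes_.index(v) for v in y]

    def inverse_transform(self, encoded):
        labels = [self.classes_[i] for i in encoded]
        if self.original_typing == "Boolean":
            # Assumed behavior: a Boolean target is returned as booleans.
            return [label == "yes" for label in labels]
        return labels


class TestCategoricalBooleanValues(unittest.TestCase):
    def test_round_trip_keeps_declared_typing(self):
        """Boolean-inferable labels must round-trip in the typing they were given.

        Motivation: type inference that treats yes/no as Boolean could turn
        a Categorical target into True/False on the way back out.
        """
        y = ["yes", "yes", "no", "yes"]
        expected = {"Boolean": [True, True, False, True], "Categorical": y}
        # subTest plays the role of parametrization over both typings.
        for logical_type in ("Boolean", "Categorical"):
            with self.subTest(logical_type=logical_type):
                encoder = StubEncoder().fit(y, logical_type)
                decoded = encoder.inverse_transform(encoder.transform(y))
                self.assertEqual(decoded, expected[logical_type])
```

Running it with `python -m unittest` reports each typing as its own sub-case, so a regression in only one of them is easy to spot.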

chukarsten

Thanks @bchen1116 !

update label encoder

f83e97f

bchen1116 self-assigned this Jan 26, 2023

update release notes

10e7113

Merge branch 'main' into label_encoder_bool

26c2875

Cmancuso reviewed Jan 26, 2023

jeremyliweishih approved these changes Jan 26, 2023

bchen1116 added 2 commits January 26, 2023 13:33

add comment

d7acfe9

Merge branch 'label_encoder_bool' of github.com:alteryx/evalml into l…

4903775

…abel_encoder_bool

eccabay approved these changes Jan 26, 2023

update comment

0677c72

bchen1116 enabled auto-merge (squash) January 26, 2023 18:54

chukarsten suggested changes Jan 26, 2023

bchen1116 added 2 commits January 26, 2023 14:08

lint

5a0783f

update release

b8e2497

chukarsten approved these changes Jan 26, 2023

bchen1116 merged commit 2dcdcba into main Jan 26, 2023

bchen1116 deleted the label_encoder_bool branch January 26, 2023 20:40

chukarsten mentioned this pull request Jan 26, 2023

Release v0.66.1 #3961

Merged
