[Feature] `WithoutLiersCV` model selection #595

FBruzzesi · 2023-11-07T12:10:33Z

Description

Introduces `WithoutLiersCV`, as discussed in #307. To support different cross-validation strategies, the idea is to take a CV object as input and exclude the anomalous samples from the training indices. All the splitting logic is delegated to the `cv` object.
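A minimal sketch of the wrapping idea follows. It assumes the wrapped splitter follows scikit-learn's `split`/`get_n_splits` protocol and that anomalies are identified by comparing `y` against `anomalous_label`. The class name `WithoutLiersCVSketch` is hypothetical, and this is not the PR's actual implementation.

```python
import numpy as np
from sklearn.model_selection import KFold


class WithoutLiersCVSketch:
    """Hypothetical sketch (not the PR's code): delegate splitting to `cv`,
    then drop anomalous samples from each training fold."""

    def __init__(self, cv, anomalous_label):
        self.cv = cv  # any scikit-learn style splitter, e.g. KFold
        self.anomalous_label = anomalous_label

    def split(self, X, y, groups=None):
        y = np.asarray(y)
        for train_idx, test_idx in self.cv.split(X, y, groups):
            # Keep only non-anomalous samples in the training indices;
            # test indices are untouched, so anomalies are still evaluated.
            yield train_idx[y[train_idx] != self.anomalous_label], test_idx

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.cv.get_n_splits(X, y, groups)


X = np.arange(20).reshape(10, 2)
y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0])  # 1 marks an anomaly here
cv = WithoutLiersCVSketch(cv=KFold(n_splits=3), anomalous_label=1)
splits = list(cv.split(X, y))
```

Because the sketch only exposes `split` and `get_n_splits`, it should be accepted wherever scikit-learn takes a `cv=` splitter object, and any base splitter (e.g. `StratifiedKFold`) can be swapped in without changing the exclusion logic.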

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality): Fixes [FEATURE] CV solution for anomaly detection without outliers during training #307
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

My code follows the style guidelines (flake8)
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation (also to the readme.md)
I have added tests that prove my fix is effective or that my feature works
I have added tests to check whether the new feature adheres to the sklearn convention
New and existing unit tests pass locally with my changes

koaning · 2023-11-07T19:45:26Z

sklego/model_selection.py

+    cv = WithoutLiersCV(
+        cv=KFold(n_splits=3),
+        anomalous_label=1
+    )


I think I'd want @MBrouns to weigh in on the name 😅 just to make sure.

But I'm also wondering if it would be easier for the end user not to require an anomalous label... Wouldn't it perhaps be better to pass in an outlier model? This outlier model could then internally train on X and determine which items are outliers. Or am I overthinking this?
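To make the suggested alternative concrete, here is a minimal sketch of the model-based variant. It assumes an outlier detector that follows scikit-learn's convention of predicting `1` for inliers and `-1` for outliers. `OutlierModelCVSketch` and its parameters are hypothetical, and `IsolationForest` is just one possible choice of detector.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import KFold


class OutlierModelCVSketch:
    """Hypothetical sketch: an outlier model, not a label, decides which
    training samples to drop. Splitting is still delegated to `cv`."""

    def __init__(self, cv, outlier_model):
        self.cv = cv
        self.outlier_model = outlier_model

    def split(self, X, y=None, groups=None):
        X = np.asarray(X)
        for train_idx, test_idx in self.cv.split(X, y, groups):
            # Fit a fresh copy on the training fold only, so test data never leaks in.
            model = clone(self.outlier_model).fit(X[train_idx])
            # scikit-learn outlier detectors predict 1 for inliers, -1 for outliers.
            is_inlier = model.predict(X[train_idx]) == 1
            yield train_idx[is_inlier], test_idx

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.cv.get_n_splits(X, y, groups)


rng = np.random.default_rng(0)
far_points = np.array([[8, 8], [-8, 8], [8, -8], [-8, -8], [10, 0]])
X = np.vstack([rng.normal(size=(95, 2)), far_points])  # rows 95..99 are outliers

cv = OutlierModelCVSketch(
    cv=KFold(n_splits=3),
    outlier_model=IsolationForest(contamination=0.1, random_state=0),
)
splits = list(cv.split(X))
```

With a fixed `contamination`, the detector drops roughly that fraction of every training fold even when the fold contains no outliers, which is one trade-off of this variant compared with passing an explicit anomalous label.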

From the conversation in the issue, my understanding is slightly different. The goal of the CV is to validate anomaly detectors that are trained on a single label (the non-anomalous one), namely novelty detectors. Therefore, passing a novelty detection model would not be possible in the first place.

Now I agree that the name would suit both implementations 😁

@koaning Potentially we could have two CV strategies:

WithoutLiersCV: takes any outlier detection model, trains it on X, and excludes the detected outliers from the training indices.

NoveltyDetectorCV: what's in this PR, i.e. train a novelty detection algorithm on non-anomalous samples only and evaluate it on both anomalous and non-anomalous samples.
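The evaluation protocol described for `NoveltyDetectorCV` can be sketched with plain scikit-learn. The sketch assumes `LocalOutlierFactor(novelty=True)` as the novelty detector, ROC AUC as the metric, and `StratifiedKFold` so that every test fold contains at least one anomaly. None of these choices come from the PR; they are illustrative only.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(95, 2)), rng.normal(loc=6.0, size=(5, 2))])
y = np.array([0] * 95 + [1] * 5)  # 1 = anomaly

scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=3).split(X, y):
    # Train the novelty detector on non-anomalous samples only ...
    normal_train = train_idx[y[train_idx] == 0]
    model = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X[normal_train])
    # ... but evaluate on the full test fold, anomalies included.
    # decision_function is larger for inliers, so negate it to get an anomaly score.
    anomaly_score = -model.decision_function(X[test_idx])
    scores.append(roc_auc_score(y[test_idx], anomaly_score))
```

Filtering the training indices inside the loop is exactly the step that a dedicated CV object would take over, leaving the detector and scoring code unchanged.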

withoutlierscv

06dfdaa

koaning reviewed Nov 7, 2023


Merge branch 'main' into feature/withoutliers

7ff906c


koaning Nov 7, 2023

FBruzzesi Nov 8, 2023

FBruzzesi Nov 10, 2023
