Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] WithoutLiersCV model selection #595

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

FBruzzesi
Copy link
Collaborator

@FBruzzesi FBruzzesi commented Nov 7, 2023

Description

Introduces WithoutLiersCV as discussed in #307. To be able to follow different cross validation strategies, the idea is to take a CV object as input and exclude the anomalous samples from the training indexes. All the splitting logic is delegated to the cv object.

Type of change

Checklist:

  • My code follows the style guidelines (flake8)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation (also to the readme.md)
  • I have added tests that prove my fix is effective or that my feature works
  • I have added tests to check whether the new feature adheres to the sklearn convention
  • New and existing unit tests pass locally with my changes

Comment on lines +675 to +678
cv = WithoutLiersCV(
cv=KFold(n_splits=3),
anomalous_label=1
)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'd want @MBrouns to weight in on the name 😅 just to make sure.

But I'm also wondering if it's perhaps easier to the enduser to not require an anomalous label ... wouldn't it perhaps be better to pass in an outlier model? this outlier model could then internally train on X and determine which items are outliers. Or am I overthinking?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the conversation in the issue my understanding is slightly different. The goal of the CV is to validate anomaly detectors that do not train with different labels, namely the novelty detection ones. Therefore passing a novelty detection model would not be possible in the first place.

Now I agree that the name would suit both implementations 😁

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@koaning Potentially we could have two CV strategies:

  • WithoutLiersCV: takes any outlier detection model, train on X, and excludes outliers from train_indexes
  • NoveltyDetectorCV: what's in this PR, to be able to train a novelty detection algorithm on non-anomalous labels and evaluate on both anomalous and not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEATURE] CV solution for anomaly detection without outliers during training
2 participants