Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow to use a relative score threshold, which can detect identical points passed to FPS #129

Merged

Conversation

Luthaf
Copy link
Collaborator

@Luthaf Luthaf commented May 3, 2022

Fixes #128

I'm not sure if this is the best approach to do so, but I think this is a case we want to detect automatically by default.

Score thresholding seems like it would be useful here as well, but since it works using absolute values, we can not enable it by default without knowing about the magnitude of features.

I could also modify score thresholding to work with relative scores, and use that instead.

@Luthaf Luthaf requested a review from rosecers May 3, 2022 10:57
Copy link
Collaborator

@rosecers rosecers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall there's still some messy edits in here -- most of my comments are "Why is this change part of this PR?"

I think this might be a good use for the score_threshold parameter, what do you think?

docs/source/conf.py Outdated Show resolved Hide resolved
examples/PlotLFRE.ipynb Outdated Show resolved Hide resolved
examples/PlotLFRE.ipynb Outdated Show resolved Hide resolved
skcosmo/_selection.py Outdated Show resolved Hide resolved
skcosmo/_selection.py Outdated Show resolved Hide resolved
skcosmo/decomposition/_kernel_pcovr.py Outdated Show resolved Hide resolved
skcosmo/metrics/_reconstruction_measures.py Outdated Show resolved Hide resolved
skcosmo/sample_selection/_voronoi_fps.py Outdated Show resolved Hide resolved
skcosmo/utils/_pcovr_utils.py Outdated Show resolved Hide resolved
@Luthaf
Copy link
Collaborator Author

Luthaf commented May 13, 2022

I think this might be a good use for the score_threshold parameter, what do you think?

The issue with thresholding is that it works using absolute values (as far as I understand); and here I would like to use relative values (i.e. stop when the score goes below a fraction of the first selection).

@ceriottm
Copy link
Collaborator

ceriottm commented May 14, 2022 via email

@Luthaf
Copy link
Collaborator Author

Luthaf commented May 16, 2022

One quick comment: I wouldn't throw an error if the same points are given as input, only if one starts selecting identical points.

Eliminating duplicates might be one of the reasons to use FPS in the first place.

The PR title was not very clear, but that's exactly what the code is doing: trying to detect identical point being selected & raising an error in this case.

@Luthaf Luthaf force-pushed the fps-selection-identical-points branch from 5780576 to 4afb616 Compare May 16, 2022 11:56
@Luthaf Luthaf force-pushed the fps-selection-identical-points branch from 4afb616 to 7b9b775 Compare May 16, 2022 13:27
@Luthaf Luthaf changed the title Detect identical points being fed to FPS and raise an error Allow to use a relative score threshold, which can detect identical points passed to FPS May 16, 2022
@Luthaf Luthaf requested a review from rosecers May 16, 2022 13:28
Copy link
Collaborator

@rosecers rosecers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I am good with this bar a few thoughts.

  1. I think it should be score_threshold_type, not score_threshold_kind. Kind is a bit of an atypical word to use in this context.
  2. I changed the docstring for clarity (this would need to be propagated throughout).

skcosmo/_selection.py Outdated Show resolved Hide resolved
skcosmo/_selection.py Outdated Show resolved Hide resolved
@Luthaf Luthaf force-pushed the fps-selection-identical-points branch from 7b9b775 to bff8d2c Compare May 16, 2022 14:34
@Luthaf Luthaf requested a review from rosecers May 16, 2022 14:58
@Luthaf
Copy link
Collaborator Author

Luthaf commented May 20, 2022

@rosecers is this good to go?

Copy link
Collaborator

@rosecers rosecers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to approve once there is a corresponding test!

This allows to stop selection early when identical points are fed to
FPS or another selector by using something like score_threshold=1e-12
and score_threshold_type="relative".
@Luthaf Luthaf force-pushed the fps-selection-identical-points branch from bff8d2c to 6038ace Compare May 23, 2022 09:51
@Luthaf Luthaf merged commit f7cf1d6 into scikit-learn-contrib:main May 25, 2022
@Luthaf Luthaf deleted the fps-selection-identical-points branch May 25, 2022 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Same index is present multiple times in FPS output during sample selection
3 participants