Metric used for reducing number of features? [GAFeatureSelectionCV] #151
Comments
Hi @nonajet, the metric used is the total number of selected features. Here is roughly how it works.

First, we register what is called FitnessMax, which is the optimization criterion:

    if criteria not in Criteria.list():
        raise ValueError(f"Criteria must be one of {Criteria.list()}, got {criteria} instead")
    # Minimization is handled like an optimization problem with a change in the score sign
    elif criteria == Criteria.max.value:
        self.criteria_sign = 1.0
    elif criteria == Criteria.min.value:
        self.criteria_sign = -1.0

    def _register(self):
        """
        This function is responsible for registering the necessary DEAP methods
        and creating other objects to hold the hof, logbook and stats.
        """
        self.toolbox = base.Toolbox()
        # Criteria sign to set max or min problem
        # And -1.0 as second weight to minimize number of features
        creator.create("FitnessMax", base.Fitness, weights=[self.criteria_sign, -1.0])
        creator.create("Individual", list, fitness=creator.FitnessMax)

As you can see, I added two weights: the first depends on whether we want to maximize or minimize the metric (accuracy, f1-score, etc.), and the second is -1.0, indicating that we want to minimize the second criterion, which is the number of features.

Then there is a function called evaluate, which is used to judge the quality of each solution. It is registered as the scoring function:

    self.toolbox.register("evaluate", self.evaluate)

The function returns a tuple (actually a list) with the cv-score and the number of features selected:

    return [score, n_selected_features]

This is what the model takes as input to calculate the overall fitness.
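To make the role of the two weights concrete, here is a minimal sketch in plain Python. It is not the library's code: `weighted_fitness` is a hypothetical helper and the candidate values are made up. It only mirrors the fact that DEAP multiplies each fitness value by its weight, assuming a maximization metric (so the first weight is 1.0).

```python
from typing import Sequence, Tuple


def weighted_fitness(values: Sequence[float], weights: Sequence[float]) -> Tuple[float, ...]:
    """Multiply each objective value by its weight (hypothetical helper, not a DEAP call)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return tuple(v * w for v, w in zip(values, weights))


# Assumed setup: maximize the cv-score (criteria_sign = 1.0), minimize the feature count.
weights = [1.0, -1.0]

# Two made-up candidates in the same shape evaluate returns: [score, n_selected_features]
candidate_a = [0.90, 5]
candidate_b = [0.90, 8]

wa = weighted_fitness(candidate_a, weights)
wb = weighted_fitness(candidate_b, weights)

# With equal scores, the candidate using fewer features has the larger weighted value.
print(wa, wb)
```

Because the second weight is negative, fewer features give a larger weighted value, so both objectives can be treated as "larger is better" after weighting.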
Thanks for the swift reply @rodrigo-arenas!
Hi @nonajet, to figure out which solution is better, it uses a Pareto dominance criterion, which is handled by the underlying DEAP package. You can check more details here: https://deap.readthedocs.io/en/master/api/base.html#fitness
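For readers unfamiliar with Pareto dominance, the following is a small plain-Python sketch of the idea, not DEAP's implementation. The `dominates` function and the example tuples are illustrative assumptions; each tuple holds weighted values in the form (score, -n_features), so larger is better in both positions.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


# Made-up weighted fitness values: (cv-score, -number_of_features)
a = (0.90, -5)   # score 0.90 with 5 features
b = (0.90, -8)   # same score, more features
c = (0.93, -10)  # better score, but more features

a_over_b = dominates(a, b)  # a matches b on score and uses fewer features
a_over_c = dominates(a, c)  # a has the lower score, so it cannot dominate c
c_over_a = dominates(c, a)  # c uses more features, so it cannot dominate a
print(a_over_b, a_over_c, c_over_a)
```

In this sketch, `a` dominates `b`, while `a` and `c` are a trade-off: neither dominates the other, since one has the better score and the other uses fewer features.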
I'm closing this issue for now. Let me know if you have further questions.
The doc states that 'The features (variables) used by the estimator are found by optimizing the cv-scores and by minimizing the number of features' in the GAFeatureSelectionCV class. For the cv-score I guess the scoring metric (e.g. accuracy, F1...) is used.
What metric is used to reduce the number of features and how is that achieved? I have not found anything in the source code yet.
Appreciate the help!