Metric used for reducing number of features? [GAFeatureSelectionCV] #151

Closed
nonajet opened this issue May 31, 2024 · 4 comments

Comments


nonajet commented May 31, 2024

The docs state that "The features (variables) used by the estimator are found by optimizing the cv-scores and by minimizing the number of features" in the GAFeatureSelectionCV class. For the cv-score, I guess the scoring metric (e.g. accuracy, F1...) is used.
What metric is used to reduce the number of features, and how is that achieved? I have not found anything in the source code yet.
Appreciate the help!

@rodrigo-arenas
Owner

Hi @nonajet, the metric used is the total number of selected features; here is roughly how it works.

First, we register what is called the FitnessMax, which is the optimization criterion:

        if criteria not in Criteria.list():
            raise ValueError(f"Criteria must be one of {Criteria.list()}, got {criteria} instead")
        # Minimization is handled as an optimization problem with a change in the score sign
        elif criteria == Criteria.max.value:
            self.criteria_sign = 1.0
        elif criteria == Criteria.min.value:
            self.criteria_sign = -1.0

    def _register(self):
        """
        This function is the responsible for registering the DEAPs necessary methods
        and create other objects to hold the hof, logbook and stats.
        """
        self.toolbox = base.Toolbox()

        # Criteria sign to set max or min problem
        # And -1.0 as second weight to minimize number of features
        creator.create("FitnessMax", base.Fitness, weights=[self.criteria_sign, -1.0])
        creator.create("Individual", list, fitness=creator.FitnessMax)

As you can see, I added two weights: one whose sign depends on whether we want to maximize or minimize the metric (accuracy, f1-score, etc.), and a second fixed at -1.0, indicating that we want to minimize the second criterion, the number of features.
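To see how the two weights act, here is a small sketch (not the library's code): DEAP multiplies each raw objective value elementwise by its weight and always maximizes the resulting weighted values, so the -1.0 weight turns "fewer features" into "a larger weighted value".

```python
# Sketch of DEAP-style weighted fitness: raw objective values are
# multiplied elementwise by the weights, and the weighted result
# ("wvalues") is always maximized. A -1.0 weight therefore minimizes.
def weighted_values(values, weights=(1.0, -1.0)):
    return tuple(v * w for v, w in zip(values, weights))

# (cv-score, n_features): a higher score and fewer features both
# increase the weighted values.
print(weighted_values((0.85, 7)))  # prints (0.85, -7.0)
```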

Then there is a function called evaluate, which is used to judge the quality of each solution; it is registered as the scoring function:

self.toolbox.register("evaluate", self.evaluate)

The function returns a tuple (actually a list) with the cv-score and the number of selected features:

return [score, n_selected_features]

This is what the model takes as input to calculate the overall fitness.
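The shape of that evaluate step can be sketched like this; `toy_cv_score` is a hypothetical stand-in for the real cross-validation call, and the mask representation is illustrative, not the library's internals:

```python
def toy_cv_score(mask):
    # Hypothetical scorer for illustration only; the real code runs
    # cross-validation on the selected feature columns.
    return round(0.70 + 0.02 * sum(mask), 2)

def evaluate(individual):
    """Sketch of the evaluate step: `individual` is a boolean mask
    over the feature columns. Returns [cv_score, n_selected_features],
    the two objectives the fitness weights are applied to."""
    n_selected_features = sum(individual)
    score = toy_cv_score(individual)
    return [score, n_selected_features]

print(evaluate([1, 0, 1, 1, 0]))  # prints [0.76, 3]
```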

@nonajet
Author

nonajet commented May 31, 2024

Thanks for the swift reply @rodrigo-arenas!
Alright, now I see where the two objectives (maximize score/metric and minimize no. of features) are set.
But what I still don't quite understand is how the two objectives are evaluated for one individual. As I see it, this is a multi-objective problem, meaning we should get something like a Pareto front, shouldn't we?
To be more specific, let's take this example:
We have two individuals to be evaluated: A has a score/metric value of 80 and uses 5 features; B has a score of 85 and uses 7 features.
How would the algorithm value each of these individuals with regard to the two objectives (max. score & min. no. of features)?
Or in other words: which individual is better, A or B?
Thanks in advance for taking the time!

@rodrigo-arenas
Owner

Hi @nonajet, to figure out which solution is better it uses a Pareto dominance criterion, which is handled by the underlying DEAP package.

You can check more details here:

https://deap.readthedocs.io/en/master/api/base.html#fitness
https://deap.readthedocs.io/en/master/api/tools.html#deap.tools
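For the concrete A/B example above, the dominance test can be sketched in plain Python; this mirrors the logic of DEAP's `base.Fitness.dominates`, applied to the weighted values:

```python
def dominates(wvalues_a, wvalues_b):
    """Pareto dominance on weighted values: a dominates b if it is
    no worse in every objective and strictly better in at least one
    (mirrors the logic of DEAP's base.Fitness.dominates)."""
    strictly_better = False
    for a, b in zip(wvalues_a, wvalues_b):
        if a > b:
            strictly_better = True
        elif a < b:
            return False
    return strictly_better

# Weights (1.0, -1.0): maximize score, minimize feature count.
A = (80 * 1.0, 5 * -1.0)   # score 80, 5 features -> (80.0, -5.0)
B = (85 * 1.0, 7 * -1.0)   # score 85, 7 features -> (85.0, -7.0)

print(dominates(A, B))  # prints False: A has the lower score
print(dominates(B, A))  # prints False: B uses more features
# Neither dominates the other, so both A and B are Pareto-optimal
# and the selection step can keep both in the front.
```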

@rodrigo-arenas
Owner

I'm closing this issue for now; let me know if you have further questions.
