
Extracting feature importances #71

Closed

faisalaleissa opened this issue Oct 12, 2018 · 2 comments

@faisalaleissa

  • MLBlocks version: '0.2.0'
  • Python version: Python 3.6.3
  • Operating System: MacOS High Sierra

Description

I have been trying to produce the feature importances from a classifier (RandomForest). The feature_importances_ function doesn't take arguments, so the MLPipeline doesn't execute (TypeError: 'numpy.ndarray' object is not callable).

What I Did

I left the arguments empty in the produce section of the primitive definition file:


{
    "name": "sklearn.ensemble.RandomForestClassifier1",
    "documentation": "http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html",
    "description": "Scikit-learn RandomForestClassifier.",
    "classifiers": {
        "type": "estimator",
        "subtype": "classifier"
    },
    "modalities": [],
    "primitive": "sklearn.ensemble.RandomForestClassifier",
    "fit": {
        "method": "fit",
        "args": [
            {
                "name": "X",
                "type": "DataFrame"
            },
            {
                "name": "y",
                "type": "Series"
            }
        ]
    },
    "produce": {
        "method": "feature_importances_",
        "args": [],
        "output": [
            {
                "name": "y",
                "type": "Series"
            }
        ]
    },
    "hyperparameters": {
        "fixed": {
            "n_jobs": {
                "type": "int",
                "default": -1
            }
        },
        "tunable": {
            "criterion": {
                "type": "str",
                "default": "entropy",
                "values": ["entropy", "gini"]
            },
            "max_features": {
                "type": "str",
                "default": null,
                "range": [null, "auto", "log2"]
            },
            "max_depth": {
                "type": "int",
                "default": 10,
                "range": [1, 30]
            },
            "min_samples_split": {
                "type": "float",
                "default": 0.1,
                "range": [0.0001, 0.5]
            },
            "min_samples_leaf": {
                "type": "float",
                "default": 0.1,
                "range": [0.0001, 0.5]
            },
            "n_estimators": {
                "type": "int",
                "default": 30,
                "values": [2, 500]
            },
            "class_weight": {
                "type": "str",
                "default": null,
                "range": [null, "balanced"]
            }
        }
    }
}
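
The same error can be reproduced directly with scikit-learn (a minimal sketch with synthetic data, outside MLBlocks):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data just to fit the classifier.
X, y = make_classification(random_state=0)
model = RandomForestClassifier(n_estimators=30).fit(X, y)

model.feature_importances_    # attribute access: returns a numpy array
model.feature_importances_()  # TypeError: 'numpy.ndarray' object is not callable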

@csala (Contributor) commented Oct 12, 2018

Thanks for reporting this @faisalaleissa

However, I think that the problem is that feature_importances_ is not a method but an attribute, so instead of calling it (which is what MLBlocks will try to do when you set it as the method), what you need to do is simply read it, which is not something MLBlocks supports natively.

So, in order to do this, the only option you have is to write a Python primitive in MLPrimitives whose fit method calls the RandomForest fit method and whose produce method simply reads the attribute from the RandomForest object and returns it. A rough sketch of such a primitive is shown below (hypothetical class name, not an actual MLPrimitives annotation):
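
from sklearn.ensemble import RandomForestClassifier

class FeatureImportances:
    """Wrap RandomForestClassifier and expose feature_importances_ from produce."""

    def __init__(self, **hyperparameters):
        self._model = RandomForestClassifier(**hyperparameters)

    def fit(self, X, y):
        # Delegate to the underlying estimator's fit method.
        self._model.fit(X, y)

    def produce(self):
        # Read the attribute instead of calling it.
        return self._model.feature_importances_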

However, a question arises: what do you intend to do with the feature_importances_ value?

If what you want to do is feature selection based on that, I recommend having a look at the SelectFromModel class from scikit-learn and the corresponding MLBlocks integration, which you can find here:
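
For reference, plain scikit-learn usage of SelectFromModel looks roughly like this (a minimal sketch with synthetic data, not the MLBlocks integration itself):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Keep only the features whose importance exceeds the (default) mean threshold.
selector = SelectFromModel(RandomForestClassifier(n_estimators=30, random_state=0))
selector.fit(X, y)
X_selected = selector.transform(X)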

@faisalaleissa (Author)

Thanks for the quick reply. I'm working on a classifier reporting library in which we compute metrics and evaluate classifiers. One part of it is ranking the features that are most influential for the classifier.

I think the feature selector you mentioned will be suitable.

Thanks again.

csala closed this as completed Oct 23, 2018
csala added the question label Dec 13, 2018
gsheni pushed a commit that referenced this issue Aug 29, 2022: Change the "author" entry in the JSONs to a "contributors" list