
Extracting feature importances #71

Closed

faisalaleissa opened this issue Oct 12, 2018 · 2 comments

@faisalaleissa

  • MLBlocks version: '0.2.0'
  • Python version: Python 3.6.3
  • Operating System: MacOS High Sierra

Description

I have been trying to produce the feature importances from a classifier (RandomForest). The feature_importances_ function doesn't take arguments, so the MLPipeline doesn't execute (TypeError: 'numpy.ndarray' object is not callable).

What I Did

I left the arguments empty in the produce section of the primitive definition file:


{
    "name": "sklearn.ensemble.RandomForestClassifier1",
    "documentation": "http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html",
    "description": "Scikit-learn RandomForestClassifier.",
    "classifiers": {
        "type": "estimator",
        "subtype": "classifier"
    },
    "modalities": [],
    "primitive": "sklearn.ensemble.RandomForestClassifier",
    "fit": {
        "method": "fit",
        "args": [
            {
                "name": "X",
                "type": "DataFrame"
            },
            {
                "name": "y",
                "type": "Series"
            }
        ]
    },
    "produce": {
        "method": "feature_importances_",
        "args": [],
        "output": [
            {
                "name": "y",
                "type": "Series"
            }
        ]
    },
    "hyperparameters": {
        "fixed": {
            "n_jobs": {
                "type": "int",
                "default": -1
            }
        },
        "tunable": {
            "criterion": {
                "type": "str",
                "default": "entropy",
                "values": ["entropy", "gini"]
            },
            "max_features": {
                "type": "str",
                "default": null,
                "range": [null, "auto", "log2"]
            },
            "max_depth": {
                "type": "int",
                "default": 10,
                "range": [1, 30]
            },
            "min_samples_split": {
                "type": "float",
                "default": 0.1,
                "range": [0.0001, 0.5]
            },
            "min_samples_leaf": {
                "type": "float",
                "default": 0.1,
                "range": [0.0001, 0.5]
            },
            "n_estimators": {
                "type": "int",
                "default": 30,
                "values": [2, 500]
            },
            "class_weight": {
                "type": "str",
                "default": null,
                "range": [null, "balanced"]
            }
        }
    }
}
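
The same error can be reproduced directly with scikit-learn (a minimal sketch with synthetic data, outside MLBlocks):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data just to fit the classifier.
X, y = make_classification(random_state=0)
model = RandomForestClassifier(n_estimators=30).fit(X, y)

model.feature_importances_    # attribute access: returns a numpy array
model.feature_importances_()  # TypeError: 'numpy.ndarray' object is not callable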

@csala (Contributor) commented Oct 12, 2018

Thanks for reporting this @faisalaleissa

However, I think that the problem is that feature_importances_ is not a method but an attribute, so instead of calling it (which is what MLBlocks will try to do when you set it as the method), what you need to do is simply read it, which is not something MLBlocks supports natively.

So, in order to do this, the only option you have is to write a Python primitive in MLPrimitives whose fit method calls the RandomForest fit method and whose produce method simply reads the attribute from the RandomForest object and returns it. A rough sketch of such a primitive is shown below (hypothetical class name, not an actual MLPrimitives annotation):
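
from sklearn.ensemble import RandomForestClassifier

class FeatureImportances:
    """Wrap RandomForestClassifier and expose feature_importances_ from produce."""

    def __init__(self, **hyperparameters):
        self._model = RandomForestClassifier(**hyperparameters)

    def fit(self, X, y):
        # Delegate to the underlying estimator's fit method.
        self._model.fit(X, y)

    def produce(self):
        # Read the attribute instead of calling it.
        return self._model.feature_importances_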

However, a question arises: what do you intend to do with the feature_importances_ value?

If what you want to do is feature selection based on that, I recommend having a look at the SelectFromModel class from scikit-learn and the corresponding MLBlocks integration, which you can find here:
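
For reference, plain scikit-learn usage of SelectFromModel looks roughly like this (a minimal sketch with synthetic data, not the MLBlocks integration itself):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

X, y = make_classification(n_samples=100, n_features=20, random_state=0)

# Keep only the features whose importance exceeds the (default) mean threshold.
selector = SelectFromModel(RandomForestClassifier(n_estimators=30, random_state=0))
selector.fit(X, y)
X_selected = selector.transform(X)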

@faisalaleissa (Author)

Thanks for the quick reply. I'm working on a classifier reporting library in which we compute metrics and evaluate classifiers. One part of it is ranking the features that are most influential for the classifier.

I think the feature selector you mentioned will be suitable.

Thanks again.

csala closed this as completed Oct 23, 2018
csala added the question label Dec 13, 2018
gsheni pushed a commit that referenced this issue Aug 29, 2022: Change the "author" entry in the JSONs to a "contributors" list