The code below works and returns a valid importance_df.

    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import load_breast_cancer, load_iris
    from sklearn.model_selection import KFold
    from lofo import LOFOImportance, Dataset, plot_importance

    data = load_breast_cancer(as_frame=True)  # load as dataframe
    df = data.data
    df['target'] = data.target.values

    # model
    model = RandomForestClassifier()

    # dataset
    dataset = Dataset(df=df, target="target", features=[col for col in df.columns if col != 'target'])

    # get feature importance
    cv = KFold(n_splits=5, shuffle=True, random_state=666)
    lofo_imp = LOFOImportance(dataset, cv=cv, scoring="f1", model=model)
    importance_df = lofo_imp.get_importance()
    print(importance_df)

But if we replace load_breast_cancer with load_iris, the importance_df values are all NaN.
Does lofo-importance only support binary classification?
FLOFO gives a correct result for multiclass classification.

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.datasets import load_iris
    from lofo import FLOFOImportance

    # step-01: prepare data
    data = load_iris(as_frame=True)  # load as dataframe
    x_data = data.data.to_numpy()
    y_data = data.target.values
    df = data.data
    df['target'] = data.target.values
    # repeat the rows 10 times since FLOFO needs > 1000 rows
    # (np.repeat replaces pd.np.repeat, which newer pandas versions no longer provide)
    df = pd.DataFrame(np.repeat(df.values, 10, axis=0), columns=df.columns)

    # step-02: train model
    model = RandomForestClassifier()
    model.fit(x_data, y_data)

    # step-03: fast-lofo
    lofo_imp = FLOFOImportance(validation_df=df, target="target", features=[col for col in df.columns if col != 'target'], scoring="f1_macro", trained_model=model)
    importance_df = lofo_imp.get_importance()
    print(importance_df)

Changing scoring="f1" to scoring="f1_macro" fixed the issue. For a multiclass target, the F1 score has to be calculated with an averaged variant such as f1_macro or f1_micro.
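For anyone who hits the same thing, here is a minimal sketch of the underlying scikit-learn behaviour, independent of lofo. It uses a small made-up label vector (not the iris data) to show that plain F1 with the default average="binary" is rejected for a three-class target, while the macro and micro averages are computed normally. How that failure ends up as NaN in importance_df is presumably down to error handling during cross-validation; that part is an assumption and not something checked against the lofo source.

```python
from sklearn.metrics import f1_score

# Toy three-class labels, made up for illustration only.
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 2, 2, 2, 1, 0]

# Plain F1 defaults to average="binary", which only accepts two classes.
try:
    f1_score(y_true, y_pred)
    binary_failed = False
except ValueError as exc:
    binary_failed = True
    print("binary F1 failed:", exc)

# Averaged variants are defined for multiclass targets.
macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-class F1
micro = f1_score(y_true, y_pred, average="micro")  # computed from global TP/FP/FN counts
print("f1_macro:", macro)
print("f1_micro:", micro)
```

The scoring strings "f1_macro" and "f1_micro" correspond to these two averaged calls, which is why switching the scoring argument is enough.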