KeyError when running model.predict(X_test) in 02-1-train-and-test-sherlock.ipynb #48

Open
KentonParton opened this issue Apr 12, 2022 · 5 comments

@KentonParton

Hello!

I am trying to use the pre-built 'sherlock' model to make predictions. As suggested in the readme, I have run some of the cells in the 02-1-train-and-test-sherlock.ipynb file but get a KeyError when model.predict(X_test) is run.

Code to Reproduce:

model_id = 'sherlock'

from ast import literal_eval
from collections import Counter
from datetime import datetime

import numpy as np
import pandas as pd

from sklearn.metrics import f1_score, classification_report

from sherlock.deploy.model import SherlockModel

start = datetime.now()
print(f'Started at {start}')

X_test = pd.read_parquet('../data/processed/X_test.parquet')
y_test = pd.read_parquet('../data/raw/test_labels.parquet').values.flatten()

y_test = np.array([x.lower() for x in y_test])

print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

start = datetime.now()
print(f'Started at {start}')

model = SherlockModel();
model.initialize_model_from_json(with_weights=True, model_id="sherlock");

print('Initialized model.')
print(f'Finished at {datetime.now()}, took {datetime.now() - start} seconds')

predicted_labels = model.predict(X_test)
predicted_labels = np.array([x.lower() for x in predicted_labels])

When model.predict(X_test) is run, the following KeyError occurs:

KeyError                                  Traceback (most recent call last)
/var/folders/66/cbb21km104n7d7t9qf61q8rmrsjdc8/T/ipykernel_21846/2316637303.py in <module>
----> 1 predicted_labels = model.predict(X_test)
      2 predicted_labels = np.array([x.lower() for x in predicted_labels])

~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict(self, X, model_id)
    118         Array with predictions for X.
    119         """
--> 120         y_pred = self.predict_proba(X, model_id)
    121         y_pred_classes = helpers._proba_to_classes(y_pred, model_id)
    122 

~/ebsco_repos/sherlock-project/sherlock/deploy/model.py in predict_proba(self, X, model_id)
    141         y_pred = self.model.predict(
    142             [
--> 143                 X[feature_cols_dict["char"]].values,
    144                 X[feature_cols_dict["word"]].values,
    145                 X[feature_cols_dict["par"]].values,

KeyError: "['n_[^]-agg-sum', 'n_[^]-agg-max', 'n_[\\\\]-agg-kurtosis', 'n_[^]-agg-var', 'n_[\\\\]-agg-median', 'n_[^]-agg-kurtosis', 'n_[\\\\]-agg-mean', 'n_[\\\\]-agg-all', 'n_[^]-agg-min', 'n_[\\\\]-agg-sum', 'n_[^]-agg-median', 'n_[^]-agg-mean', 'n_[^]-agg-all', 'n_[\\\\]-agg-min', 'n_[\\\\]-agg-max', 'n_[^]-agg-any', 'n_[\\\\]-agg-var', 'n_[\\\\]-agg-any', 'n_[^]-agg-skewness', 'n_[\\\\]-agg-skewness'] not in index"

Is there something that I am missing or need to do prior to running the above code?

Appreciate the help!
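
For readers following along, the 20 column names in the KeyError above share only two feature prefixes, `n_[^]` and the backslash variant, each with ten `-agg-*` suffixes. The sketch below groups a subset of those names by prefix to make that pattern visible. The helper name `group_by_feature` is hypothetical, and the single-backslash spelling is an assumption about how the escaped names in the rendered message decode.

```python
from collections import defaultdict

# A subset of the column names listed in the KeyError message.
# Assumption: the backslash feature is spelled with a single backslash
# once the escaping in the rendered message is undone.
missing = [
    "n_[^]-agg-sum",
    "n_[^]-agg-max",
    "n_[^]-agg-var",
    r"n_[\]-agg-sum",
    r"n_[\]-agg-max",
]


def group_by_feature(names, sep="-agg-"):
    """Map each feature prefix to the aggregate suffixes seen for it."""
    groups = defaultdict(list)
    for name in names:
        # Split "n_[^]-agg-sum" into prefix "n_[^]" and suffix "sum".
        prefix, _, suffix = name.partition(sep)
        groups[prefix].append(suffix)
    return dict(groups)


groups = group_by_feature(missing)
for prefix, aggregates in groups.items():
    print(prefix, aggregates)
```

Seen this way, the error suggests that X_test lacks the aggregate features for two specific characters rather than a scattered set of unrelated columns.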

@KentonParton
Author

@lowecg @madelonhulsebos would you mind providing some guidance, please?

@lowecg
Contributor

lowecg commented Apr 22, 2022

Hi Kenton,

Sorry for the delay - I missed your original post. I'll have a look at this in the morning.

To get a lay of the land:

It sounds like you've initialised the project and just run 02-1-train-and-test-sherlock.ipynb? Was that all you ran?

Could you confirm what version of Python you're running?

Cheers,

Chris.

@madelonhulsebos
Collaborator

madelonhulsebos commented Apr 22, 2022 via email

@madelonhulsebos
Collaborator

Hi @KentonParton,

I ran your code and it works for me when I use the test data file created by running the notebook 01-data-processing.ipynb (this file is named test.parquet). Did you generate X_test.parquet with that notebook as well? What does it contain? Its head should look as follows:

[Image: Screenshot 2022-04-23 at 10 29 09]

If you just want to test the model with some custom input, I recommend using the notebook: 00-use-sherlock-out-of-the-box.ipynb.
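
One way to answer the "what does it contain?" question before calling model.predict is to compare the columns in the loaded file against the columns the model expects. This is only a minimal sketch: `missing_columns` is a hypothetical helper, not part of sherlock, and the column lists are made-up stand-ins (apart from the two `n_[^]` names taken from the error message).

```python
def missing_columns(available, expected):
    """Return the expected column labels that are absent, in expected order."""
    available_set = set(available)
    return [col for col in expected if col not in available_set]


# Hypothetical stand-ins: 'available' plays the role of X_test.columns,
# 'expected' plays the role of the feature columns the model indexes by.
available = ["col_a", "col_b"]
expected = ["col_a", "n_[^]-agg-sum", "n_[^]-agg-max"]

absent = missing_columns(available, expected)
if absent:
    print(f"{len(absent)} expected columns are absent, e.g. {absent[:5]}")
else:
    print("All expected columns are present.")
```

With a real DataFrame, `available` could be `X_test.columns`. Where the expected list comes from depends on how the model defines its feature columns, which this thread does not show.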

@madelonhulsebos
Collaborator

Hi @KentonParton, did you solve the issue?
