
Wrong predictions while testing new data #47

Open
SivaNagendra-sn opened this issue Apr 4, 2022 · 7 comments

Comments

@SivaNagendra-sn

I have trained a Sherlock model and it performs well on the test data. But when I test the model by passing data to it as in the 'Sherlock out-of-the-box' notebook, it gives wrong predictions (even passing the training data in the same way results in wrong predictions). Does a separate approach need to be taken for testing the data?
Note: I have created my own paragraph vector w.r.t. the data I have and am using that for training the Sherlock model as well.

@madelonhulsebos
Collaborator

Hi @SivaNagendra-sn,

Thanks for reporting your problem here. Did you change the identifiers when initializing the model and making inferences with it? The "sherlock" identifiers in the respective parts of the notebook should be replaced with the identifier that you gave to the newly trained model.

Madelon
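
For concreteness, the identifier here is the model_id string given when fitting and storing the retrained model. A minimal sketch of the training side, assuming the SherlockModel API used in the project's notebooks (X_train, y_train, X_validation, y_validation stand in for the feature matrices produced by the earlier notebook cells):

```python
from sherlock.deploy.model import SherlockModel

# X_train, y_train, X_validation, y_validation are assumed to come from the
# feature-extraction cells earlier in the training notebook.
model = SherlockModel()

# Train and store the weights under a new identifier ("retrained_sherlock"
# is illustrative); this is the string that must be reused at inference time.
model.fit(X_train, y_train, X_validation, y_validation, model_id="retrained_sherlock")
model.store_weights(model_id="retrained_sherlock")
```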

@SivaNagendra-sn
Author

Hi @madelonhulsebos,
Thanks for the reply. I have replaced the paragraph vector file (.pkl) used for extracting features and training the Sherlock model.
By identifiers, do you mean the feature column identifiers (the .tsv files)? If so, we have not changed anything in those .tsv files; can you elaborate on what needs to be changed there? If not, can you explain what those identifiers actually are?

@madelonhulsebos
Collaborator

Hi @SivaNagendra-sn,

To use the model retrained with the new paragraph vector files, the model_id occurrences in the notebook ("sherlock" in the attached screen shot) should be replaced with the identifier of the new model:

[Screenshot 2022-04-05 at 16:19:29: notebook cells where "sherlock" is passed as the model_id]

No changes should be made to the feature identifiers in the .tsv files. I hope that helps.
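
In code, the cells in the screenshot correspond roughly to the following; a sketch assuming the out-of-the-box notebook's API, with "retrained_sherlock" standing in for whatever identifier was used at training time:

```python
from sherlock.deploy.model import SherlockModel

# feature_vectors is assumed to hold the features extracted for the new data.
model = SherlockModel()

# Load the retrained weights instead of the shipped "sherlock" model ...
model.initialize_model_from_json(with_weights=True, model_id="retrained_sherlock")

# ... and pass the same identifier again at prediction time.
predicted_labels = model.predict(feature_vectors, "retrained_sherlock")
```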

@SivaNagendra-sn
Author

Yeah, I have actually done that. While training the Sherlock model I set the model_id to "retrained_sherlock", and when calling the predict function I also pass model_id="retrained_sherlock". On the test data it gives results with good accuracy. But when testing with new data (i.e., extracting features with the 'extract_features' function and then calling predict with model_id set to "retrained_sherlock"), the predictions are totally wrong ☹️.

@iganand

iganand commented Apr 5, 2022

I have retraining and prediction working on new data, but only when the fields are mostly text. For numeric fields, and for fields of length 12 or more, it does not work well: the prediction vector returned is null even though the classification score and output for the test data look good. Do you have any suggestions? @madelonhulsebos

@madelonhulsebos
Collaborator

madelonhulsebos commented Apr 6, 2022

OK, that should be alright then, @SivaNagendra-sn. Is your training data formatted exactly like the original training data (as downloaded through the data download)? The feature extraction pipeline expects "stringified" lists. The input data may be wrong in your case as well, @iganand.
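
For example, a column of raw cell values has to be wrapped as the string representation of a list before it is handed to feature extraction; a minimal illustration using only pandas (the values are made up):

```python
import pandas as pd

# Raw cell values from one column of your table:
raw_values = ["Rome", "Paris", "Berlin"]

# The feature extraction pipeline expects each sample to be a "stringified"
# list, i.e. the string representation of a list of cell values:
data = pd.Series([str(raw_values)], name="values")

print(data.iloc[0])  # "['Rome', 'Paris', 'Berlin']"
```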

@iganand

iganand commented Apr 7, 2022

I am getting null in the prediction vector, although the classification report for that specific field shows an F1 score of 0.87. What might be the reason?
