
Wrong predictions while testing new data #47

Open
SivaNagendra-sn opened this issue Apr 4, 2022 · 7 comments

Comments

@SivaNagendra-sn

I have trained a Sherlock model and it performs well on the test data. But when I test the model by passing data to it as in the 'Sherlock out-of-the-box' notebook, it gives wrong predictions (even passing the training data in the same way results in wrong predictions). Does a separate approach need to be taken for testing the data?
Note: I have created my own paragraph vector w.r.t. the data I have and am using that for training the Sherlock model as well.

@madelonhulsebos
Collaborator

Hi @SivaNagendra-sn,

Thanks for reporting your problem here. Did you change the identifiers when initializing the model and making inferences with it? The "sherlock" identifiers in the respective parts of the notebook should be replaced with the identifier that you gave to the newly trained model.

Madelon
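
For concreteness, the identifier here is the model_id string given when fitting and storing the retrained model. A minimal sketch of the training side, assuming the SherlockModel API used in the project's notebooks (X_train, y_train, X_validation, y_validation stand in for the feature matrices produced by the earlier notebook cells):

```python
from sherlock.deploy.model import SherlockModel

# X_train, y_train, X_validation, y_validation are assumed to come from the
# feature-extraction cells earlier in the training notebook.
model = SherlockModel()

# Train and store the weights under a new identifier ("retrained_sherlock"
# is illustrative); this is the string that must be reused at inference time.
model.fit(X_train, y_train, X_validation, y_validation, model_id="retrained_sherlock")
model.store_weights(model_id="retrained_sherlock")
```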

@SivaNagendra-sn
Author

Hi @madelonhulsebos,
Thanks for the reply. I have replaced the paragraph vector file (.pkl) used for extracting features and training the Sherlock model.
By identifiers, do you mean the feature column identifiers (the .tsv files)? If so, we have not changed anything in those .tsv files; can you elaborate on what needs to be changed there? If not, can you explain what those identifiers actually are?

@madelonhulsebos
Collaborator

Hi @SivaNagendra-sn,

To use the model retrained with the new paragraph vector files, the model_id occurrences in the notebook ("sherlock" in the attached screen shot) should be replaced with the identifier of the new model:

[Screenshot 2022-04-05 at 16:19:29: notebook cells where "sherlock" is passed as the model_id]

No changes should be made to the feature identifiers in the .tsv files. I hope that helps.
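
In code, the cells in the screenshot correspond roughly to the following; a sketch assuming the out-of-the-box notebook's API, with "retrained_sherlock" standing in for whatever identifier was used at training time:

```python
from sherlock.deploy.model import SherlockModel

# feature_vectors is assumed to hold the features extracted for the new data.
model = SherlockModel()

# Load the retrained weights instead of the shipped "sherlock" model ...
model.initialize_model_from_json(with_weights=True, model_id="retrained_sherlock")

# ... and pass the same identifier again at prediction time.
predicted_labels = model.predict(feature_vectors, "retrained_sherlock")
```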

@SivaNagendra-sn
Author

Yeah, I have actually done that. While training the Sherlock model I set the model_id to "retrained_sherlock", and when calling the predict function I also pass model_id="retrained_sherlock". On the test data it gives results with good accuracy. But when testing with new data (i.e., extracting features with the 'extract_features' function and then calling predict with model_id set to "retrained_sherlock"), the predictions are totally wrong ☹️.

@iganand

iganand commented Apr 5, 2022

I have retraining and prediction working on new data, but only when the fields are mostly text. For numeric fields, and for fields of length 12 or more, it does not work well: the prediction vector returned is null even though the classification score and output for the test data look good. Do you have any suggestions? @madelonhulsebos

@madelonhulsebos
Collaborator

madelonhulsebos commented Apr 6, 2022

OK, that should be alright then, @SivaNagendra-sn. Is your training data formatted exactly like the original training data (as downloaded through the data download)? The feature extraction pipeline expects "stringified" lists. The input data may be wrong in your case as well, @iganand.
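
For example, a column of raw cell values has to be wrapped as the string representation of a list before it is handed to feature extraction; a minimal illustration using only pandas (the values are made up):

```python
import pandas as pd

# Raw cell values from one column of your table:
raw_values = ["Rome", "Paris", "Berlin"]

# The feature extraction pipeline expects each sample to be a "stringified"
# list, i.e. the string representation of a list of cell values:
data = pd.Series([str(raw_values)], name="values")

print(data.iloc[0])  # "['Rome', 'Paris', 'Berlin']"
```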

@iganand

iganand commented Apr 7, 2022

I am getting null in the prediction vector, although the classification report for that specific field shows an F1 score of 0.87. What might be the reason?
