NLP Model Performance #649
Unanswered
filipposkar asked this question in Q&A
Replies: 0 comments
OK, I have this problem.
I created a deep learning NLP address parsing model and saved it.
I am using a dataset of 32,000 addresses.
The accuracy is about 96%.
I want to train it with more data.
I prepared a new dataset (2,800 addresses) and trained the model further (all layers trainable). The new dataset contains street names and municipalities that are significantly different from the first one; that is, the new street and municipality names do not appear in the original dataset.
Accuracy improved by 2%.
I tested specific addresses with no success.
I then trained the model on both datasets (merged).
Accuracy was the same as the first time.
I tested the specific addresses with 100% success.
The question is: why is this happening?
My guess: the first time, the model learns a pattern and then tries to improve on that already-learned pattern. The second time, the model learns a different, slightly better pattern, since it has more data to train on.
Is that correct?
Any help to realize why this is happening is greatly appreciated.
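For reference, the sequence above (train on dataset A, fine-tune on new dataset B only, then retrain on merged A+B) can be reproduced on a toy scale. The sketch below is not the actual address-parsing model: it uses a hypothetical two-feature logistic regression in NumPy, where the "original" data carries the label in one feature and the "new" data in another, and all datasets, learning rate, and weight decay are invented for illustration. It shows the same pattern: fine-tuning on the new data alone overwrites what was learned from the original data, while training on the merged data fits both.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_dataset(n_per_class, x_pos, x_neg):
    """Two-class toy dataset: n_per_class copies of each prototype point."""
    X = np.array([x_pos] * n_per_class + [x_neg] * n_per_class, dtype=float)
    y = np.array([1.0] * n_per_class + [-1.0] * n_per_class)
    return X, y

def train(w, X, y, lr=0.5, weight_decay=0.01, steps=2000):
    """Gradient descent on L2-regularized logistic loss, starting from w."""
    w = w.copy()
    for _ in range(steps):
        margins = y * (X @ w)
        # Negative gradient of the logistic loss, averaged over samples.
        push = ((y * sigmoid(-margins))[:, None] * X).mean(axis=0)
        w = (1.0 - lr * weight_decay) * w + lr * push
    return w

def accuracy(w, X, y):
    return float(np.mean(np.sign(X @ w) == y))

# "Original" dataset A: the label is carried entirely by feature 0
# (analogous to the street names the first dataset covers).
X_a, y_a = make_dataset(100, [1.0, 0.0], [-1.0, 0.0])
# "New" dataset B: the label is carried by feature 1, with feature 0
# weakly anti-correlated (analogous to the unseen street names).
X_b, y_b = make_dataset(20, [-0.5, 1.0], [0.5, -1.0])

w_init = np.zeros(2)
w_a = train(w_init, X_a, y_a)        # stage 1: original dataset only
w_finetuned = train(w_a, X_b, y_b)   # stage 2: fine-tune on new data only
X_m = np.concatenate([X_a, X_b])
y_m = np.concatenate([y_a, y_b])
w_merged = train(w_init, X_m, y_m)   # stage 3: retrain on merged data

acc_a_stage1 = accuracy(w_a, X_a, y_a)
acc_a_finetuned = accuracy(w_finetuned, X_a, y_a)
acc_b_finetuned = accuracy(w_finetuned, X_b, y_b)
acc_a_merged = accuracy(w_merged, X_a, y_a)
acc_b_merged = accuracy(w_merged, X_b, y_b)

print(f"stage 1, accuracy on A: {acc_a_stage1:.2f}")
print(f"fine-tuned on B only, accuracy on A: {acc_a_finetuned:.2f}, on B: {acc_b_finetuned:.2f}")
print(f"merged training, accuracy on A: {acc_a_merged:.2f}, on B: {acc_b_merged:.2f}")
```

After stage 2 the weight that dataset A relies on has been dragged away by gradients from dataset B, so accuracy on A collapses even though accuracy on B is perfect; merged training keeps both sets of gradients in the loss at once, so it finds weights that satisfy both.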
Thanks
Phil