Overfitting with CNN model? #13

datistiquo · 2019-08-04T12:54:37Z

Hey,

I tried the CNN model on my own data and I don't understand what is going on. I really hope you can give me some advice.

I use the model for sentence matching in IR. I get good results on the training data, but for out-of-scope inputs I get very high confidences for unrelated sentences. Even for an empty string I get a confidence of 1 for several sentences!

I don't have much data, so I use augmentation. Do you have a recipe for the augmentation?

Thank you!

tlatkowski · 2019-08-18T12:40:39Z

Hi @datistiquo ,

Sorry for the late response.
Have you tried any regularization techniques?
And have you seen overfitting only with CNNs?

Looking at the model configuration, you can see that dropout is disabled by default for CNNs.
During the implementation I was not sure whether dropout is a good regularization technique for this kind of model (siamese nets), so I disabled it by default. It is also possible that dropout is useful only for specific layers, but I haven't investigated that.

The second important thing that comes to mind is the maximum length of the training sequences. Imagine a situation where you have a small training dataset in which one or a few sentences are very long, say 50 tokens, and the remaining sentences are short (including those in the test set). In this case the short sentences are padded with many placeholder tokens, and that padding can become a strong signal in the final decision. This area is also worth investigating.
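To make the padding effect concrete, here is a minimal sketch that measures how much of a padded batch is placeholder tokens, and how capping the maximum length at a percentile changes that. The helper names `padding_fraction` and `percentile_length` and the example lengths are hypothetical and not part of this repository.

```python
def padding_fraction(lengths, max_len):
    """Share of token slots filled with padding when every sequence
    is padded (or truncated) to max_len."""
    if max_len <= 0 or not lengths:
        raise ValueError("need a positive max_len and at least one sequence")
    real_tokens = sum(min(n, max_len) for n in lengths)
    return 1.0 - real_tokens / (max_len * len(lengths))


def percentile_length(lengths, q_percent=95):
    """Smallest length that covers at least q_percent of the sequences."""
    ordered = sorted(lengths)
    # Integer ceiling of q_percent * n / 100, converted to a 0-based index.
    idx = max(0, (q_percent * len(ordered) + 99) // 100 - 1)
    return ordered[idx]


# Hypothetical token counts: nine short sentences and one 50-token outlier.
lengths = [6, 7, 5, 8, 6, 7, 5, 6, 7, 50]

print(padding_fraction(lengths, max(lengths)))   # about 0.79, mostly padding
capped = percentile_length(lengths, q_percent=90)
print(capped)                                    # 8
print(padding_fraction(lengths, capped))         # 0.1875
```

Capping the length truncates the outlier, so whether this is acceptable depends on how much information the long sentences carry.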

I hope this helps,
BR
Tomasz

datistiquo · 2019-10-18T13:09:37Z

I will check this.

I also think that the margin plays a huge role with contrastive loss.
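A minimal sketch of why the margin matters, assuming the common form of contrastive loss in which similar pairs are penalized by the squared distance and dissimilar pairs by the squared shortfall to the margin. The function name is hypothetical, and the loss used in this repository may be scaled or defined differently.

```python
def contrastive_loss(distance, is_similar, margin=1.0):
    """Contrastive loss for a single pair (assumed form, no 1/2 factor)."""
    if is_similar:
        # Similar pairs are pulled together: any distance is penalized.
        return distance ** 2
    # Dissimilar pairs are pushed apart only until they reach the margin.
    return max(0.0, margin - distance) ** 2


# The same unrelated pair at distance 0.8 under two margins:
print(contrastive_loss(0.8, is_similar=False, margin=0.5))  # 0.0, no gradient
print(contrastive_loss(0.8, is_similar=False, margin=1.0))  # small positive loss
```

With a margin that is too small relative to the typical distances, unrelated pairs stop contributing to training early, which is consistent with seeing high confidences for unrelated sentences.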

Actually, have you normalized your word vectors before feeding them in? Maybe that is my issue too, since I have not normalized mine. I may try this out.
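By normalizing I mean something like scaling each word vector to unit L2 norm. A minimal sketch in plain Python, where `l2_normalize` is a hypothetical helper and the zero-vector handling is my own assumption:

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit L2 norm; all-zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm < eps:
        # Avoid dividing by zero, e.g. for a padding or unknown-word vector.
        return list(vec)
    return [x / norm for x in vec]


unit = l2_normalize([3.0, 4.0])
print(unit)                                   # direction kept, length 1
print(math.sqrt(sum(x * x for x in unit)))    # 1.0 up to rounding
```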

Frank-Sin99 · 2019-10-20T17:26:09Z

Right now I use a simple MSE or a simple contrastive loss. But I feel that I need a pairwise, triplet, or even listwise loss to do better?

Also, my evaluation metric is just precision, but I think a ranking metric like precision at k is more reasonable for IR!
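For reference, a minimal sketch of the triplet variant, assuming the usual hinge form on precomputed distances. The function name and the margin value are hypothetical choices for illustration.

```python
def triplet_loss(dist_pos, dist_neg, margin=0.2):
    """Hinge-style triplet loss for one (anchor, positive, negative) triple.

    dist_pos: distance between the anchor and a matching sentence.
    dist_neg: distance between the anchor and a non-matching sentence.
    """
    # Zero once the negative is farther than the positive by at least the margin.
    return max(0.0, dist_pos - dist_neg + margin)


print(triplet_loss(0.3, 0.9))  # 0.0, the triple is already well separated
print(triplet_loss(0.5, 0.6))  # positive loss, the gap is smaller than the margin
```

Unlike MSE or contrastive loss on single pairs, this only constrains the relative order of a match and a non-match for the same anchor, which is closer to what a ranking task needs.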
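A minimal sketch of precision at k for a single query, with hypothetical document IDs; averaging it over queries is left out.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Dividing by k (not len(top_k)) penalizes rankings shorter than k.
    return hits / k


ranked = ["d3", "d1", "d7", "d2"]   # model output, best match first
relevant = {"d1", "d2"}             # ground-truth matches for the query

print(precision_at_k(ranked, relevant, 1))  # 0.0
print(precision_at_k(ranked, relevant, 2))  # 0.5
print(precision_at_k(ranked, relevant, 4))  # 0.5
```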

Frank-Sin99 · 2019-11-03T16:55:44Z

Hey @tlatkowski, why does your CNN network output just the distance? Have you tried feeding the distance into a sigmoid layer? Or using a sigmoid layer directly instead of the distance?
