Overfitting with CNN model? #13

Open
datistiquo opened this issue Aug 4, 2019 · 4 comments

Comments

@datistiquo

Hey,

I tried the CNN model on my own data and I don't understand what is going on. I really hope you can give me some advice.

I use the model for sentence matching for IR. I get good results on the training data, but for out-of-scope inputs I get very high confidences for unrelated sentences. Even for an empty string I get confidences of 1 for several sentences!

I don't have much data, so I do augmentation. Do you have any recipe for the augmentation?

Thank you!

@tlatkowski
Owner

Hi @datistiquo,

Sorry for the late response.
Have you tried any regularization techniques?
And have you seen overfitting only with CNNs?

Looking into the model configuration, you can see that dropout is disabled by default for CNNs.
During the implementation I was not sure whether dropout is a good regularization technique for this kind of model (siamese nets), so I disabled it by default. It is possible that dropout is useful only for specific layers, but I haven't investigated it.

The second important thing that comes to mind is the maximum length of the training sequences. Imagine you have a small training dataset where one or a few sentences are very long, say 50 tokens, and the rest (including those in the test set) are short. In this case the short sentences are padded with a lot of placeholder tokens, and that padding can become a strong signal in the final decision. This area is also worth investigating.
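To make the padding effect concrete, here is a minimal sketch in plain Python. The token counts and the 90th-percentile cap are made-up assumptions for illustration, not values from this repository, and the helper names are hypothetical.

```python
def padding_fraction(lengths, max_len):
    """Fraction of token slots that are padding when every sequence
    is padded (or truncated) to max_len."""
    total_slots = len(lengths) * max_len
    real_tokens = sum(min(n, max_len) for n in lengths)
    return 1.0 - real_tokens / total_slots


def percentile_length(lengths, pct=95):
    """Nearest-rank percentile of the sequence lengths (pct is an integer)."""
    ordered = sorted(lengths)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


# Hypothetical token counts: many short sentences plus one 50-token outlier.
lengths = [5, 6, 4, 7, 5, 6, 8, 5, 6, 50]

# Padding everything to the longest sentence.
full = padding_fraction(lengths, max(lengths))

# Capping the length at the 90th percentile and truncating the outlier.
capped_len = percentile_length(lengths, pct=90)
capped = padding_fraction(lengths, capped_len)

print(f"pad to {max(lengths)}: {full:.1%} padding")
print(f"pad to {capped_len}: {capped:.1%} padding")
```

With these made-up numbers most of each input is padding when the outlier sets the length, so comparing the two fractions on your own data is a cheap first check. Truncating long sentences loses information, so the cap is a trade-off rather than a free fix.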

I hope this helps,
BR
Tomasz

@datistiquo
Author

I will check this.

I also think that the margin plays a huge role with contrastive loss.
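For reference, one common form of the contrastive loss is sketched below, to show where the margin enters. This is an assumption about the general technique, not necessarily the exact loss implemented in this repository; the distance and margin values are made up.

```python
def contrastive_loss(distance, is_match, margin=1.0):
    """One common per-pair contrastive loss.

    Matching pairs are pulled together (loss grows with distance).
    Non-matching pairs are pushed apart only until they are at least
    `margin` away; beyond that they contribute zero loss.
    """
    if is_match:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


# The same non-matching pair at distance 0.8 under three hypothetical margins.
for margin in (0.5, 1.0, 2.0):
    loss = contrastive_loss(0.8, is_match=False, margin=margin)
    print(f"margin={margin}: loss={loss:.2f}")
```

With a small margin the pair already counts as "far enough" and produces no gradient, while a large margin keeps pushing it apart, so the margin has to be chosen relative to the scale of the distances the network actually produces.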

Actually, did you normalize your word vectors before feeding them in? Maybe that is my issue too, since I have not normalized mine. I may try this out.
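A minimal sketch of what L2-normalizing a word vector could look like, in plain Python. The function name and the choice to leave all-zero vectors untouched (for example, padding embeddings) are assumptions for illustration.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length.

    All-zero (or near-zero) vectors are returned unchanged to avoid
    dividing by zero.
    """
    norm = math.sqrt(sum(x * x for x in vector))
    if norm < eps:
        return list(vector)
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit, math.sqrt(sum(x * x for x in unit)))
```

After normalization only the direction of each embedding matters, which keeps distances between sentences on a bounded scale and makes a fixed margin easier to reason about.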

@Frank-Sin99

Right now I use a simple MSE or a simple contrastive loss. But I feel that I need a pairwise, triplet, or even listwise loss to do better?
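For comparison, a minimal sketch of a standard triplet (hinge) loss on precomputed distances. The margin and the distances are made-up values, and this is not code from the repository.

```python
def triplet_loss(d_anchor_pos, d_anchor_neg, margin=0.5):
    """Hinge-style triplet loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, d_anchor_pos - d_anchor_neg + margin)


# Negative is already far enough away: no loss.
print(triplet_loss(d_anchor_pos=0.2, d_anchor_neg=1.0))
# Negative is closer than the positive: the ordering is wrong, so loss > margin.
print(triplet_loss(d_anchor_pos=0.9, d_anchor_neg=0.4))
```

Unlike MSE or the pairwise contrastive loss, this only constrains the relative order of a positive and a negative for the same query, which is closer to what a ranking task needs.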

Also, my evaluation metric is just precision, but a ranking metric like precision at k is more reasonable for IR, I think!
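Precision at k is simple to compute by hand. A minimal sketch follows, using the convention of dividing by k even when fewer than k results are returned; the document IDs are hypothetical.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


# Hypothetical ranking for one query, best match first.
ranked = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2"}

for k in (1, 3, 5):
    print(f"P@{k} = {precision_at_k(ranked, relevant, k):.3f}")
```

Averaging this value over all test queries gives a single number that reflects ranking quality rather than per-pair classification accuracy.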

@Frank-Sin99

Hey @tlatkowski, why does your CNN network use just the distance as its output? Have you tried feeding the distance into a sigmoid layer? Or using a sigmoid layer directly instead of the distance?
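A rough sketch of the first idea, mapping a distance to a score in (0, 1) with a sigmoid. In a real model the weight and bias would be learned; the values below are made up purely for illustration, and none of this is taken from the repository.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def score_from_distance(distance, weight=-4.0, bias=2.0):
    """Turn a non-negative distance into a similarity score in (0, 1).

    A negative weight makes the score fall as the distance grows.
    weight and bias are placeholders for learned parameters.
    """
    return sigmoid(weight * distance + bias)


for d in (0.0, 0.5, 1.0, 2.0):
    print(f"distance={d}: score={score_from_distance(d):.3f}")
```

The bias decides which distance maps to a score of 0.5, so it acts as a learned decision threshold instead of one picked by hand.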


3 participants