Single Class Multiple Clusters #35

Open
TilakD opened this issue Feb 18, 2019 · 4 comments

Comments

@TilakD

TilakD commented Feb 18, 2019

Hi @omoindrot, I am using your code as the foundation for a custom dataset, and when I visualize the embeddings with t-SNE I get multiple clusters for the same class. My embeddings have 128 dimensions. Am I doing something wrong, or could each class form a single cluster in 128 dimensions and only split into 3 different clusters when the dimensionality is reduced?

image

@omoindrot
Owner

Hi @TilakD

It's pretty weird indeed. Maybe this is because of your data?
For instance, maybe your data comes from three different sources (e.g. grayscale images, RGB images and another type), so the embeddings are naturally clustered by type before class.

It's also possible that t-SNE is not clustering the data perfectly. See this article for more on t-SNE: https://distill.pub/2016/misread-tsne/

I would plot the different images in each cluster for a single class to understand what differentiates them.
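
One way to check whether the split is an artifact of the projection is to run t-SNE at several perplexity values and see whether the sub-clusters persist. The following is a minimal sketch, assuming scikit-learn is available and that the embeddings are a NumPy array of shape `(n_samples, 128)`. The function name `tsne_at_perplexities` and the perplexity values are illustrative choices, not part of the repository.

```python
import numpy as np
from sklearn.manifold import TSNE


def tsne_at_perplexities(embeddings, perplexities=(5, 30, 50), seed=0):
    """Project embeddings to 2D once per perplexity value.

    Returns a dict mapping each perplexity to an (n_samples, 2) array.
    The helper name and default values are assumptions for illustration.
    """
    embeddings = np.asarray(embeddings, dtype=np.float32)
    projections = {}
    for perplexity in perplexities:
        # scikit-learn requires perplexity to be smaller than n_samples,
        # so values that are too large for the dataset are skipped.
        if perplexity >= len(embeddings):
            continue
        tsne = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="pca",
            random_state=seed,
        )
        projections[perplexity] = tsne.fit_transform(embeddings)
    return projections
```

If the same sub-clusters show up across a range of perplexities, they more likely reflect structure in the embeddings than a quirk of one t-SNE run. Scatter-plotting each projection for a single class, colored by any available metadata, can then show what separates the groups.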

@TilakD
Author

TilakD commented Mar 4, 2019

Hi @omoindrot Thanks for the reply.

All the data comes from the same source (RGB images). Each of the 7 classes contains images of 3 different subjects, so the 3 clusters for each class correspond to the 3 subjects.

When I check the intra-cluster distance in 128 dimensions, I get a very low value for each class. When I do the same in 2D/3D after t-SNE, the intra-cluster distance is huge. I am confused as to why t-SNE takes the features of the subjects into account along with the features of the classes.

Please let me know your thoughts.
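
Raw distances in the 128-dimensional space and in a t-SNE projection are not on the same scale, and t-SNE does not aim to preserve distances, so comparing absolute intra-cluster values across the two spaces can mislead. A scale-free alternative is the ratio of mean same-class distance to mean different-class distance, computed separately in each space. This is a minimal NumPy sketch; `pairwise_distances` and `intra_inter_ratio` are hypothetical helper names, not functions from the repository.

```python
import numpy as np


def pairwise_distances(x):
    """Euclidean distance matrix of shape (n, n)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    # Clip tiny negative values caused by floating-point error.
    return np.sqrt(np.maximum(d2, 0.0))


def intra_inter_ratio(points, labels):
    """Mean same-class distance divided by mean different-class distance."""
    points = np.asarray(points, dtype=np.float64)
    labels = np.asarray(labels)
    dist = pairwise_distances(points)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    intra = dist[same & off_diag].mean()  # exclude each point's distance to itself
    inter = dist[~same].mean()
    return intra / inter
```

Calling `intra_inter_ratio` once on the 128-dimensional embeddings and once on the projected points gives two numbers that do not depend on the overall scale of either space. Even so, the value for the projection should be read with caution for the reason given above.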

@omoindrot
Owner

I'm not sure what your exact data is, but consider this (related?) example: you have 3 people, and you ask them to take 7 different poses (standing up, sitting...).

Now you train embeddings with triplet loss according to the 7 poses.

Of course the embeddings will also reflect the 3 different people you use, because by default their embeddings will be different. So even if you train perfectly with triplet loss, each cluster will likely contain 3 different sub-clusters.

Even in face recognition, the cluster of a person can contain clusters (one where the person wears glasses, one where the person is older...).
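
The pose example can be imitated with synthetic data to see how sub-clusters arise inside an otherwise tight class. The sketch below assumes each embedding is a pose center plus a smaller per-person offset plus noise. All scales, sizes, and the names `make_embeddings` and `nearest_neighbor_agreement` are made up for illustration and say nothing about the real dataset.

```python
import numpy as np


def make_embeddings(n_poses=7, n_people=3, per_group=20, dim=128, seed=0):
    """Synthetic embeddings: pose center + small person offset + noise."""
    rng = np.random.default_rng(seed)
    pose_centers = rng.normal(scale=10.0, size=(n_poses, dim))    # classes far apart
    person_offsets = rng.normal(scale=1.0, size=(n_people, dim))  # smaller shift per person
    points, poses, people = [], [], []
    for pose in range(n_poses):
        for person in range(n_people):
            noise = rng.normal(scale=0.1, size=(per_group, dim))
            points.append(pose_centers[pose] + person_offsets[person] + noise)
            poses += [pose] * per_group
            people += [person] * per_group
    return np.vstack(points), np.array(poses), np.array(people)


def nearest_neighbor_agreement(points, labels):
    """Fraction of points whose nearest other point shares their label."""
    sq = np.sum(points * points, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbor
    nearest = np.argmin(d2, axis=1)
    return float(np.mean(labels[nearest] == labels))


points, poses, people = make_embeddings()
pose_agreement = nearest_neighbor_agreement(points, poses)
person_agreement = nearest_neighbor_agreement(points, people)
```

With these assumed scales, both agreement values should come out close to 1: the nearest neighbor of a point shares its pose (the classes are well separated) and also its person (each class contains per-person sub-clusters). A neighborhood-based method such as t-SNE would be expected to pick up that second level of structure.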

@TilakD
Author

TilakD commented Mar 5, 2019

Thanks a lot @omoindrot. Got my doubts clarified!
