Deep Embedded Clustering (DEC)

Introduction

This repository is a TensorFlow implementation of "Unsupervised Deep Embedding for Clustering Analysis" (Xie et al., 2016). Deep embedded clustering is a machine learning technique that combines the strengths of deep learning and clustering to automatically group data points based on their intrinsic similarities. Here's the gist:

Deep learning extracts compact, meaningful representations (embeddings) from your data, capturing underlying patterns and relationships. Imagine it as finding a more informative way to describe each data point than its raw features.

Clustering then groups similar data points together based on these embeddings. Think of it as organizing your data points into meaningful categories based on their extracted features.

(Figure: the deep embedded clustering pipeline, with panels A, B, and C described below.)

A. The original data points can be anything. In this example, they are images of handwritten digits.

B. The embeddings are lower-dimensional representations of the data points. They are created by a deep neural network.

C. The clusters are formed by grouping together similar embeddings. In this example, the clusters correspond to the different digits.

Deep embedded clustering performs these two steps jointly: the network that produces the embeddings and the cluster assignments are optimized together, so the embedding is shaped specifically for clustering. This makes it a valuable tool for exploring unlabeled data.
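The sketch below (TensorFlow 2 / Keras) illustrates the two ingredients described above: an encoder that maps raw inputs to low-dimensional embeddings, and the soft cluster assignment with its sharpened target distribution from the DEC paper. It is a minimal illustration under assumed layer sizes and made-up function names, not the code in this repository.

    import tensorflow as tf

    def build_encoder(input_dim=784, embedding_dim=10):
        """Fully connected encoder: raw features -> low-dimensional embedding z."""
        return tf.keras.Sequential([
            tf.keras.Input(shape=(input_dim,)),
            tf.keras.layers.Dense(500, activation="relu"),
            tf.keras.layers.Dense(500, activation="relu"),
            tf.keras.layers.Dense(2000, activation="relu"),
            tf.keras.layers.Dense(embedding_dim),  # the embedding z
        ])

    def soft_assignment(z, centroids, alpha=1.0):
        """Student's t kernel: soft assignment q_ij of embedding i to centroid j."""
        # Squared Euclidean distance between every embedding and every centroid.
        dist_sq = tf.reduce_sum(tf.square(tf.expand_dims(z, 1) - centroids), axis=2)
        q = tf.pow(1.0 + dist_sq / alpha, -(alpha + 1.0) / 2.0)
        return q / tf.reduce_sum(q, axis=1, keepdims=True)

    def target_distribution(q):
        """Sharpened target p_ij that emphasizes high-confidence assignments."""
        weight = tf.square(q) / tf.reduce_sum(q, axis=0)
        return weight / tf.reduce_sum(weight, axis=1, keepdims=True)

    # Training alternates between recomputing P and minimizing KL(P || Q)
    # with respect to the encoder weights and the cluster centroids:
    #   loss = tf.reduce_sum(p * tf.math.log(p / q))

As in the paper, the centroids are initialized with k-means on the pretrained embeddings and then updated by gradient descent together with the encoder.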

Training

usage: train.py [-h] [--batch-size BATCH_SIZE] [--gpu-index GPU_INDEX]

optional arguments:
  -h, --help            show this help message and exit
  --batch-size BATCH_SIZE
                        Train Batch Size
  --gpu-index GPU_INDEX
                        GPU Index Number
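For example, to train with a batch size of 256 on the first GPU (both values are illustrative, not defaults taken from the script):

    python train.py --batch-size 256 --gpu-index 0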

Visualization

inference.py computes the latent representation ($z$) and exports it as z.tsv, along with meta.tsv (label information).

usage: inference.py [-h] [--gpu-index GPU_INDEX]

optional arguments:
  -h, --help            show this help message and exit
  --gpu-index GPU_INDEX
                        GPU Index Number

For visualization, we use t-SNE by importing z.tsv and meta.tsv into TensorBoard's Embedding Projector; our example visualizations use MNIST.
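For example (the GPU index is again an illustrative value):

    python inference.py --gpu-index 0

The exported z.tsv (one embedding per row) and meta.tsv (one label per row) can then be loaded into TensorBoard's Embedding Projector, for instance via the standalone projector at https://projector.tensorflow.org, which accepts tab-separated vector and metadata files, and projected with t-SNE.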
