Deep embedded clustering is a machine learning technique that combines the strengths of deep learning and clustering to automatically group data points based on their intrinsic similarities. Here's the gist:
Deep learning extracts meaningful, low-dimensional representations (embeddings) from your data, capturing underlying patterns and relationships. Imagine it as finding a more informative way to describe each data point than its raw features. Clustering then groups similar data points together based on these embeddings, organizing them into meaningful categories.
A. The original data points can be anything. In this example, they are images of handwritten digits.
B. The embeddings are lower-dimensional representations of the data points. They are created by a deep neural network.
C. The clusters are formed by grouping together similar embeddings. In this example, the clusters correspond to the different digits.
Deep embedded clustering is a powerful way to group data points by their learned similarities, and a valuable tool for data analysis and machine learning.
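The pipeline above can be sketched end to end. The snippet below is a minimal illustration, not this repo's implementation: a PCA projection stands in for the trained deep encoder, the data are synthetic two-blob points rather than MNIST, and it computes the Student's t soft assignment q and sharpened target distribution p used in the DEC formulation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for raw data: two Gaussian blobs in 10-D.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 10)), rng.normal(5, 1, (50, 10))])

# Stand-in "encoder": PCA to 2-D. A real DEC uses a trained autoencoder.
Z = PCA(n_components=2).fit_transform(X)

# Cluster centers initialized with k-means on the embeddings.
centers = KMeans(n_clusters=2, n_init=10, random_state=0).fit(Z).cluster_centers_

# DEC's soft assignment q_ij: Student's t kernel between embedding and center.
d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
q = 1.0 / (1.0 + d2)
q /= q.sum(axis=1, keepdims=True)

# Target distribution p_ij sharpens q; DEC refines the encoder by minimizing KL(p || q).
p = q ** 2 / q.sum(axis=0)
p /= p.sum(axis=1, keepdims=True)

labels = q.argmax(axis=1)  # hard cluster assignment per point
```

In the full algorithm, the KL(p || q) loss is backpropagated through the encoder so that embeddings and cluster centers improve jointly; the sketch stops at a single assignment step.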
usage: train.py [-h] [--batch-size BATCH_SIZE] [--gpu-index GPU_INDEX]
optional arguments:
-h, --help show this help message and exit
--batch-size BATCH_SIZE
Train Batch Size
--gpu-index GPU_INDEX
GPU Index Number
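A sample training invocation (the flag values here are illustrative, not recommended defaults):

```shell
python train.py --batch-size 256 --gpu-index 0
```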
The inference.py script returns the latent representations (z.tsv) and the label information (meta.tsv).
usage: inference.py [-h] [--gpu-index GPU_INDEX]
optional arguments:
-h, --help show this help message and exit
--gpu-index GPU_INDEX
GPU Index Number
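Inference can be run the same way (the GPU index is again just an example):

```shell
python inference.py --gpu-index 0
```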
For visualization, we use t-SNE by importing z.tsv and meta.tsv into TensorBoard.
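If you want a quick standalone t-SNE plot without TensorBoard, the exported embeddings can also be processed with scikit-learn directly. This sketch assumes z.tsv is a tab-separated matrix of embeddings; synthetic data stands in for the file here.

```python
import numpy as np
from sklearn.manifold import TSNE

# In practice, load the file written by inference.py, e.g.:
#   Z = np.loadtxt("z.tsv", delimiter="\t")
# Synthetic stand-in embeddings (three clusters, 10-D) for illustration:
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(4 * i, 1, (30, 10)) for i in range(3)])

# Project to 2-D with t-SNE, as TensorBoard's projector would.
Z2 = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(Z)
```

The 2-D coordinates in Z2 can then be scatter-plotted, colored by the labels from meta.tsv.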
The visualization on MNIST is shown below.