How does unsupervised machine learning work?

In supervised machine learning tasks, each observation in the data is assigned to one of a known set of classes. For example, here we are given a dataset in which each observation is a set of physical attributes of a piece of fruit. In a supervised task, the first column, which names the fruit, acts as the label. The algorithm then uses these existing separations in the data to develop criteria for classifying new, unlabeled observations.

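| Label | Height (cm) | Width (cm) | Color | Mass (g) | Round? |
|-------|-------------|------------|-------|----------|--------|
| Apple | 6 | 7 | Red | 330 | TRUE |
| Orange | 6 | 7 | Orange | 330 | TRUE |
| Lemon | 5 | 4 | Yellow | 150 | FALSE |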
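As a minimal sketch of how a supervised algorithm can use the labels, the Python below implements a 1-nearest-neighbour classifier over the three rows of the table: a new fruit gets the label of the most similar labeled row. The choice of nearest neighbour, the one-hot encoding of color, and the names `to_vector` and `classify` are our own illustrative assumptions, not something the text prescribes.

```python
import math

# The labeled fruit table: height (cm), width (cm), color, mass (g), round.
TRAINING_DATA = [
    # (label, height, width, color, mass, is_round)
    ("Apple", 6.0, 7.0, "Red", 330.0, True),
    ("Orange", 6.0, 7.0, "Orange", 330.0, True),
    ("Lemon", 5.0, 4.0, "Yellow", 150.0, False),
]

COLORS = ["Red", "Orange", "Yellow"]


def to_vector(height, width, color, mass, is_round):
    """Turn one observation into numbers: numeric columns as-is,
    color one-hot encoded (illustrative choice), roundness as 0/1."""
    one_hot = [1.0 if color == c else 0.0 for c in COLORS]
    return [height, width, mass, 1.0 if is_round else 0.0] + one_hot


def classify(height, width, color, mass, is_round):
    """1-nearest-neighbour: return the label of the closest training row."""
    query = to_vector(height, width, color, mass, is_round)
    best_label, best_dist = None, math.inf
    for label, *attrs in TRAINING_DATA:
        dist = math.dist(query, to_vector(*attrs))  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```

For example, `classify(5.5, 4.2, "Yellow", 160.0, False)` returns `"Lemon"`. Because the attributes are not rescaled, mass (hundreds of grams) dominates the distance; a real pipeline would usually standardize each column first.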

In contrast, in an unsupervised machine learning task there are either no labels, or that information is treated as just another attribute of the observation. In our fruit example, the object type is now just another characteristic of the observation, and is often unknown altogether:

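| Object | Height (cm) | Width (cm) | Color | Mass (g) | Round? |
|--------|-------------|------------|-------|----------|--------|
| Apple | 6 | 7 | Red | 330 | TRUE |
| Orange | 6 | 7 | Orange | 330 | TRUE |
| Lemon | 5 | 4 | Yellow | 150 | FALSE |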

An unsupervised algorithm is not told how the data is structured or separated (apart from any parameters we tune, such as the number of clusters to look for); instead, the algorithm goes looking for structure and separation in the data.

Clustering algorithms aim to group the observations in the data into categories (classes) based on some measure of how similar the observations are to each other, such as the distance between their attribute values. For example, given a basket of fruit, a clustering algorithm tries to put what it thinks are apples into one group and what it thinks are oranges into another.
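The text does not name a specific clustering algorithm; as one common example, here is a minimal k-means sketch in Python. It assumes the fruit table with the object column removed and only the numeric columns kept (color is left out for simplicity), a chosen number of clusters `k`, and a simple farthest-first choice of starting centroids. The names `OBSERVATIONS` and `kmeans` are illustrative.

```python
import math

# The fruit table with the object column dropped: the algorithm only
# sees height (cm), width (cm), mass (g) and roundness (1/0).
OBSERVATIONS = [
    [6.0, 7.0, 330.0, 1.0],  # (was: Apple)
    [6.0, 7.0, 330.0, 1.0],  # (was: Orange)
    [5.0, 4.0, 150.0, 0.0],  # (was: Lemon)
]


def kmeans(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with farthest-first initialisation.
    Returns a cluster number for each point."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == assignment:
            break  # converged: no point changed cluster
        assignment = new
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment
```

`kmeans(OBSERVATIONS, 2)` returns `[0, 0, 1]`: on these four attributes the apple and orange rows are identical, so they land in the same cluster and the lemon sits alone. A clustering algorithm can only find the separation that the attributes it is given actually contain.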

Dimension reduction techniques aim to decrease the number of columns (attributes, or dimensions) in a dataset, based on some criterion such as which variables most separate the observations. For example, given the height, width, color, mass, and roundness of each fruit, a dimension reduction algorithm might try to determine the smallest number of attributes needed to tell the fruit apart: can we tell it's an apple with just the mass and color?
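One simple form of dimension reduction, feature selection, can be sketched as a brute-force search over column subsets. The Python below assumes that "telling the fruit apart" means every row stays distinct when only the chosen columns are kept; that criterion and the function names are our own illustrative choices. Techniques such as principal component analysis (PCA) work differently, building new attributes from combinations of the old ones.

```python
from itertools import combinations

COLUMNS = ["Height", "Width", "Color", "Mass", "Round"]
# Fruit table rows without the object column (units: cm, cm, -, g, -).
ROWS = [
    (6, 7, "Red", 330, True),
    (6, 7, "Orange", 330, True),
    (5, 4, "Yellow", 150, False),
]


def distinguishes(rows, cols):
    """True if every row is still unique when we keep only the columns `cols`."""
    projected = [tuple(row[i] for i in cols) for row in rows]
    return len(set(projected)) == len(projected)


def smallest_distinguishing_subset(rows, names):
    """Brute-force search: try all 1-column subsets, then 2-column subsets, ...
    and return the first (smallest) one that keeps every row distinct."""
    n_cols = len(names)
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            if distinguishes(rows, cols):
                return [names[i] for i in cols]
    return None  # some rows are identical on every column
```

On the fruit table, `smallest_distinguishing_subset(ROWS, COLUMNS)` returns `["Color"]`, and `distinguishes(ROWS, (2, 3))` confirms that mass and color together are enough. Exhaustive search doubles in cost with every extra column, so practical methods rely on heuristics or on transformations such as PCA instead.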

Generally speaking, in an unsupervised task there are no existing labels to compare the algorithm's results against; instead, we often evaluate reliability through repeated experiments, by computing the likelihood of our data under the model, and through visualizations.
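One way to run repeated experiments is to re-run a clustering algorithm from different starting points and check whether the runs agree. The Python below is a minimal sketch under our own assumptions: it uses a plain k-means on the numeric fruit columns, tries every possible set of starting rows (practical for three rows; larger datasets would use random restarts instead), and reports the most common grouping with the fraction of runs that produced it. The function names `kmeans`, `as_partition` and `stability` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

# Numeric fruit columns: height (cm), width (cm), mass (g), round (1/0).
OBSERVATIONS = [
    [6.0, 7.0, 330.0, 1.0],
    [6.0, 7.0, 330.0, 1.0],
    [5.0, 4.0, 150.0, 0.0],
]


def kmeans(points, start, max_iter=100):
    """Plain k-means (Lloyd's algorithm) starting from the rows indexed by `start`."""
    centroids = [list(points[i]) for i in start]
    k = len(centroids)
    assignment = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid (ties go to the lower index).
        new = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new == assignment:
            break
        assignment = new
        # Move each non-empty cluster's centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def as_partition(assignment):
    """Cluster numbers are arbitrary, so describe a result as a set of groups."""
    groups = {}
    for index, cluster in enumerate(assignment):
        groups.setdefault(cluster, set()).add(index)
    return frozenset(frozenset(g) for g in groups.values())


def stability(points, k):
    """Run k-means from every choice of k starting rows; return the most common
    partition and the fraction of runs that produced it."""
    runs = [as_partition(kmeans(points, start))
            for start in combinations(range(len(points)), k)]
    partition, count = Counter(runs).most_common(1)[0]
    return partition, count / len(runs)
```

Here `stability(OBSERVATIONS, 2)` reports an agreement of 1.0: all three starting choices end in the grouping {apple, orange} versus {lemon}, even though one run numbers the clusters the other way round. That is why the runs are compared as sets of groups rather than as raw cluster numbers.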
