<<< Previous | Next >>>

How does unsupervised machine learning work?

In supervised machine learning tasks, each observation in the data is already assigned to one of a set of classes. For example, here we are given a dataset in which each observation is a set of physical attributes of an object. In a supervised task, the label column holds the class of each observation. The algorithm then uses these existing separations in the data to develop criteria for classifying new, unlabeled observations.

| label | Height | Width | Color | Mass | Round? |
| --- | --- | --- | --- | --- | --- |
| Apple | 6cm | 7cm | Red | 330g | TRUE |
| Orange | 6cm | 7cm | Orange | 330g | TRUE |
| Lemon | 5cm | 4cm | Yellow | 150g | FALSE |
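
To make the supervised setup concrete, here is a minimal sketch of a 1-nearest-neighbor classifier built on the three labeled rows in the table above. The mixed distance function, the `classify` helper, and the `unknown` observation are assumptions made for illustration; they are one simple choice among many, not a method this tutorial prescribes.

```python
# Labeled observations from the table: (label, height_cm, width_cm, color, mass_g, is_round).
LABELED = [
    ("Apple", 6.0, 7.0, "Red", 330.0, True),
    ("Orange", 6.0, 7.0, "Orange", 330.0, True),
    ("Lemon", 5.0, 4.0, "Yellow", 150.0, False),
]


def distance(a, b):
    """Mixed distance: relative difference for numbers, 0/1 mismatch for categories."""
    total = 0.0
    for x, y in zip(a, b):
        if isinstance(x, (bool, str)):
            # Categorical attribute: either it matches or it does not.
            total += 0.0 if x == y else 1.0
        else:
            # Numeric attribute: scale by the larger value so mass does not dominate.
            scale = max(abs(x), abs(y)) or 1.0
            total += abs(x - y) / scale
    return total


def classify(observation):
    """Return the label of the closest labeled observation (1-nearest neighbor)."""
    best = min(LABELED, key=lambda row: distance(row[1:], observation))
    return best[0]


# A hypothetical unlabeled fruit (invented for illustration).
unknown = (6.5, 7.0, "Red", 310.0, True)
prediction = classify(unknown)
print(prediction)
```

The labels do all the work here: without the first column there would be nothing for `classify` to return.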

In contrast, in an unsupervised machine learning task either there are no labels, or the label information is treated as just another attribute of the observation. In our fruit example, the object type is now just another characteristic of the observation, and often it is unknown altogether:

| object | Height | Width | Color | Mass | Round? |
| --- | --- | --- | --- | --- | --- |
| Apple | 6cm | 7cm | Red | 330g | TRUE |
| Orange | 6cm | 7cm | Orange | 330g | TRUE |
| Lemon | 5cm | 4cm | Yellow | 150g | FALSE |

An unsupervised algorithm is not told how the data is structured or separated (apart from any parameters we tune, such as the number of groups to look for); instead, the algorithm goes looking for structure and separation in the data.

Clustering algorithms aim to group the observations in the data into categories (classes) based on some notion of how similar the observations are to each other. For example, given a basket of fruit, a clustering algorithm tries to group what it thinks are apples together into one class, and what it thinks are oranges into another.
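
As a rough illustration of the basket example, the sketch below runs a plain k-means clustering on unlabeled fruit measurements. K-means is only one of many clustering algorithms and is used here as an assumption; the `basket` measurements are invented, and the evenly spaced starting centroids are a simplification chosen to keep the run reproducible (real implementations usually start from random points).

```python
import math

# Hypothetical basket: (height_cm, width_cm) per fruit, with no labels attached.
# These measurements are invented for illustration.
basket = [
    (6.0, 7.0), (6.2, 7.1), (5.9, 6.8), (6.1, 7.3),  # roughly apple-sized
    (5.0, 4.0), (5.2, 4.1), (4.8, 3.9), (5.1, 4.3),  # roughly lemon-sized
]


def kmeans(points, k, iterations=20):
    """Plain k-means: alternate between assigning points and moving centroids."""
    # Deterministic start: take k evenly spaced points as the initial centroids.
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignments, centroids


assignments, centroids = kmeans(basket, k=2)
print(assignments)
print(centroids)
```

The algorithm returns only cluster numbers. Deciding that one cluster is "apples" and the other "lemons" is an interpretation we add afterwards.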

Dimension reduction techniques aim to decrease the number of columns (variables) in a dataset based on some criterion, such as which variables best separate the observations. For example, given the height, width, color, mass, and roundness of the fruit, one dimension reduction approach tries to determine the minimum number of attributes needed to tell the fruit apart: can we tell it's an apple with just the mass and color?
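
The question of how few attributes are needed can be sketched as a brute-force search over attribute subsets. This is a toy form of feature selection, assumed here for illustration; it is not a standard dimension reduction algorithm such as PCA, and the helper name `smallest_separating_subsets` is hypothetical.

```python
from itertools import combinations

ATTRIBUTES = ["Height", "Width", "Color", "Mass", "Round"]

# Rows from the fruit table, without the object column.
fruit = [
    {"Height": "6cm", "Width": "7cm", "Color": "Red", "Mass": "330g", "Round": True},
    {"Height": "6cm", "Width": "7cm", "Color": "Orange", "Mass": "330g", "Round": True},
    {"Height": "5cm", "Width": "4cm", "Color": "Yellow", "Mass": "150g", "Round": False},
]


def smallest_separating_subsets(rows, attributes):
    """Return every smallest attribute subset under which all rows stay distinct."""
    for size in range(1, len(attributes) + 1):
        found = [
            subset
            for subset in combinations(attributes, size)
            # A subset separates the rows if no two rows share the same values on it.
            if len({tuple(row[a] for a in subset) for row in rows}) == len(rows)
        ]
        if found:
            return found
    return []


result = smallest_separating_subsets(fruit, ATTRIBUTES)
print(result)
```

For these three rows the apple and the orange differ only in color, so mass alone cannot separate them, while color alone can. On a larger, noisier dataset exact matching like this would be too strict, which is where statistical techniques come in.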

Generally speaking, in an unsupervised task there is no existing labeling to compare the results of the algorithm against. Instead, we often evaluate reliability by repeating the experiment and checking that the results are stable, by computing the likelihood of our data under the fitted model, and by inspecting visualizations.
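
One way to make "repeated experiments" concrete is to rerun a clustering from several random starts and measure how much the runs agree. The sketch below does this with a small 1-D k-means and the Rand index (the fraction of point pairs that two runs treat the same way). The `masses` values are invented, and both the choice of k-means and the choice of the Rand index are assumptions for illustration.

```python
import random
from itertools import combinations

# Hypothetical fruit masses in grams (invented for illustration), unlabeled.
masses = [330, 325, 340, 318, 150, 155, 148, 160]


def kmeans_1d(values, k, seed, iterations=20):
    """1-D k-means with a random start controlled by `seed`."""
    rng = random.Random(seed)
    centroids = rng.sample(values, k)  # k distinct starting values
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Repeat the experiment with different random starts and compare every pair of runs.
runs = [kmeans_1d(masses, k=2, seed=s) for s in range(5)]
scores = [rand_index(x, y) for x, y in combinations(runs, 2)]
stability = sum(scores) / len(scores)
print(stability)
```

The Rand index ignores which cluster is called 0 and which is called 1, so two runs that find the same grouping with swapped names still count as full agreement. A stability well below 1 would suggest the grouping depends on the starting point rather than on structure in the data.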

[Figure: algorithms cheatsheet]
