The dataset can be obtained from https://webscope.sandbox.yahoo.com/catalog.php?datatype=r&did=49
This project demonstrates the convergence performance of the K-Means algorithm under two centroid initialisation schemes: Random Initialisation and K++ (K-Means++).
K-Means is a clustering algorithm commonly used in unsupervised learning.
It aims to partition the dataset into K clusters, each represented by a centroid.
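
As a rough illustration of the partitioning described above, here is a minimal pure-Python sketch of the usual K-Means iterations (assign each point to its nearest centroid, then move each centroid to the mean of its members). It is not this project's code: the names `kmeans` and `squared_distance`, the `max_iters` parameter, and the "assignments stopped changing" convergence test are assumptions made for the example. The returned iteration count is one simple way to compare convergence between initialisation schemes.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iters=100):
    """Run K-Means from the given starting centroids.

    Returns (centroids, labels, iterations_used). All names here are
    hypothetical and chosen only for this sketch.
    """
    centroids = [tuple(c) for c in centroids]
    labels = None
    for iteration in range(1, max_iters + 1):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            # Assignments are stable, so the centroids will not move again.
            return centroids, labels, iteration
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, labels, max_iters
```

Calling `kmeans(points, start)` with `start` produced by either initialisation scheme returns the final centroids, the cluster label of each point, and the number of iterations used.
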
- Random Initialisation: the initial cluster centroids are picked at random from the data instances.
- K++: picks initial centroids that are as far away as possible from the ones already chosen.
- This helps in picking the starting points in a smarter way than purely random selection.
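
The two initialisation schemes listed above can be sketched as follows. This is a hedged example, not the project's implementation: it assumes K++ refers to the standard k-means++ seeding, which favours far-away points by sampling each new centroid with probability proportional to its squared distance from the nearest centroid already chosen, rather than always taking the single farthest point. The function names `random_init` and `kpp_init` and the `rng` argument are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def random_init(points, k, rng):
    """Random Initialisation: k distinct data instances, chosen uniformly."""
    return rng.sample(points, k)


def kpp_init(points, k, rng):
    """k-means++ style seeding (assumed meaning of K++ in this sketch)."""
    # The first centroid is a uniformly random data instance.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # Weight of a point = squared distance to its nearest chosen centroid,
        # so far-away points are more likely to be picked next.
        weights = [
            min(squared_distance(p, c) for c in centroids) for p in points
        ]
        if sum(weights) == 0:
            # Every point coincides with a chosen centroid; fall back to uniform.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids
```

Passing a seeded generator such as `random.Random(0)` as `rng` makes the chosen centroids reproducible, which is convenient when comparing how many iterations each scheme needs to converge.
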