
anmol-anand/k-means


K-means

Overview

This project implements Lloyd's k-means algorithm, which partitions data into K clusters.
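As a rough illustration of Lloyd's algorithm, here is a minimal NumPy sketch. It is not taken from this repository: the function name `lloyd`, the `max_iters` parameter, and the stopping rule (stop when assignments no longer change) are assumptions made for the example.

```python
import numpy as np

def lloyd(points, centroids, max_iters=100):
    """Alternate assignment and update steps until assignments stop changing.

    Hypothetical helper for illustration; not the repository's own code.
    """
    centroids = np.asarray(centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iters):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    return centroids, labels
```

The result depends on the starting centroids, which is why the choice of initialization method matters.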

Link to Project Report

Experiment Details

Lloyd's algorithm is run with two methods for initializing the cluster centroids: D² sampling (green line plots) and Metropolis-Hastings sampling (red line plots). The experiment is repeated for three values of K: 10, 100, and 500. For Metropolis-Hastings initialization, the length of the Markov chain is varied; the chain length is shown on the x-axis of the plots.
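The two initialization methods can be sketched as follows. This is an illustration under assumptions, not the repository's code: the function names are hypothetical, and the Metropolis-Hastings variant assumes a scheme with a uniform proposal that accepts a candidate with probability min(1, d²(candidate) / d²(current)), so that a longer chain approximates D² sampling more closely. The actual implementation may differ.

```python
import numpy as np

def _min_sq_dist(x, centers):
    # Squared distance from point x to its nearest already-chosen center.
    return ((np.asarray(centers) - x) ** 2).sum(axis=1).min()

def d2_sampling_init(points, k, rng):
    """Exact D^2 sampling: draw each new centroid with probability
    proportional to its squared distance to the nearest chosen centroid.
    Assumes the data has at least k distinct points."""
    n = len(points)
    centers = [points[rng.integers(n)]]  # first centroid: uniform draw
    d2 = ((points - centers[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(points[idx])
        # Keep, for every point, the distance to its nearest centroid so far.
        d2 = np.minimum(d2, ((points - points[idx]) ** 2).sum(axis=1))
    return np.array(centers)

def metropolis_hastings_init(points, k, chain_length, rng):
    """Approximate D^2 sampling with a Markov chain per centroid.
    (Assumed scheme: uniform proposals, acceptance ratio of squared distances.)"""
    n = len(points)
    centers = [points[rng.integers(n)]]
    for _ in range(k - 1):
        x = points[rng.integers(n)]  # chain start: uniform draw
        dx = _min_sq_dist(x, centers)
        for _ in range(chain_length - 1):
            y = points[rng.integers(n)]  # uniform proposal
            dy = _min_sq_dist(y, centers)
            # Accept with probability min(1, dy / dx).
            if dx == 0 or dy / dx > rng.random():
                x, dx = y, dy
        centers.append(x)  # final chain state becomes the next centroid
    return np.array(centers)
```

Under this assumed scheme, each Metropolis-Hastings step evaluates only one candidate point, whereas exact D² sampling computes distances for the whole dataset for every new centroid.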

Results

In these experiments, D² sampling consistently achieves a lower clustering cost than Metropolis-Hastings initialization. The y-axis shows the cost, measured as the sum of squared distances from each sample to its assigned cluster centroid; smaller values indicate better clustering.
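For reference, the cost on the y-axis can be computed along these lines. The function name `kmeans_cost` is hypothetical, and each sample is assumed to belong to its nearest centroid.

```python
import numpy as np

def kmeans_cost(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    # Pairwise squared distances, shape (num_points, num_centroids).
    d2 = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0]])
centroids = np.array([[0.0, 1.0], [10.0, 0.0]])
print(kmeans_cost(points, centroids))  # 1 + 1 + 0 = 2.0
```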

Figure 1 (Experiment 1): K=10

Figure 2 (Experiment 2): K=100

Figure 3 (Experiment 3): K=500

Running the Code

To run the code, follow these steps:

  1. Clone this repository locally
  2. Change directory to the cloned repository
  3. Set the desired value of K by modifying the NUM_CLUSTERS variable in utils.py
  4. Run the Python script k_means.py

Once the script completes, a plot like the ones above will be generated.

Note: You may need to install dependencies such as NumPy and Matplotlib first.
