DragosTana/kmeans

KMeans

This repository contains a C++ implementation of the K-Means clustering algorithm, parallelized with OpenMP. K-Means is a popular unsupervised machine learning algorithm that partitions data points into a predefined number of clusters. Parallelizing it with OpenMP splits the work across threads, which can give a substantial speedup on multi-core processors.
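To illustrate why K-Means parallelizes well, here is a minimal sketch of the assignment step (labeling each point with its nearest centroid) using an OpenMP parallel loop. It is not taken from this repository's main.cpp; the names Point, squared_distance, assign_clusters, and thread_count are hypothetical and chosen only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for this sketch: one observation = one vector of features.
using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step: label each point with the index of its nearest centroid.
// Each iteration writes only labels[i], so iterations are independent and
// can be distributed across threads without synchronization.
std::vector<int> assign_clusters(const std::vector<Point>& points,
                                 const std::vector<Point>& centroids,
                                 int thread_count) {
    std::vector<int> labels(points.size(), -1);
    const long long n = static_cast<long long>(points.size());
    // If the code is compiled without -fopenmp, the pragma is ignored and
    // the loop simply runs sequentially.
    #pragma omp parallel for num_threads(thread_count) schedule(static)
    for (long long i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double dist = squared_distance(points[i], centroids[c]);
            if (dist < best) {
                best = dist;
                labels[i] = static_cast<int>(c);
            }
        }
    }
    return labels;
}
```

The centroid update step (averaging the points of each cluster) can be parallelized too, but it needs a reduction or per-thread partial sums because several threads contribute to the same centroid.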

Getting Started

Prerequisites

This implementation requires the following dependencies:

  • C++ compiler with OpenMP support (e.g., g++)
  • The OpenMP library

Compilation

  1. Clone this repository to your local machine.
  2. Navigate to the repository's directory.
  3. Run g++ main.cpp -o kmean -fopenmp -march=native

Usage

Input

The algorithm takes as input a CSV file containing one observation per row. Features must be separated by commas. Some inputs generated with dataset_gen.py are provided.
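As an illustration of the input format described above, the following sketch parses comma-separated numeric rows into a vector of observations. It is an assumption about how such a file could be read, not the repository's actual loader: read_csv is a hypothetical name, and the sketch assumes no header row and no quoted fields.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse comma-separated numeric rows from a stream, one observation per line.
// Assumptions of this sketch: no header row, no quoted fields.
// Blank lines are skipped.
std::vector<std::vector<double>> read_csv(std::istream& in) {
    std::vector<std::vector<double>> rows;
    std::string line;
    while (std::getline(in, line)) {
        // Tolerate Windows line endings.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.empty()) continue;

        std::vector<double> row;
        std::stringstream fields(line);
        std::string cell;
        while (std::getline(fields, cell, ',')) {
            // std::stod throws std::invalid_argument on non-numeric text.
            row.push_back(std::stod(cell));
        }
        // Every observation must have the same number of features.
        assert(rows.empty() || row.size() == rows.front().size());
        rows.push_back(row);
    }
    return rows;
}
```

To read from a file, pass a std::ifstream opened on the dataset path (this requires including <fstream>).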

Running the Program

To run the program, use ./kmean <dataset_file_name> <number_of_clusters> <thread_number> <algorithm_type>

  • <dataset_file_name>: path to the dataset, as a CSV file
  • <number_of_clusters>: number of clusters
  • <thread_number>: number of threads to use
  • <algorithm_type>: either rand or pp, for the random or the k-means++ initializer respectively

For example: ./kmean 100000_3_6.csv 6 16 rand
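The two initializer options differ in how the starting centroids are chosen. The sketch below shows the standard textbook versions of both: uniform random selection of k observations, and k-means++ seeding, where each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen. The function names init_random and init_kmeanspp are hypothetical, and the repository's own code may differ in detail. The k-means++ sketch assumes the dataset contains at least k distinct points.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    }
    return sum;
}

// Random initializer: k distinct observations chosen uniformly at random.
std::vector<Point> init_random(const std::vector<Point>& points,
                               std::size_t k, std::mt19937& rng) {
    assert(k <= points.size());
    std::vector<std::size_t> idx(points.size());
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), rng);
    std::vector<Point> centroids;
    for (std::size_t i = 0; i < k; ++i) centroids.push_back(points[idx[i]]);
    return centroids;
}

// k-means++ initializer: the first centroid is uniform at random; each later
// one is sampled with probability proportional to its squared distance from
// the nearest centroid chosen so far. Assumes at least k distinct points.
std::vector<Point> init_kmeanspp(const std::vector<Point>& points,
                                 std::size_t k, std::mt19937& rng) {
    assert(k >= 1 && k <= points.size());
    std::uniform_int_distribution<std::size_t> first(0, points.size() - 1);
    std::vector<Point> centroids;
    centroids.push_back(points[first(rng)]);

    // nearest[i] = squared distance from points[i] to its closest centroid.
    std::vector<double> nearest(points.size(),
                                std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        // Only the most recently added centroid can shrink a distance.
        for (std::size_t i = 0; i < points.size(); ++i) {
            nearest[i] = std::min(
                nearest[i], squared_distance(points[i], centroids.back()));
        }
        // Points already chosen have weight 0 and cannot be picked again.
        std::discrete_distribution<std::size_t> pick(nearest.begin(),
                                                     nearest.end());
        centroids.push_back(points[pick(rng)]);
    }
    return centroids;
}
```

k-means++ costs an extra pass over the data per centroid, but it tends to spread the starting centroids apart, which usually reduces the number of iterations needed to converge compared with purely random seeding.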

Generate datasets

The dataset_gen.py module lets the user create custom datasets and visualize the result of the k-means algorithm. To use it, run python3 dataset_gen.py and follow the instructions.

Results

The result of the k-means algorithm is shown below.

[Figures: Raw Data; After kmeans]
