KMeans

This repository contains a C++ implementation of the K-Means clustering algorithm, parallelized with OpenMP. K-Means is a popular unsupervised machine learning algorithm that groups data points into a predefined number of clusters. Parallelizing the algorithm with OpenMP can significantly speed it up on multi-core processors.
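To illustrate where OpenMP typically helps, here is a minimal sketch of the K-Means assignment step, in which each point is labeled with its nearest centroid. Every point is handled independently, so the outer loop can be split across threads. This is not taken from main.cpp; the names `Point`, `squared_distance`, and `assign_clusters` are hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical representation of one observation

// Squared Euclidean distance between two points of equal dimension.
double squared_distance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step: label each point with the index of its nearest centroid.
// Each iteration writes only to labels[i], so the loop needs no synchronization.
std::vector<int> assign_clusters(const std::vector<Point>& points,
                                 const std::vector<Point>& centroids) {
    std::vector<int> labels(points.size(), 0);
    const long long n = static_cast<long long>(points.size());
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double dist = squared_distance(points[i], centroids[c]);
            if (dist < best) {
                best = dist;
                labels[i] = static_cast<int>(c);
            }
        }
    }
    return labels;
}
```

Compiled without `-fopenmp`, the pragma is ignored and the loop runs serially with the same result.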

Getting Started

Prerequisites

This implementation requires the following dependencies:

C++ compiler with OpenMP support (e.g., g++)
The OpenMP library

Compilation

Clone this repository to your local machine.
Navigate to the repository's directory.
Run `g++ main.cpp -o kmean -fopenmp -march=native`.
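If you are unsure whether your compiler actually enabled OpenMP, a small check like the one below may help. It is a sketch that is not part of this repository, and the function name `openmp_max_threads` is hypothetical. It relies on the standard `_OPENMP` macro, which a compiler defines only when OpenMP is enabled (for example with `-fopenmp`).

```cpp
#ifdef _OPENMP
#include <omp.h>
#endif

// Returns the maximum number of threads OpenMP would use for a parallel
// region, or 1 if the code was compiled without OpenMP support.
int openmp_max_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

On a multi-core machine, a return value of 1 usually suggests that the `-fopenmp` flag was omitted or that the thread count has been limited, for instance through the `OMP_NUM_THREADS` environment variable.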

Usage

Input

The program takes as input a CSV file with one observation per row. The features of each observation must be separated by commas. Some sample inputs generated with dataset_gen.py are provided.
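As an illustration of the expected format, a row such as `1.5,2.0,-3.25` describes one observation with three features. The sketch below shows one way such a row could be parsed; it is an assumption about the format, not the parser used in main.cpp, and the name `parse_row` is hypothetical.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Parse one comma-separated row, e.g. "1.5,2.0,-3.25", into a feature vector.
// std::stod throws std::invalid_argument if a cell is not a number.
std::vector<double> parse_row(const std::string& line) {
    std::vector<double> features;
    std::stringstream row(line);
    std::string cell;
    while (std::getline(row, cell, ',')) {
        features.push_back(std::stod(cell));
    }
    return features;
}
```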

Running the Program

To run the program, use `./kmean <dataset_file_name> <number_of_clusters> <thread_number> <algorithm_type>`

`<dataset_file_name>` path to the dataset, a CSV file
`<number_of_clusters>` number of clusters
`<thread_number>` number of threads to use
`<algorithm_type>` initializer to use: `rand` for random initialization or `pp` for k-means++ initialization

For example: `./kmean 100000_3_6.csv 6 16 rand`
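For readers unfamiliar with the k-means++ initializer, the sketch below shows the standard seeding idea: the first centroid is drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid already chosen. This is a generic illustration, not the code behind the `pp` option in main.cpp; `Point`, `sq_dist`, and `init_kmeans_pp` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

using Point = std::vector<double>;  // hypothetical representation of one observation

// Squared Euclidean distance between two points of equal dimension.
double sq_dist(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// k-means++ seeding. Assumes k >= 1 and that `points` contains at least
// k distinct points, so the sampling weights never sum to zero.
std::vector<Point> init_kmeans_pp(const std::vector<Point>& points,
                                  std::size_t k, std::mt19937& rng) {
    std::vector<Point> centroids;
    std::uniform_int_distribution<std::size_t> pick(0, points.size() - 1);
    centroids.push_back(points[pick(rng)]);

    // nearest[i] holds the squared distance from point i to its closest centroid.
    std::vector<double> nearest(points.size(), std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        for (std::size_t i = 0; i < points.size(); ++i) {
            nearest[i] = std::min(nearest[i], sq_dist(points[i], centroids.back()));
        }
        // Sample the next centroid with probability proportional to nearest[i].
        std::discrete_distribution<std::size_t> weighted(nearest.begin(), nearest.end());
        centroids.push_back(points[weighted(rng)]);
    }
    return centroids;
}
```

Compared with purely random initialization, this seeding tends to spread the initial centroids apart, at the cost of one extra pass over the data per centroid.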

Generate datasets

The dataset_gen.py script lets you create custom datasets and visualize the result of the K-Means algorithm. To use it, run `python3 dataset_gen.py` and follow the instructions.

Results

The result of the K-Means algorithm is shown below:

Raw Data	After kmeans
