KMeans

This repository contains a C++ implementation of the K-Means clustering algorithm parallelized with OpenMP. K-Means is a popular unsupervised machine learning algorithm that groups data points into a predefined number of clusters. Parallelizing the algorithm with OpenMP can give a significant speedup on multi-core processors.
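As a rough illustration of where the parallelism comes from, the sketch below shows the assignment step of K-Means with an OpenMP parallel loop. It is not taken from kmeans.cpp: the names Observation, squared_distance and assign_clusters are hypothetical, and the actual implementation may organize the work differently.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical type for this sketch: one observation as a list of features.
using Observation = std::vector<double>;

// Squared Euclidean distance between two observations of equal dimension.
double squared_distance(const Observation& a, const Observation& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Assignment step: label each point with the index of its nearest centroid.
// Each iteration writes only labels[i], so iterations are independent and
// the loop can be split across threads without synchronization.
std::vector<int> assign_clusters(const std::vector<Observation>& points,
                                 const std::vector<Observation>& centroids,
                                 int threads) {
    std::vector<int> labels(points.size(), -1);
    const long n = static_cast<long>(points.size());
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (long i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            const double dist = squared_distance(points[i], centroids[c]);
            if (dist < best) {
                best = dist;
                labels[i] = static_cast<int>(c);
            }
        }
    }
    return labels;
}
```

When compiled without -fopenmp the pragma is ignored and the loop runs serially with the same result.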

Getting Started

Prerequisites

This implementation requires the following dependencies:

C++ compiler with OpenMP support (e.g., g++)
The OpenMP library

Compilation

Clone this repository to your local machine.
Navigate to the repository's directory.
Run g++ main.cpp -o kmean -fopenmp -march=native

Usage

Input

The program takes as input a CSV file with one observation per row. Features must be separated by commas. Some sample inputs generated with dataset_gen.py are provided.
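The input format described above can be read with a few lines of standard C++. The following is a minimal sketch, assuming every cell is numeric and there is no header row. The functions parse_row and read_observations are hypothetical names and are not the repository's own loader.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parse one CSV row such as "1.5,2.0,-3.25" into its feature values.
// std::stod throws std::invalid_argument if a cell is not a number.
std::vector<double> parse_row(const std::string& line) {
    std::vector<double> features;
    std::stringstream row(line);
    std::string cell;
    while (std::getline(row, cell, ',')) {
        features.push_back(std::stod(cell));
    }
    return features;
}

// Read every non-empty line of a stream as one observation.
std::vector<std::vector<double>> read_observations(std::istream& in) {
    std::vector<std::vector<double>> data;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            data.push_back(parse_row(line));
        }
    }
    return data;
}
```

To read from a file, pass a std::ifstream opened on the dataset path to read_observations.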

Running the Program

To run the program, use ./kmean <dataset_file_name> <number_of_clusters> <thread_number> <algorithm_type>

<dataset_file_name> path to the dataset as a CSV file
<number_of_clusters> number of clusters
<thread_number> number of threads to use
<algorithm_type> initializer to use: rand for random initialization or pp for k-means++ initialization

For example: ./kmean 100000_3_6.csv 6 16 rand
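For readers unfamiliar with the pp option, the sketch below shows the standard k-means++ seeding idea: the first centroid is drawn uniformly at random, and each later centroid is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. This is a generic illustration, not the code in initializer.cpp. The names Observation, sq_dist and kmeans_pp_init are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical type for this sketch: one observation as a list of features.
using Observation = std::vector<double>;

// Squared Euclidean distance between two observations of equal dimension.
double sq_dist(const Observation& a, const Observation& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    }
    return sum;
}

// k-means++ seeding: returns k initial centroids copied from the input points.
std::vector<Observation> kmeans_pp_init(const std::vector<Observation>& points,
                                        std::size_t k, std::mt19937& rng) {
    std::vector<Observation> centroids;
    if (points.empty() || k == 0) return centroids;

    // First centroid: uniform random choice among all points.
    std::uniform_int_distribution<std::size_t> uniform(0, points.size() - 1);
    centroids.push_back(points[uniform(rng)]);

    // nearest[i] holds the squared distance from point i to its closest centroid.
    std::vector<double> nearest(points.size(),
                                std::numeric_limits<double>::max());
    while (centroids.size() < k) {
        // Only the most recently added centroid can shrink a distance.
        for (std::size_t i = 0; i < points.size(); ++i) {
            nearest[i] = std::min(nearest[i], sq_dist(points[i], centroids.back()));
        }
        const double total = std::accumulate(nearest.begin(), nearest.end(), 0.0);
        if (total <= 0.0) {
            // Every point coincides with a centroid: fall back to a uniform pick.
            centroids.push_back(points[uniform(rng)]);
            continue;
        }
        // Draw the next centroid with probability proportional to nearest[i].
        std::discrete_distribution<std::size_t> weighted(nearest.begin(),
                                                         nearest.end());
        centroids.push_back(points[weighted(rng)]);
    }
    return centroids;
}
```

Compared with purely random initialization, this seeding tends to spread the initial centroids apart, which usually reduces the number of iterations needed to converge.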

Generate datasets

The dataset_gen.py module lets you create custom datasets and visualize the result of the K-Means algorithm. To use it, run python3 dataset_gen.py and follow the instructions.

Results

The result of the K-Means algorithm is shown below.

[Figures: Raw Data; After K-Means]
