BigData-KMeans-Explorations 📊

Project Overview 🌟

This repository contains a project completed during my M2 MLDS/AMSD master's program in Machine Learning for Data Science. It takes a close look at the k-means clustering algorithm under three processing paradigms: sequential, streaming, and distributed. The implementations reflect the challenges of large-scale data analysis and the solutions I worked through during my studies.

Implementation Highlights

Sequential k-means in Python: Generates a synthetic dataset and implements k-means to cluster the data points with a minimal memory footprint.
Streaming k-means in Python: Adapts k-means for data streams, enabling dynamic cluster updates without the need to reprocess the entire dataset when new data arrives.
Distributed k-means with Apache Beam: Scales k-means to work with massive datasets that exceed single-machine memory capabilities, using Apache Beam for efficient parallel processing.
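
To make the first two variants concrete, here is a minimal pure-Python sketch. It is not the notebook's code: the names `nearest`, `sequential_kmeans`, and `StreamingKMeans` are hypothetical, and the sketch assumes squared Euclidean distance and a running-mean update for the streaming case. The batch version keeps only per-cluster sums and counts between passes, and the streaming version updates one centroid per incoming point without revisiting earlier data.

```python
import random


def nearest(point, centroids):
    """Return the index of the centroid closest to point (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def sequential_kmeans(points, k, iters=20, seed=0):
    """Batch k-means: alternate an assignment pass and a mean update.

    Only k running sums and k counts are stored per pass, so the extra
    memory is O(k * d) regardless of the number of points.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed init: random points
    dim = len(points[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p in points:
            j = nearest(p, centroids)
            counts[j] += 1
            for d, x in enumerate(p):
                sums[j][d] += x
        for j in range(k):
            if counts[j]:  # an empty cluster keeps its previous centroid
                centroids[j] = [s / counts[j] for s in sums[j]]
    return centroids


class StreamingKMeans:
    """Online k-means: each new point moves its nearest centroid toward it."""

    def __init__(self, initial_centroids):
        self.centroids = [list(c) for c in initial_centroids]
        self.counts = [0] * len(self.centroids)

    def update(self, point):
        """Assign point to its nearest centroid and update that centroid only."""
        j = nearest(point, self.centroids)
        self.counts[j] += 1
        eta = 1.0 / self.counts[j]  # step size that yields a running mean
        self.centroids[j] = [
            c + eta * (x - c) for c, x in zip(self.centroids[j], point)
        ]
        return j
```

A typical use would be to call `sequential_kmeans(data, k)` once on an initial batch, pass the result to `StreamingKMeans`, and then call `update(point)` for every new arrival. The per-cluster sums and counts computed in the batch pass are also the quantities a distributed version would need to combine across workers; that part is not sketched here.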

Table of Contents

Installation
Usage
Contributing
License

Installation

To set up the project locally, clone the repository and install the dependencies:

git clone https://github.com/yourusername/BigData-KMeans-Explorations.git
cd BigData-KMeans-Explorations
pip install -r requirements.txt


AbirOumghar/BigData-KMeans-Explorations

Folders and files

Latest commit

History

Repository files navigation

BigData-KMeans-Explorations 📊

Installation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages