Implementation of the CCF Algorithm to find connected components in a graph, 1st year project of the X-HEC DS Master

CharlesDc9/Graph-Components-Pyspark-Scala

Graph Connected Components Finder

Overview

This repository contains implementations of the Connected Components Finding (CCF) algorithm in both PySpark and Scala. Both implementations identify the connected components of large-scale graphs using Apache Spark's RDD (Resilient Distributed Dataset) API. The repository also includes our report on the results of the implementation, along with a link to the original research paper on which the project is based.

Implementations

  • PySpark implementation (through Databricks)
  • Scala implementation (through Databricks)

Algorithm

The Connected Components Finding (CCF) algorithm is used to identify groups of vertices that are connected to each other in an undirected graph. Two vertices are in the same connected component if there exists a path between them.
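To make the idea concrete, here is a minimal plain-Python sketch of an iterate-then-deduplicate scheme in the spirit of CCF. It is not the code from this repository and it does not use Spark: the function names `ccf_iterate` and `connected_components` are hypothetical, the "map" and "reduce" phases are simulated with a dictionary, and vertices are assumed to be comparable values such as integers. Each component ends up labelled with its smallest vertex.

```python
from collections import defaultdict


def ccf_iterate(pairs):
    """Run one CCF-style round on a set of (vertex, vertex) pairs.

    Returns the deduplicated output pairs and the number of newly
    created (vertex, smaller_vertex) links found in this round.
    """
    # "Map" phase: emit every pair in both directions, grouped by key.
    neighbours = defaultdict(set)
    for a, b in pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # "Reduce" phase: attach each key and its neighbours to the smallest
    # neighbour, but only when that neighbour is smaller than the key.
    out = set()  # using a set deduplicates the emitted pairs
    new_pairs = 0
    for key, values in neighbours.items():
        smallest = min(values)
        if smallest < key:
            out.add((key, smallest))
            for v in values:
                if v != smallest:
                    out.add((v, smallest))
                    new_pairs += 1
    return out, new_pairs


def connected_components(edges):
    """Map every vertex to the smallest vertex of its component."""
    pairs = set(edges)
    while True:
        pairs, new_pairs = ccf_iterate(pairs)
        if new_pairs == 0:  # no new links: the labelling is stable
            break
    labels = {vertex: component for vertex, component in pairs}
    # Vertices that never appear as a key are the minimum of their
    # component, so they label themselves.
    for a, b in edges:
        labels.setdefault(a, a)
        labels.setdefault(b, b)
    return labels


if __name__ == "__main__":
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

In a Spark version, the grouping step would typically be expressed with RDD transformations and the loop would stop when a counter of new pairs reaches zero; the sketch above only mirrors that control flow on a single machine.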
