This repository contains implementations of the Connected Components Finding (CCF) algorithm in both PySpark and Scala. The algorithm identifies connected components in large-scale graphs using Apache Spark's RDD (Resilient Distributed Dataset) API. The repository also includes our report on the results of the implementation and a link to the original research paper on which the project is based.
- PySpark implementation (through Databricks)
- Scala implementation (through Databricks)
The Connected Components Finding (CCF) algorithm is used to identify groups of vertices that are connected to each other in an undirected graph. Two vertices are in the same connected component if there exists a path between them.
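
To make the idea concrete, here is a minimal plain-Python sketch of the iterate-until-stable approach that CCF is built on, as we understand it from the paper. It is not the PySpark or Scala code in this repository and does not use Spark; the function names `ccf_iterate` and `connected_components` are hypothetical, and the in-memory dictionary stands in for the group-by-key step a Spark job would run. Each vertex ends up paired with the smallest vertex id in its component, which acts as the component id.

```python
from collections import defaultdict


def ccf_iterate(pairs):
    """Run one iteration: link every vertex to the smallest id among its neighbours.

    Returns the new (deduplicated) set of pairs and the number of newly
    derived pairs, which serves as the convergence counter.
    """
    # "Map" step: emit each pair in both directions and group by key.
    adjacency = defaultdict(set)
    for a, b in pairs:
        adjacency[a].add(b)
        adjacency[b].add(a)

    # "Reduce" step: for each vertex, find the smallest neighbour.
    out = set()  # a set removes duplicate pairs
    new_pairs = 0
    for key, values in adjacency.items():
        smallest = min(values)
        if smallest < key:
            out.add((key, smallest))
            for v in values:
                if v != smallest:
                    # v is reachable from `smallest` through `key`
                    new_pairs += 1
                    out.add((v, smallest))
    return out, new_pairs


def connected_components(edges):
    """Map each non-minimal vertex to the smallest vertex id in its component."""
    pairs = set(edges)
    while True:
        pairs, new_pairs = ccf_iterate(pairs)
        if new_pairs == 0:  # no new pairs derived: the labelling is stable
            return dict(pairs)


if __name__ == "__main__":
    # Two components: {1, 2, 3} and {4, 5}
    print(connected_components([(1, 2), (2, 3), (4, 5)]))
```

The smallest vertex of each component does not appear as a key in the result, since it is the component id itself. Vertex ids only need to be mutually comparable, and isolated vertices (those with no edges) are not represented in this sketch.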