An implementation of the K-means algorithm using Spark MLlib and Scala
Updated May 30, 2020 - Scala
Solving the Kaggle Titanic challenge with PySpark libraries
Big Data Analytics Project using Apache Spark for Predicting Severity of Car Accidents in the USA
Intra-course homework assignments and the final assignment for a Big Data Engineering course. Includes documentation for the KPMG Hackathon 'University Trends'
User, Event, and Predictive Metric Dashboard on 2GB/month of log files from Brackets IDE
An easy-to-use, Snowflake-based tool/framework for text clustering or LLM work
Predicting the number of downloads of a song, given the artist name and song title
EverAnalyzer is my thesis in the Department of Digital Systems of the University of Piraeus. EverAnalyzer is a platform for collecting, preprocessing, processing and analyzing Big Data from the Twitter platform.
Developing a basic prototype of a distributed computing engine for processing EEG data
Created a SparkML RandomForest model to predict total employee compensation. Queried data with SparkSQL and ran PySpark scripts to perform EDA, pre-process data, and train the model, achieving an R2 score of 0.98.
This is a repository I created to share some of the knowledge I have gained about Big Data technologies, especially Spark, GraphX, etc.
Spark MLlib ALS (written in Scala and Java) used in a commodity recommendation system