abdulelahsm/SparkifyChurn

predicting_churn_with_spark

This project is my attempt to learn how to manipulate large, realistic datasets with Spark and to engineer relevant features for predicting churn.

I used Spark MLlib to build machine learning models on large datasets, far beyond what non-distributed, single-machine libraries like scikit-learn can handle.

Predicting churn is a challenging problem that data scientists and analysts regularly encounter in any customer-facing business. The ability to manipulate large datasets efficiently with Spark is also one of the most in-demand skills in the data field.
