This repository contains the thesis I wrote for my postgraduate studies in the Data Science program at the Faculty of Economic Sciences, University of Warsaw. It is intended as a practical reference for anyone dealing with the customer churn prediction problem. If you build this type of model in your work, buckle up for three quite detailed chapters:
- The first chapter defines the churn prediction problem and establishes business constraints any algorithm has to follow in order to aid a customer retention campaign.
- The second chapter reviews data preparation methods, classification algorithms, and model evaluation metrics. What is unique to this work is that it delves into churn-specific literature to present tools specialized for this domain (such as the Logit Leaf Model or the Maximum Profit Criterion).
- The third chapter puts the above-mentioned methods to the test on a dataset provided by a SaaS B2B company. The results are most interesting once churn-specific frameworks are taken into account.
In the scripts folder, you can find the R code used to generate the experiment and charts from the third chapter.
Download here or from the list of files.
Client churn prediction has become an increasingly important classification problem, as offering-related competitive advantage yields its place to Customer Relationship Management in securing long-term growth. This study presents recent findings from the churn prediction literature, ordered by model creation phase, and applies them in a SaaS B2B experiment. The results indicate that the data preprocessing methods devised by the cited scholars offer an improvement of up to 17% over standard approaches, validate the novel Maximum Profit Criterion (which outputs the direct profit from a model) as an evaluation metric, and highlight the Logit Leaf Model, which features enhanced interpretability, as a viable algorithm for churn prediction.