Arvato Customer Segmentation Project

Dependencies

Description

This project is part of the Udacity Data Science Nanodegree Program, in collaboration with Arvato Bertelsmann.

The project is divided into the following sections:

Customer Segmentation Report: In this section, unsupervised learning techniques are used to identify the characteristics that distinguish the company's existing customers from the general population of Germany.
Supervised Learning Model: In this section, a supervised learning model is built on the mailout_train and mailout_test datasets to predict which individuals are most likely to respond to a mailout campaign.
Kaggle Competition: After choosing the best model, the results were submitted to the Kaggle competition.
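The segmentation comparison can be illustrated with a minimal sketch. It assumes that every person in azdias and customers has already been given a cluster label, for example by k-means on dimensionality-reduced features. That choice is an assumption, since this README does not name the algorithm the notebook uses. The hypothetical helper `cluster_share_ratio` compares each cluster's share in the two groups. A ratio above 1 suggests that the cluster is over-represented among customers.

```python
from collections import Counter


def cluster_share_ratio(population_labels, customer_labels):
    """Return {cluster: (population_share, customer_share, ratio)}.

    ratio = customer_share / population_share; values above 1 mean the
    cluster is over-represented among customers relative to the population.
    """
    pop_counts = Counter(population_labels)
    cust_counts = Counter(customer_labels)
    n_pop = len(population_labels)
    n_cust = len(customer_labels)

    result = {}
    for cluster in sorted(set(pop_counts) | set(cust_counts)):
        pop_share = pop_counts[cluster] / n_pop
        cust_share = cust_counts[cluster] / n_cust
        # Guard against clusters that never appear in the population sample.
        ratio = cust_share / pop_share if pop_share > 0 else float("inf")
        result[cluster] = (pop_share, cust_share, ratio)
    return result


# Toy labels standing in for cluster assignments of azdias and customers.
population = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
customers = [0, 1, 1, 1, 1, 1, 2, 2]

for cluster, (p, c, r) in cluster_share_ratio(population, customers).items():
    print(f"cluster {cluster}: population {p:.2f}, customers {c:.2f}, ratio {r:.2f}")
```

Comparing shares rather than raw counts matters here because the two datasets differ greatly in size (about 891k vs 192k rows).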

Data files

azdias: demographics data for the general population of Germany; 891,211 persons (rows) x 366 features.
customers: demographics data for customers of a mail-order company; 191,652 persons (rows) x 369 features.
mailout_train: demographics data for individuals who were targets of a marketing campaign; 42,982 persons (rows) x 367 features, including each person's response.
mailout_test: demographics data for individuals who were targets of a marketing campaign; 42,833 persons (rows) x 366 features.
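The feature counts differ across the files: customers has 369 and mailout_train has 367, while azdias and mailout_test have 366. The extra columns therefore have to be identified and set aside before the datasets can be compared or fed to the same model. Below is a minimal sketch that works on column names only. The helper names and all column names are hypothetical placeholders, not the real Arvato schema.

```python
def extra_columns(reference_columns, other_columns):
    """Return the columns of other_columns that are missing from reference_columns, in order."""
    reference = set(reference_columns)
    return [col for col in other_columns if col not in reference]


def project_row(row, columns):
    """Return a copy of a row dict restricted to the given columns, in that order."""
    return {col: row[col] for col in columns}


# Hypothetical column names; the real datasets have hundreds of attributes.
azdias_columns = ["FEATURE_A", "FEATURE_B", "FEATURE_C"]
customers_columns = ["FEATURE_A", "FEATURE_B", "FEATURE_C", "EXTRA_1", "EXTRA_2", "EXTRA_3"]

extras = extra_columns(azdias_columns, customers_columns)
print(extras)  # ['EXTRA_1', 'EXTRA_2', 'EXTRA_3']

customer_row = {"FEATURE_A": 1, "FEATURE_B": 4, "FEATURE_C": 2, "EXTRA_1": "x", "EXTRA_2": 0, "EXTRA_3": 1}
print(project_row(customer_row, azdias_columns))
```

The three extras in the toy example mirror the 369 - 366 = 3 difference. For mailout_train, the single extra column would presumably be the response label, which mailout_test lacks, as its 366 features suggest.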

There are two more files that describe the attributes and their values. However, the main dataset files are not available here because of the privacy of Arvato's data.
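Attribute description files like these are commonly used during cleaning to turn codes that mean "unknown" into proper missing values. Below is a minimal sketch of that idea. It assumes a hypothetical mapping from attribute name to its set of unknown codes. The real file layout, attribute names, and codes are not reproduced here, and the functions are illustrative, not the contents of cleaning.py.

```python
import math


def is_missing(value):
    """True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def replace_unknown_codes(rows, unknown_codes):
    """Return new row dicts in which attribute values listed as 'unknown' become NaN.

    rows: list of dicts mapping attribute name -> coded value.
    unknown_codes: dict mapping attribute name -> set of codes that mean 'unknown'.
    The input rows are not modified.
    """
    cleaned = []
    for row in rows:
        cleaned.append({
            attr: math.nan if value in unknown_codes.get(attr, ()) else value
            for attr, value in row.items()
        })
    return cleaned


def missing_fraction(rows, attr):
    """Fraction of rows whose value for attr is missing."""
    return sum(is_missing(row.get(attr)) for row in rows) / len(rows)


# Hypothetical attributes and codes; the real ones come from the attribute description files.
unknown = {"AGE_CODE": {-1, 0}, "INCOME_CODE": {-1}}
rows = [
    {"AGE_CODE": 2, "INCOME_CODE": -1},
    {"AGE_CODE": -1, "INCOME_CODE": 3},
    {"AGE_CODE": 0, "INCOME_CODE": 5},
]

cleaned = replace_unknown_codes(rows, unknown)
print(missing_fraction(cleaned, "AGE_CODE"))     # AGE_CODE: 2 of 3 values now missing
print(missing_fraction(cleaned, "INCOME_CODE"))  # INCOME_CODE: 1 of 3 values now missing
```

Columns whose missing fraction stays high after this step are often dropped. The README does not say which threshold, if any, is used.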

Project Motivation

The main goal of this project is to characterize the customer segment of the population and to build a model that can predict likely customers for Arvato Financial Solutions.

File Description

There are two main notebooks:

• Arvato Project Customer Segmentation Report.ipynb: It includes data analysis and unsupervised learning techniques to compare the general population with the company's customers.

• Arvato Project ML prediction.ipynb: It includes supervised learning techniques to predict which individuals are most likely to respond to a mailout campaign.
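Predicting which individuals are most likely to respond usually ends with ranking people by a model's predicted probability and exporting those probabilities for submission. Below is a minimal sketch of that final step. It assumes the probabilities come from an already fitted classifier, which this README does not name. The helper names are hypothetical, and the "LNR" and "RESPONSE" header names are an assumption about the Kaggle submission format, not taken from this repository.

```python
import csv
import io


def top_k_individuals(ids, probabilities, k):
    """Return the k ids with the highest predicted response probability."""
    ranked = sorted(zip(ids, probabilities), key=lambda pair: pair[1], reverse=True)
    return [person_id for person_id, _ in ranked[:k]]


def write_submission(ids, probabilities, file_obj):
    """Write a header plus one row per individual: id and predicted probability.

    The header names are an assumed submission format.
    """
    writer = csv.writer(file_obj)
    writer.writerow(["LNR", "RESPONSE"])
    for person_id, prob in zip(ids, probabilities):
        writer.writerow([person_id, prob])


ids = [101, 102, 103, 104]
probs = [0.05, 0.62, 0.30, 0.01]
print(top_k_individuals(ids, probs, 2))  # [102, 103]

buffer = io.StringIO()
write_submission(ids, probs, buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a buffer, open it with `newline=""`, as the csv module documentation recommends.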

And two Python files:

• cleaning.py: It contains the data preprocessing and cleaning functions for the azdias and customers datasets, as well as the unsupervised learning function.

• ml.py: It contains the data preprocessing and cleaning functions for the mailout_train and mailout_test datasets, as well as the model evaluation functions.
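The References section points to material on ROC curves and AUC, so ROC AUC is a likely evaluation metric here. That is an assumption, since the README does not say which metric ml.py computes. Below is a minimal, dependency-free sketch of ROC AUC via the Mann-Whitney U statistic, with tied scores receiving average ranks. In practice, `sklearn.metrics.roc_auc_score` is the usual library call.

```python
def roc_auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic, with average ranks for tied scores.

    y_true: 0/1 labels; y_score: predicted scores (higher = more likely positive).
    """
    pairs = sorted(zip(y_score, y_true), key=lambda p: p[0])
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")

    # Sum of positive ranks minus its minimum possible value gives U for the positives.
    rank_sum_pos = sum(rank for rank, (_, label) in zip(ranks, pairs) if label == 1)
    u_pos = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_pos / (n_pos * n_neg)


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

AUC only depends on how the scores rank positives against negatives. It is therefore insensitive to the heavy class imbalance typical of mailout responses, which is one reason it is often preferred over accuracy for this kind of task.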

Results

The main findings can be found in the Customer Segmentation Report, available here.

Licensing, Authors, Acknowledgements

Udacity for providing such an amazing project
Arvato Bertelsmann for providing datasets

References

https://www.dataschool.io/roc-curves-and-auc-explained/

https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#sphx-glr-auto-examples-model-selection-plot-roc-py

https://www.kaggle.com/alexisbcook/titanic-tutorial

Repository: poojapatel26/Arvato-Bertelsmann-Customer-Segmentation-Project