Privacy-Preserving Data Mining in Banking Applications

Classification performance with k-Anonymity

This Jupyter notebook summarizes the second part of my contribution to the 4th mini project. It evaluates classification algorithms on the data before and after anonymization and compares their performance.
k-Anonymity was applied in the first part of the project using ARX, a free data anonymization tool, which can be downloaded from here. Please refer to the report and the presentation for further details on the anonymization process.
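A table is k-anonymous when every combination of quasi-identifier values is shared by at least k records. ARX enforces this property during anonymization. The sketch below is a hypothetical helper for sanity-checking an exported CSV, not part of ARX or of the notebook. It uses only the Python standard library. The column names `age`, `sex` and `default` and the toy rows are illustrative assumptions, not the project's dataset.

```python
import csv
import io
from collections import Counter

def smallest_class_size(rows, quasi_identifiers):
    """Size of the smallest equivalence class, i.e. the largest k the table satisfies.

    Assumes `rows` is non-empty (min() raises ValueError on an empty table).
    """
    # Count how many records share each combination of quasi-identifier values
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())

def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    return smallest_class_size(rows, quasi_identifiers) >= k

# Toy generalized table (hypothetical columns, not the project's data)
TOY_CSV = """age,sex,default
20-29,F,0
20-29,F,1
20-29,F,0
30-39,M,0
30-39,M,0
30-39,M,1
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
qi = ["age", "sex"]
print(smallest_class_size(rows, qi))  # 3
print(is_k_anonymous(rows, qi, 3))    # True
print(is_k_anonymous(rows, qi, 4))    # False
```

Only the quasi-identifiers belong in `qi`. Adding the target column splits the classes further and would make the table look less anonymous than it is.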
The anonymized data, one CSV file per value of k, can be found in the k_anonymity folder.
I test three common machine learning algorithms, namely Logistic Regression, Naive Bayes and Random Forest, on data classification before and after applying the data-privacy technique k-Anonymity. For this purpose, the dataset is divided into training and test sets. Because the target variable is unbalanced, stratified sampling is used: each set keeps the same ratio of zeros to ones as the full dataset (about 3/4). Accuracy (ACC), area under the ROC curve (AUC), precision (PRE) and recall (REC) are used as performance metrics.
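The evaluation loop described above might look roughly like the following scikit-learn sketch. It runs on synthetic data from `make_classification` as a stand-in for the real dataset. Several settings are assumptions and may differ from the notebook: the roughly 75/25 class split, the 70/30 train/test split, `GaussianNB` as the Naive Bayes variant, and the model hyperparameters. The function name `evaluate_models` is also hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def evaluate_models(X, y, test_size=0.3, random_state=0):
    """Stratified split, fit three classifiers, return ACC/AUC/PRE/REC per model."""
    # stratify=y keeps the class proportions of y in both the training and test sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "NaiveBayes": GaussianNB(),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # AUC is computed from positive-class probabilities, not hard labels
        y_score = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "ACC": accuracy_score(y_test, y_pred),
            "AUC": roc_auc_score(y_test, y_score),
            "PRE": precision_score(y_test, y_pred, zero_division=0),
            "REC": recall_score(y_test, y_pred),
        }
    return results

# Synthetic, imbalanced stand-in data (about 75% zeros; an assumption)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.75], random_state=0)
results = evaluate_models(X, y)
for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

To compare performance before and after anonymization, the same function could be called on the original table and on each ARX output. The generalized values would first need a numeric encoding, for example one-hot encoding. That encoding step is an assumption and is not shown here.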
