
Data privacy - classification performance after application of k-Anonymity


mn9891/classification-performance-k-anonymity


Privacy-Preserving Data Mining in Banking Applications

Classification performance with k-Anonymity

Mini Project 4 - Applied Machine Learning [COMP-551] - McGill - Winter 2017

  • The Jupyter notebook summarizes the second part of my contribution to the 4th mini project. It presents the classification algorithms used to evaluate performance before and after anonymizing the data.
  • K-Anonymity was applied in the first part using ARX, a freeware anonymization tool that can be downloaded from here. Please refer to the report and the presentation for further details on the anonymization process.
  • The anonymized data can be found in CSV format, for different values of k, in the k_anonymity folder.
  • I test three common machine learning algorithms, namely Logistic Regression, Naive Bayes and Random Forest, on data classification before and after applying the data privacy technique k-Anonymity. To this end, the dataset is divided into train and test sets. Stratified sampling is used because the target variable is imbalanced: each set keeps the same ratio of zeros to ones as the full dataset (about 3/4). Accuracy (ACC), area under the curve (AUC), precision (PRE) and recall (REC) are used as performance metrics.
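As a quick sanity check on the ARX output, the k-Anonymity property can be verified directly: every combination of quasi-identifier values must occur in at least k records, so that each record is indistinguishable from at least k-1 others. The following is a minimal pandas sketch, not the project's code; the function name `is_k_anonymous` and the toy columns `age`, `zip` and `default` are hypothetical, and the real quasi-identifiers depend on the ARX configuration described in the report. The sketch also does not treat records that ARX may have suppressed in any special way.

```python
import pandas as pd


def is_k_anonymous(df: pd.DataFrame, quasi_identifiers: list, k: int) -> bool:
    """Return True if every equivalence class on the quasi-identifiers has at least k rows."""
    # Group rows by their (generalized) quasi-identifier values; each group is an equivalence class
    class_sizes = df.groupby(quasi_identifiers, dropna=False).size()
    # k-Anonymity holds when the smallest equivalence class still contains k or more rows
    return bool(class_sizes.min() >= k)


# Hypothetical toy table with already generalized quasi-identifiers (not the project's data)
toy = pd.DataFrame({
    "age": ["20-29", "20-29", "30-39", "30-39", "30-39"],
    "zip": ["123**"] * 5,
    "default": [0, 1, 0, 0, 1],
})
print(is_k_anonymous(toy, ["age", "zip"], k=2))  # True: classes have sizes 2 and 3
print(is_k_anonymous(toy, ["age", "zip"], k=3))  # False: the "20-29" class has only 2 rows
```

The same check could be run on each anonymized CSV after loading it with `pd.read_csv`, passing the quasi-identifier columns that were configured in ARX.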
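The evaluation protocol described above can be sketched with scikit-learn. This is an illustration under stated assumptions, not the notebook's code: `GaussianNB` stands in for the unspecified Naive Bayes variant, AUC is taken to be the ROC AUC computed from predicted probabilities, the 70/30 split and the model hyperparameters are arbitrary, the helper name `evaluate` is hypothetical, and a synthetic dataset with roughly 75% zeros replaces the real banking data. Passing `stratify=y` to `train_test_split` is what keeps the class ratio of the full dataset in both the train and test sets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate(X, y, test_size=0.3, seed=0):
    """Fit three classifiers on a stratified split and report ACC, AUC, PRE and REC."""
    # Stratify on y so train and test keep the class ratio of the full dataset
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    models = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "NaiveBayes": GaussianNB(),  # assumption: Gaussian variant
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        y_pred = model.predict(X_te)
        # ROC AUC uses the predicted probability of the positive class (label 1)
        y_score = model.predict_proba(X_te)[:, 1]
        results[name] = {
            "ACC": accuracy_score(y_te, y_pred),
            "AUC": roc_auc_score(y_te, y_score),
            "PRE": precision_score(y_te, y_pred),
            "REC": recall_score(y_te, y_pred),
        }
    return results, (y_tr, y_te)


# Synthetic stand-in data (weights=[0.75] gives roughly 75% zeros); not the project's dataset
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.75], random_state=0)
results, (y_tr, y_te) = evaluate(X, y)
for name, metrics in results.items():
    print(name, {m: round(v, 3) for m, v in metrics.items()})
```

To compare before and after anonymization, the same function would be called once on the original features and once per anonymized CSV. The generalized values produced by ARX are typically categorical strings, so those files would need to be encoded first, for example with `pd.get_dummies`.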
