- This Jupyter notebook summarizes the second part of my participation in the 4th mini project. It presents the classification algorithms used to evaluate performance before and after applying anonymization to the data.
- K-Anonymity was applied in the first part using ARX, a freeware tool that can be downloaded from here. Please refer to the report and the presentation for further details on the anonymization operation.
- Anonymized data can be found in CSV format, for different values of k, in the k_anonymity folder.
- I am testing three common machine learning algorithms, namely Logistic Regression, Naive Bayes and Random Forest, for data classification before and after applying the data privacy technique k-Anonymity. For this purpose, the dataset is divided into train and test sets. Stratified sampling is used because the target value is unbalanced: in each set, I maintain the ratio of zeros to ones at the same value as in the full dataset (about 3/4). Accuracy (ACC), area under the curve (AUC), Precision (PRE) and Recall (REC) are used as performance metrics.
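The evaluation described above can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the notebook's actual code: the function name `evaluate_classifiers`, the synthetic data from `make_classification`, the 75/25 class weights, the 25% test split, and the choice of `GaussianNB` as the Naive Bayes variant are all assumptions made for the example. In practice, the same function would be called once on the original data and once per anonymized CSV so that the metrics can be compared.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def evaluate_classifiers(X, y, test_size=0.25, seed=0):
    """Return {model name: {metric: value}} on a stratified hold-out set.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    # stratify=y keeps the class ratio of the full dataset in both splits
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Naive Bayes": GaussianNB(),  # assumed variant
        "Random Forest": RandomForestClassifier(n_estimators=100,
                                                random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        # AUC needs scores, so use the predicted probability of class 1
        y_score = model.predict_proba(X_test)[:, 1]
        results[name] = {
            "ACC": accuracy_score(y_test, y_pred),
            "AUC": roc_auc_score(y_test, y_score),
            "PRE": precision_score(y_test, y_pred, zero_division=0),
            "REC": recall_score(y_test, y_pred, zero_division=0),
        }
    return results


# Synthetic, unbalanced stand-in data (assumption: not the project's dataset)
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.75, 0.25], random_state=0)
results = evaluate_classifiers(X, y)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Running the same function on the original and the anonymized feature matrices, with identical `seed` and `test_size`, keeps the comparison between the two fair.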
mn9891/classification-performance-k-anonymity
Data privacy - classification performance after application of k-Anonymity