This project applies various data mining techniques to analyze internet firewall data collected from a university's traffic records.
- Preprocessing
  - Data cleaning and preparation
  - Principal Component Analysis (PCA) for dimensionality reduction
- Association Rule Mining
  - Discovering interesting relationships in the data
  - Implemented using the Apriori algorithm in IBM SPSS Modeler
- Classification
  - Decision Tree algorithm
  - K-Nearest Neighbors (KNN) algorithm
- Clustering
  - K-means clustering
  - Agglomerative clustering
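
The association rules in this project were mined in IBM SPSS Modeler, not in Python. For readers without access to that tool, the following is a minimal pure-Python sketch of the level-wise Apriori idea (count support, join frequent itemsets, prune candidates with an infrequent subset). The function names, the item labels such as `action=allow`, and the `min_support` value are illustrative assumptions and are not taken from the project's dataset or stream file.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct item on its own.
    current = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while current:
        # Support = fraction of transactions that contain the candidate.
        support = {c: sum(1 for t in transactions if c <= t) / n for c in current}
        frequent = {c: s for c, s in support.items() if s >= min_support}
        result.update(frequent)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


def rule_confidence(itemsets, antecedent, consequent):
    """Confidence of antecedent -> consequent, from precomputed supports."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    return itemsets[antecedent | consequent] / itemsets[antecedent]


# Hypothetical categorical firewall records (made up for illustration).
transactions = [
    {"action=allow", "dst_port=443", "bytes=high"},
    {"action=allow", "dst_port=443", "bytes=low"},
    {"action=deny", "dst_port=23", "bytes=low"},
    {"action=allow", "dst_port=443", "bytes=high"},
]

itemsets = frequent_itemsets(transactions, min_support=0.5)
confidence = rule_confidence(itemsets, {"dst_port=443"}, {"action=allow"})
print(len(itemsets), confidence)
```

On real data the categorical file `preprocessed_cat.csv` would presumably supply the transactions, but its column names are not documented here, so the sketch does not attempt to load it.
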
├── association rules
│   ├── association_rules.str
│   └── preprocessed_cat.csv
├── classification
│   ├── decision_tree.ipynb
│   ├── KNN_cat.ipynb
│   └── KNN.ipynb
├── clustering
│   ├── agglomerative.ipynb
│   └── kmeans.ipynb
├── dataset
│   ├── clustering.csv
│   ├── dataset.csv
│   ├── pca.csv
│   ├── preprocessed_cat.csv
│   └── preprocessed.csv
├── preprocessing
│   ├── PCA.ipynb
│   └── preprocessing.ipynb
└── reports/
- Python
- Jupyter Notebooks
- Scikit-learn for machine learning algorithms
- Pandas and NumPy for data manipulation
- Matplotlib and Seaborn for data visualization
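
As a rough illustration of how these libraries fit together, the sketch below chains scaling, PCA, and a K-Nearest Neighbors classifier with scikit-learn. It runs on synthetic data generated with NumPy so that it is self-contained. The feature count, `n_components=2`, `n_neighbors=5`, and the train/test split are assumptions chosen for the example, not the settings used in the project's notebooks.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the firewall features: two well-separated classes,
# six numeric features each (purely illustrative, not the real dataset).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(100, 6)),
    rng.normal(3.0, 1.0, size=(100, 6)),
])
y = np.array([0] * 100 + [1] * 100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale features, project onto two principal components, then classify.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy on synthetic data: {accuracy:.2f}")
```

Wrapping the steps in a pipeline means the scaler and PCA are fitted on the training split only, which avoids leaking information from the test split into the preprocessing.
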
Findings and insights can be found in the Report.pdf file in the reports directory.