GitHub - PradeepThapa/nsl_kdd_classification: Cyber-attack classification in the network traffic database using NSL-KDD dataset

Cyber-attack classification in the network traffic database using NSL-KDD dataset

Classification is the process of dividing the data elements into specific classes based on their values. It is a type of supervised learning which means data are labelled. This project “Cyber-attack classification in the network traffic database” uses commonly used machine learning classification algorithms to solve the problem by identifying the normal network traffics and attack classes. We know there are 5 different classes to KDD dataset so this comes under the classification problem where classification algorithms can be used to identify different classes based on attributes. In this report, I will discuss the various classification algorithms like logistic regression, naive bayes, decision trees, random forests and many more. The performance of models will be evaluated with different performance measures such as accuracy, precision and recall. In addition, I will be comparing their performance and time complexity. I am using Macintosh with 2.9 GHz Quad-Core Intel Core i7 CPU and 16 GB RAM.

Summary of algorithms

Algorithm Accuracy Precision Recall F1 Time (s)

Random Forest 77% 82% 77% 73% 5.88
KNN 77% 82% 76% 73% 34.23
SVM 76% 81% 76% 73% 207.3
Logistic Regression 76% 79% 75% 71% 5.97
SGD 75% 81% 75% 70% 16.35
MLP (NN) 77% 75% 77% 74% 420.20
AdaBoost 68% 63% 67% 63% 6.61
XGBoost 77.1% 82% 77% 74% 494
Voting Classifier 77% 82% 77% 73% 30.1
Lightgbm 76% 84% 63% 72% 188
Catboost 81% 85% 81% 77% 71.2

KDD dataset has 5 different classes which are benign, dos, probe, r2l and u2r. All 5 classes are not balanced as benign and dos have far more samples than r2l and u2r which could be the issue related to oversampling. After trying several machine learning algorithms, Catboost came out on top with the 81% accuracy. Most of the other algorithms were around 78%. This could be because catboost uses unique gradient boosting technique to trees.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Data/NSL-KDD-Dataset		Data/NSL-KDD-Dataset
.gitignore		.gitignore
NSL_KDD_dataset_code.py		NSL_KDD_dataset_code.py
README.MD		README.MD
Report.pdf		Report.pdf
myplot.png		myplot.png

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

PradeepThapa/nsl_kdd_classification

Folders and files

Latest commit

History

Repository files navigation

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages