This repository contains the implementation of the final project for the "Projects with the Industry" course, by Guy Freund & Rotem Shalev of the Department of Computer Science, Reichman University, Israel.
The project is jointly guided by Reichman University and Nucleon Cyber.
The project's goal is to create a classifier that predicts whether an IP address will attack again.
The data was provided by Nucleon Cyber.
The data is a JSON file in which each entry is part of a "session" representing an attack.
We aggregate each attack session into a single entry, preprocess it using the preprocessor,
and label the data so that each IP address that attacks more than once gets a label of 1, and every other IP address gets a label of 0.
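The aggregation and labeling steps described above can be sketched roughly as follows. This is a minimal illustration, not the code in preprocessor.py or train.py: the field names `session_id` and `ip`, and the helper names `aggregate_sessions` and `label_sessions`, are hypothetical, since the actual schema of the Nucleon Cyber data is not shown here.

```python
from collections import Counter, defaultdict


def aggregate_sessions(entries):
    """Collapse raw entries into one record per attack session.

    Assumes (hypothetically) that every entry carries a "session_id"
    and an "ip" field.
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["session_id"]].append(entry)
    # One aggregated record per session: its source IP and its entry count.
    return [
        {"session_id": sid, "ip": events[0]["ip"], "n_entries": len(events)}
        for sid, events in grouped.items()
    ]


def label_sessions(sessions):
    """Label a session 1 if its IP appears in more than one session, else 0."""
    attacks_per_ip = Counter(session["ip"] for session in sessions)
    return [
        {**session, "label": 1 if attacks_per_ip[session["ip"]] > 1 else 0}
        for session in sessions
    ]


if __name__ == "__main__":
    raw_entries = [
        {"session_id": "a", "ip": "10.0.0.1"},
        {"session_id": "a", "ip": "10.0.0.1"},
        {"session_id": "b", "ip": "10.0.0.1"},
        {"session_id": "c", "ip": "10.0.0.2"},
    ]
    for row in label_sessions(aggregate_sessions(raw_entries)):
        print(row)
```

Note that the label is assigned per IP address, so every session belonging to a repeat attacker receives a 1, including the first one.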
- Pipfile & Pipfile.lock - Virtual environment, optional.
- requirements.txt - The requirements file.
- constans.py - Shared constants file.
- utils.py - Shared functions file.
- finalized_model.joblib - The weights of the final model (after training).
- preprocessor.py - A class that represents a Preprocessor object (used for processing raw data).
- train.py - A Python script used to train the model from scratch (raw data to final model).
- predict.py - A Python script used to give a prediction on a single example (database entry).
- EDA.ipynb - A Jupyter Notebook that contains all the Exploratory Data Analysis and Model Selection (with plots).
- Install Python (version >= 3.8.0).
- Run:
pip install -r requirements.txt
or
pipenv shell
- For training, run:
python3 train.py -p <path_to_json>
(add -sm, -sd if you want to change the defaults; see train.py for more information)
- For prediction using the existing model, run:
python3 predict.py -p <path_to_json>
(add -sd true if you want to save the processed data)
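For readers unfamiliar with flags that take a literal true/false value, the sketch below shows one way a command line like `-p <path_to_json> -sd true` could be parsed with `argparse`. It is an illustration only and is not taken from train.py or predict.py: the helper `str_to_bool`, the destination names `path` and `save_data`, and the default of false for `-sd` are assumptions.

```python
import argparse


def str_to_bool(value):
    """Convert a command-line string such as "true" or "false" to a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true or false, got {value!r}")


def build_parser():
    """Build a parser for the -p and -sd flags (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Predict from a JSON data file.")
    parser.add_argument("-p", dest="path", required=True,
                        help="path to the JSON data file")
    # Assumption: the processed data is not saved unless -sd true is given.
    parser.add_argument("-sd", dest="save_data", type=str_to_bool, default=False,
                        help="whether to save the processed data (true/false)")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.path, args.save_data)
```

Parsing the value explicitly avoids a common pitfall: passing `type=bool` to `argparse` treats any non-empty string, including "false", as true.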