Data Exfiltration Detection System

A machine learning-powered system for detecting potential data exfiltration attempts in network traffic. This project uses Random Forest classification to identify suspicious network activities and provides a real-time monitoring interface.

Features

Real-time Detection: Analyze network traffic patterns in real time
High Precision: 0.94 precision and 0.92 recall in detecting potential data exfiltration attempts
User-friendly Interface: Web-based dashboard for easy monitoring
Visualizations: Interactive charts showing traffic patterns and threat probabilities
Easy Integration: RESTful API for seamless integration with existing security systems

System Architecture

The system consists of three main components:

Data Generation & Training Module
- Simulates network traffic data
- Performs feature engineering
- Trains Random Forest model
Detection Engine
- Processes incoming network logs
- Makes real-time predictions
- Calculates threat probabilities
Web Interface
- Real-time monitoring dashboard
- Interactive visualizations
- Alert system
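
As a rough illustration of the Detection Engine's "threat probability" step: a Random Forest's class probability is typically the mean of the per-tree probabilities for that class (this is how scikit-learn's RandomForestClassifier.predict_proba behaves). The sketch below uses plain Python with made-up per-tree values; the function names and the 0.5 alert threshold are assumptions for illustration, not taken from this project's source.

```python
from statistics import mean

# Hypothetical threshold; the project's actual alerting cutoff is not documented.
ALERT_THRESHOLD = 0.5


def threat_probability(tree_probabilities):
    """Average the per-tree probabilities of the 'exfiltration' class."""
    return mean(tree_probabilities)


def classify(tree_probabilities, threshold=ALERT_THRESHOLD):
    """Return (prediction, probability), mirroring the /predict response fields."""
    probability = threat_probability(tree_probabilities)
    prediction = 1 if probability >= threshold else 0
    return prediction, probability


# Five toy trees, most of which consider the flow benign.
prediction, probability = classify([0.0, 0.1, 0.0, 0.5, 0.0])
print(prediction, round(probability, 2))
```

Lowering the threshold would trade precision for recall, which may suit deployments where a missed exfiltration is costlier than a false alert.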

Installation

Prerequisites

Python 3.8+
pip
Virtual environment (recommended)

Setup

Clone the repository

git clone https://github.com/vishalsharma1987/data_exfiltration_detection.git
cd data_exfiltration_detection

Create and activate virtual environment

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -r requirements.txt

Usage

Training the Model

python src/train_model.py

This will:

- Generate synthetic network traffic data
- Train the Random Forest model
- Save the model and necessary files
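
To make the data-generation step more concrete, here is a minimal sketch of one way labeled records could be simulated using only the standard library. The field names mirror the /predict request body; the value distributions, the 5% exfiltration rate, and the function names are assumptions for illustration, and src/train_model.py may work differently.

```python
import random
from datetime import datetime, timedelta


def generate_record(rng, start, exfil_rate=0.05):
    """Return one synthetic log record plus a 0/1 label (1 = exfiltration)."""
    is_exfil = rng.random() < exfil_rate
    timestamp = start + timedelta(seconds=rng.randrange(24 * 3600))
    record = {
        "timestamp": timestamp.isoformat(),
        "source_ip": f"192.168.1.{rng.randint(2, 254)}",
        "destination_ip": f"203.0.113.{rng.randint(1, 254)}",
        "source_port": rng.randint(1024, 65535),
        "destination_port": rng.choice([80, 443, 53, 22]),
        "protocol": rng.choice(["TCP", "UDP"]),
        # Assumed pattern: exfiltration flows send far more than normal flows.
        "bytes_sent": rng.randint(500_000, 5_000_000) if is_exfil else rng.randint(100, 20_000),
        "bytes_received": rng.randint(100, 20_000),
        "action": "allow",
    }
    return record, int(is_exfil)


def generate_dataset(n, seed=42):
    """Generate n (record, label) pairs; seeded so runs are reproducible."""
    rng = random.Random(seed)
    start = datetime(2023, 11, 2)
    return [generate_record(rng, start) for _ in range(n)]


dataset = generate_dataset(1000)
labels = [label for _, label in dataset]
print(f"{sum(labels)} of {len(labels)} records labeled as exfiltration")
```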

Running the Application

python src/app.py

Access the web interface at: http://localhost:5000

Using the Dashboard

Input network log data:
- Source/Destination IP
- Source/Destination ports
- Protocol
- Bytes transferred
- Action taken
View Results:
- Real-time prediction
- Threat probability
- Historical trend chart

Model Performance

Our model achieves the following metrics:

Precision: 0.94
Recall: 0.92
F1-Score: 0.93
Accuracy: 0.99
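
As a quick consistency check on these figures, the F1-score is the harmonic mean of precision and recall. The sketch below recomputes it and, taking the rounded figures at face value, estimates the share of positive (exfiltration) samples they imply. The helper names are illustrative, and the implied rate is only an approximation because the inputs are rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def implied_positive_rate(precision, recall, accuracy):
    """Share of positive samples implied by the three metrics.

    Per positive sample: misses = 1 - recall,
    false alarms = recall * (1 - precision) / precision.
    The error rate (1 - accuracy) equals positive_rate * (misses + false alarms).
    """
    errors_per_positive = (1 - recall) + recall * (1 - precision) / precision
    return (1 - accuracy) / errors_per_positive


precision, recall, accuracy = 0.94, 0.92, 0.99
print(round(f1_score(precision, recall), 2))  # 0.93
print(round(implied_positive_rate(precision, recall, accuracy), 3))
```

Because exfiltration events are the minority class under these figures, accuracy is dominated by benign traffic, so precision and recall are the more informative numbers.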

Top predictive features:

Bytes sent
Destination port
Time of day
Protocol type
Source port
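
Two of these features, time of day and protocol type, are not numeric fields in the raw log, so they have to be derived. The following is a minimal sketch of how a record in the /predict request format might be mapped to numeric features; the feature names, the protocol vocabulary, and the one-hot encoding are assumptions, not the project's confirmed feature-engineering code.

```python
from datetime import datetime

# Assumed protocol vocabulary; the project's actual list may differ.
PROTOCOLS = ["TCP", "UDP", "ICMP"]


def extract_features(record):
    """Map one raw log record to a flat dict of numeric features."""
    timestamp = datetime.fromisoformat(record["timestamp"])
    features = {
        "bytes_sent": record["bytes_sent"],
        "destination_port": record["destination_port"],
        "hour_of_day": timestamp.hour,
        "source_port": record["source_port"],
    }
    # One-hot encode the protocol so the model receives numeric inputs.
    for protocol in PROTOCOLS:
        features[f"protocol_{protocol}"] = int(record["protocol"].upper() == protocol)
    return features


record = {
    "timestamp": "2023-11-02T10:30:00",
    "source_port": 54321,
    "destination_port": 443,
    "protocol": "TCP",
    "bytes_sent": 1500,
}
features = extract_features(record)
print(features)
```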

API Reference

Prediction Endpoint

POST /predict

Request body:

{
    "timestamp": "2023-11-02T10:30:00",
    "source_ip": "192.168.1.100",
    "destination_ip": "203.0.113.1",
    "source_port": 54321,
    "destination_port": 443,
    "protocol": "TCP",
    "bytes_sent": 1500,
    "bytes_received": 500,
    "action": "allow"
}

Response:

{
    "prediction": 0,
    "probability": 0.12
}
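
For integration from a script, the endpoint can be called with the Python standard library. This is a minimal sketch that assumes the application is reachable at the default http://localhost:5000 and accepts a JSON body with a Content-Type of application/json; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from "Running the Application"


def build_predict_request(record, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request carrying the record as JSON."""
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(record, base_url=BASE_URL, timeout=5):
    """Send the record and return the decoded JSON response as a dict."""
    request = build_predict_request(record, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


sample = {
    "timestamp": "2023-11-02T10:30:00",
    "source_ip": "192.168.1.100",
    "destination_ip": "203.0.113.1",
    "source_port": 54321,
    "destination_port": 443,
    "protocol": "TCP",
    "bytes_sent": 1500,
    "bytes_received": 500,
    "action": "allow",
}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
```

With the application running, calling predict(sample) should return a dict containing the prediction and probability fields shown in the response above.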

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the project
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

Vishal Sharma - your.email@example.com

Project Link: https://github.com/vishalsharma1987/data_exfiltration_detection

Acknowledgments

Scikit-learn team for their excellent machine learning library
Flask team for the web framework
All contributors who have spent time helping to improve this project
