A machine-learning-powered system for detecting potential data exfiltration attempts in network traffic. The project uses a Random Forest classifier to flag suspicious network activity and provides a real-time web monitoring interface.
- Features
- System Architecture
- Installation
- Usage
- Model Performance
- API Reference
- Contributing
- License
- Contact
- Real-time Detection: Analyze network traffic patterns in real time
- High Precision: 94% precision and 92% recall in detecting potential data exfiltration attempts on the generated (synthetic) dataset
- User-friendly Interface: Web-based dashboard for easy monitoring
- Visualizations: Interactive charts showing traffic patterns and threat probabilities
- Easy Integration: RESTful API for seamless integration with existing security systems
The system consists of three main components:
- Data Generation & Training Module
  - Simulates network traffic data
  - Trains Random Forest model
  - Performs feature engineering
- Detection Engine
  - Processes incoming network logs
  - Makes real-time predictions
  - Calculates threat probabilities
- Web Interface
  - Real-time monitoring dashboard
  - Interactive visualizations
  - Alert system
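The detection engine's flow (score an incoming record, turn the probability into a prediction, raise an alert above a threshold) can be sketched in plain Python. This is a minimal illustration, not the project's code: `DetectionEngine`, `toy_scorer`, and the 0.5 threshold are hypothetical. In the real system the scorer would wrap the trained model, for example by returning the positive-class column of scikit-learn's `predict_proba`.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical scorer type: maps one log record to an estimated P(exfiltration).
Scorer = Callable[[Dict[str, object]], float]


@dataclass
class DetectionEngine:
    """Hypothetical engine: scores records and keeps a list of raised alerts."""

    scorer: Scorer
    threshold: float = 0.5  # assumed decision threshold
    alerts: List[Dict[str, object]] = field(default_factory=list)

    def process(self, record: Dict[str, object]) -> Dict[str, object]:
        probability = self.scorer(record)
        prediction = int(probability >= self.threshold)
        result = {"prediction": prediction, "probability": round(probability, 2)}
        if prediction:
            # Keep the original record alongside the verdict for the alert feed.
            self.alerts.append({**record, **result})
        return result


def toy_scorer(record: Dict[str, object]) -> float:
    """Stand-in for the trained model: larger uploads score as more suspicious."""
    bytes_sent = float(record.get("bytes_sent", 0))
    return min(bytes_sent / 1_000_000, 1.0)


engine = DetectionEngine(scorer=toy_scorer)
print(engine.process({"bytes_sent": 1500}))       # small transfer, no alert
print(engine.process({"bytes_sent": 5_000_000}))  # large upload, raises an alert
print(len(engine.alerts))
```

Passing the scorer in as a plain callable keeps the engine independent of how the model is trained or stored, so the toy scorer can be swapped for the real classifier without changing the alerting logic.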
- Python 3.8+
- pip
- Virtual environment (recommended)
- Clone the repository
git clone https://github.com/vishalsharma1987/data_exfiltration_detection.git
cd data_exfiltration_detection
- Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Train the model
python src/train_model.py
This will:
- Generate synthetic network traffic data
- Train the Random Forest model
- Save the model and necessary files
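The first training step, generating synthetic traffic, might look roughly like the following. The distributions are assumptions made for illustration (exfiltration modelled as large, off-hours uploads to unusual ports), and the project's actual generator may differ. Field names follow the `/predict` request example, plus a hypothetical `label` column; `generate_traffic` is likewise a hypothetical name.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List


def generate_traffic(n: int, exfil_rate: float = 0.05, seed: int = 42) -> List[Dict[str, object]]:
    """Generate n labelled log records; roughly exfil_rate of them are exfiltration-like."""
    rng = random.Random(seed)  # private, seeded generator for reproducibility
    start = datetime(2023, 11, 1)
    records = []
    for _ in range(n):
        is_exfil = rng.random() < exfil_rate
        # Assumption: exfiltration happens off-hours; normal traffic during office hours.
        hour = rng.choice([0, 1, 2, 3, 22, 23]) if is_exfil else rng.randint(8, 18)
        timestamp = start + timedelta(days=rng.randint(0, 29), hours=hour,
                                      minutes=rng.randint(0, 59))
        records.append({
            "timestamp": timestamp.isoformat(),
            "source_ip": f"192.168.1.{rng.randint(2, 254)}",
            "destination_ip": f"203.0.113.{rng.randint(1, 254)}",
            "source_port": rng.randint(49152, 65535),
            # Assumption: exfiltration targets unusual ports, normal traffic common ones.
            "destination_port": rng.choice([21, 22, 4444, 8080]) if is_exfil
            else rng.choice([80, 443, 53]),
            "protocol": rng.choice(["TCP", "UDP"]),
            # Assumption: exfiltration means large uploads.
            "bytes_sent": rng.randint(500_000, 50_000_000) if is_exfil
            else rng.randint(100, 20_000),
            "bytes_received": rng.randint(100, 5_000),
            "action": "allow",
            "label": int(is_exfil),  # hypothetical target column: 1 = exfiltration
        })
    return records


data = generate_traffic(1000)
print(sum(r["label"] for r in data), "exfiltration-like records out of", len(data))
```

Seeding a private `random.Random` instance keeps the dataset reproducible between training runs without touching the global random state.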
- Start the web application
python src/app.py
Then open the web interface at http://localhost:5000
- Input network log data:
  - Source/Destination IP
  - Ports
  - Protocol
  - Bytes transferred
  - Action taken
- View results:
  - Real-time prediction
  - Threat probability
  - Historical trend chart
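Before a record reaches the model, the fields entered in the dashboard could be checked with a small validator like the one below. It is a sketch only: `validate_log` is hypothetical, the field names come from the `/predict` example, and the allowed protocol and action values are assumptions.

```python
import ipaddress
from typing import Dict, List

REQUIRED_FIELDS = ("source_ip", "destination_ip", "source_port", "destination_port",
                   "protocol", "bytes_sent", "bytes_received", "action")
ALLOWED_PROTOCOLS = {"TCP", "UDP", "ICMP"}   # assumed vocabulary
ALLOWED_ACTIONS = {"allow", "deny", "drop"}  # assumed vocabulary


def validate_log(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record looks usable."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if errors:
        return errors
    for name in ("source_ip", "destination_ip"):
        try:
            ipaddress.ip_address(str(record[name]))  # accepts IPv4 and IPv6
        except ValueError:
            errors.append(f"invalid IP address in {name}: {record[name]!r}")
    for name in ("source_port", "destination_port"):
        port = record[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(port, bool) or not isinstance(port, int) or not 0 <= port <= 65535:
            errors.append(f"{name} must be an integer between 0 and 65535")
    for name in ("bytes_sent", "bytes_received"):
        value = record[name]
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            errors.append(f"{name} must be a non-negative integer")
    if str(record["protocol"]).upper() not in ALLOWED_PROTOCOLS:
        errors.append(f"unsupported protocol: {record['protocol']!r}")
    if str(record["action"]).lower() not in ALLOWED_ACTIONS:
        errors.append(f"unknown action: {record['action']!r}")
    return errors


sample = {
    "source_ip": "192.168.1.100", "destination_ip": "203.0.113.1",
    "source_port": 54321, "destination_port": 443, "protocol": "TCP",
    "bytes_sent": 1500, "bytes_received": 500, "action": "allow",
}
print(validate_log(sample))  # prints []
print(validate_log({**sample, "source_ip": "999.1.1.1", "destination_port": 70000}))
```

Returning every problem at once, rather than raising on the first, lets the interface show the user all the fields that need fixing in a single pass.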
On the generated (synthetic) traffic data, the model achieves the following metrics:
- Precision: 0.94
- Recall: 0.92
- F1-Score: 0.93
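- Accuracy: 0.99
The gap between accuracy (0.99) and F1-score (0.93) indicates that exfiltration records are a small minority of the data: accuracy is dominated by correctly classified normal traffic, so precision, recall, and F1-score are the more informative measures.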
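The four metrics are linked through the confusion matrix, and a small worked example shows how accuracy can sit well above the F1-score. The counts below are hypothetical, chosen only so that the rounded metrics match the values listed above; they are not the project's actual evaluation results.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of alerts that were real exfiltration
    recall = tp / (tp + fn)     # share of real exfiltration that was caught
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts: 100 exfiltration records among 1,400 in total.
metrics = classification_metrics(tp=92, fp=6, fn=8, tn=1294)
print({name: round(value, 2) for name, value in metrics.items()})
# prints {'precision': 0.94, 'recall': 0.92, 'f1': 0.93, 'accuracy': 0.99}

# A "never alert" baseline on the same data: every record predicted as normal.
baseline_accuracy = (0 + 1300) / 1400
print(round(baseline_accuracy, 2))  # prints 0.93
```

With only about 7% positive records in this example, the do-nothing baseline already reaches 0.93 accuracy while catching no exfiltration at all (recall 0), which is why precision, recall, and F1-score are the better guides for this task.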
Top predictive features:
- Bytes sent
- Destination port
- Time of day
- Protocol type
- Source port
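The top predictive features above suggest how a raw log record might be turned into model input. The following is a hypothetical encoding, not the project's confirmed feature set: `extract_features`, `FEATURE_NAMES`, the off-hours window (22:00 to 05:59), and the protocol vocabulary are all assumptions.

```python
from datetime import datetime
from typing import Dict, List

PROTOCOLS = ("TCP", "UDP", "ICMP")  # assumed protocol vocabulary for one-hot encoding

FEATURE_NAMES = ["bytes_sent", "bytes_received", "source_port", "destination_port",
                 "hour", "off_hours"] + [f"protocol_{p}" for p in PROTOCOLS]


def extract_features(record: Dict[str, object]) -> List[float]:
    """Turn one log record into a numeric vector ordered like FEATURE_NAMES."""
    hour = datetime.fromisoformat(str(record["timestamp"])).hour
    protocol = str(record["protocol"]).upper()
    return [
        float(record["bytes_sent"]),
        float(record["bytes_received"]),
        float(record["source_port"]),
        float(record["destination_port"]),
        float(hour),                                 # time of day
        float(hour < 6 or hour >= 22),               # off-hours flag (assumed window)
        *[float(protocol == p) for p in PROTOCOLS],  # one-hot protocol type
    ]


sample = {
    "timestamp": "2023-11-02T10:30:00", "protocol": "TCP",
    "bytes_sent": 1500, "bytes_received": 500,
    "source_port": 54321, "destination_port": 443,
}
print(dict(zip(FEATURE_NAMES, extract_features(sample))))
```

Because Random Forests split on thresholds, raw magnitudes such as byte counts and port numbers need no scaling; only categorical fields like the protocol need a numeric encoding.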
POST /predict
Request body (JSON):
{
  "timestamp": "2023-11-02T10:30:00",
  "source_ip": "192.168.1.100",
  "destination_ip": "203.0.113.1",
  "source_port": 54321,
  "destination_port": 443,
  "protocol": "TCP",
  "bytes_sent": 1500,
  "bytes_received": 500,
  "action": "allow"
}
Response (JSON):
{
  "prediction": 0,
  "probability": 0.12
}
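A client can call the endpoint with nothing but the standard library. This sketch assumes the server started by `python src/app.py` listens on http://localhost:5000 and accepts a JSON body sent with a `Content-Type: application/json` header. It also assumes that `prediction` is the predicted class (1 for suspected exfiltration) and `probability` the model's estimated probability of that class, which the example response suggests but does not state. `build_predict_request` and `predict` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "http://localhost:5000/predict"  # assumed: default local address plus /predict


def build_predict_request(record: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def predict(record: dict, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send one log record to a running server and decode the JSON response."""
    with urllib.request.urlopen(build_predict_request(record, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {
    "timestamp": "2023-11-02T10:30:00",
    "source_ip": "192.168.1.100",
    "destination_ip": "203.0.113.1",
    "source_port": 54321,
    "destination_port": 443,
    "protocol": "TCP",
    "bytes_sent": 1500,
    "bytes_received": 500,
    "action": "allow",
}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
# With the server running: result = predict(sample); suspicious = result["prediction"] == 1
```

Separating request construction from sending makes the payload easy to inspect or test without a live server.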
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the project
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Vishal Sharma - your.email@example.com
Project Link: https://github.com/vishalsharma1987/data_exfiltration_detection
- The scikit-learn team for their excellent machine learning library
- The Flask team for the web framework
- All contributors who take the time to help improve this project