Transaction Fraud Detection Project

Overview

During my research on fraud detection models, I ran into several data challenges that motivated a custom data generation tool. Many open-source datasets, particularly in the financial domain, use encoded variable names such as v1, v2, ..., v22 that hide what each feature represents. This makes it difficult to identify and explain which features drive fraud detection, which limits model transparency and deployment potential.

To overcome these limitations, I developed a synthetic banking data generator that produces realistic transaction data with meaningful feature names, such as merchant_name instead of an anonymous code like v2. This makes it possible to build an end-to-end fraud detection model whose features can be interpreted directly and applied in practice. The generator lets you control the number of customers, the range of transactions per customer, and the fraud percentage, which makes it useful for understanding fraud detection workflows and exploring model performance in a controlled environment.

Project Structure

.
├── README.md                    # Documentation of project setup, structure, usage, and details
├── app.py                       # Flask app for deploying the fraud detection model as a web service
├── data_gen.py                  # Script for generating synthetic transaction data
├── fraud_detection_model.pkl    # Trained and saved fraud detection model
├── model.ipynb                  # Jupyter notebook for data generation, model training, and evaluation
├── requirements.txt             # List of Python libraries required for the project
├── static                       # Directory for static files (CSS)
│   └── style.css                # CSS styling for the web application interface
├── synthetic_fraud_data.csv     # Generated synthetic dataset for model training/testing
└── templates                    # Directory for HTML templates
    └── index.html               # HTML template for the web application interface

Explanation of Project Files

README.md: Project documentation, including setup, structure, usage, and performance details.
app.py: Flask web application that loads the model, accepts transaction data, and returns predictions for fraud detection.
data_gen.py: Script to generate synthetic transaction data. This enables custom datasets with realistic attributes and control over fraud prevalence.
fraud_detection_model.pkl: Pre-trained fraud detection model saved using joblib.
model.ipynb: Notebook that contains the end-to-end workflow, from data generation and preprocessing to model training, evaluation, and saving.
requirements.txt: Lists required packages (e.g., pandas, flask) for setting up the environment.
static/style.css: CSS file for styling the web interface, ensuring a responsive, user-friendly design.
synthetic_fraud_data.csv: Sample dataset generated by data_gen.py, used for training and testing the model.
templates/index.html: HTML template for the web application’s user interface, allowing input of transaction data and displaying predictions.

Installation and Setup

To set up this project locally, follow these steps:

Clone the Repository

git clone https://github.com/Ismat-Samadov/fraud_detection.git
cd fraud_detection

Install Requirements

Ensure that you have Python 3 installed. Install the necessary libraries with:

pip install -r requirements.txt

Run the Application

To start the Flask app:

python app.py

The application should now be running on http://127.0.0.1:5000. Open this URL in your browser to access the fraud detection interface.

Data Generation

Why Synthetic Data Generation? The data generator was developed to create realistic, interpretable financial transaction datasets. Instead of dealing with obscure feature names (v1, v2), the generator produces descriptive attributes (e.g., merchant_category, device) that clarify each feature's role in fraud detection. This interpretability improves both the model-building process and user experience when deploying fraud detection applications.
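To make the interpretability point concrete, here is a minimal sketch that fits a model on a small toy DataFrame with descriptive column names and ranks the features with scikit-learn's permutation_importance. The toy data, the three-column subset, the rule that makes large amounts fraudulent, and the RandomForestClassifier are all assumptions for illustration; they are not this project's generator output or its trained model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000

# Hypothetical toy data with descriptive column names
X = pd.DataFrame({
    "amount": rng.exponential(200.0, n),
    "transaction_hour": rng.integers(0, 24, n),
    "num_transactions_last_hour": rng.poisson(2, n),
})
# Toy assumption: fraud is driven only by large amounts
y = (X["amount"] > 600).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each named column on held-out data and measure the drop in accuracy
result = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
importances = pd.Series(result.importances_mean, index=X.columns).sort_values(ascending=False)
print(importances)
```

Because the columns carry real names, the ranking reads directly as "amount matters most" rather than "v2 matters most".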

Example Data Generation

To generate a dataset, specify the number of customers, the range of transactions per customer, and the fraud percentage:

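# Standalone script run from the project root, so data_gen.py is importable
from data_gen import TransactionDataGenerator

if __name__ == "__main__":
    generator = TransactionDataGenerator()
    df = generator.generate_dataset(
        num_customers=500,                     # number of simulated customers
        transactions_per_customer=(100, 200),  # min and max transactions per customer
        fraud_percentage=0.1                   # share of transactions labeled as fraud
    )
    # Save the generated data for model training and testing
    df.to_csv("synthetic_fraud_data.csv", index=False)
    print("Dataset saved to synthetic_fraud_data.csv")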
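The following is a hypothetical, heavily simplified stand-in for generate_dataset, included only to show how the three controls could shape the output. The function name toy_generate, the columns, the is_fraud label, and the way fraudulent rows are altered are assumptions, not the logic in data_gen.py. fraud_percentage is treated as a fraction (0.1 = 10%), which matches the roughly 10% fraud share in the evaluation support counts reported in the classification report.

```python
import numpy as np
import pandas as pd

def toy_generate(num_customers, transactions_per_customer, fraud_percentage, seed=0):
    """Hypothetical sketch of a generator with the same three controls."""
    rng = np.random.default_rng(seed)
    low, high = transactions_per_customer
    rows = []
    for customer_id in range(num_customers):
        # Each customer gets a random number of transactions in [low, high]
        n_tx = int(rng.integers(low, high + 1))
        for _ in range(n_tx):
            rows.append({
                "customer_id": customer_id,
                "amount": round(float(rng.exponential(100.0)), 2),
                "transaction_hour": int(rng.integers(0, 24)),
                "merchant_category": str(rng.choice(["Retail", "Grocery", "Travel"])),
            })
    df = pd.DataFrame(rows)

    # Label a fixed fraction of all rows as fraud
    n_fraud = int(round(fraud_percentage * len(df)))
    fraud_idx = rng.choice(len(df), size=n_fraud, replace=False)
    df["is_fraud"] = False
    df.loc[fraud_idx, "is_fraud"] = True

    # Give fraudulent rows a distinct pattern: larger amounts, late-night hours
    df.loc[fraud_idx, "amount"] *= 5
    df.loc[fraud_idx, "transaction_hour"] = rng.integers(0, 5, size=n_fraud)
    return df

df = toy_generate(num_customers=50, transactions_per_customer=(10, 20), fraud_percentage=0.1)
print(len(df), round(df["is_fraud"].mean(), 3))
```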

Model Performance

On the evaluation set of 22,494 transactions, the model trained on this dataset achieves high precision on both classes (0.98 for non-fraud, 0.96 for fraud), while recall on the fraud class is lower at 0.77.

Classification Report:

Class                  Precision   Recall   F1-Score   Support
Non-Fraud (False, 0)   0.98        1.00     0.99       20,308
Fraud (True, 1)        0.96        0.77     0.86       2,186
Accuracy                                    0.97       22,494
Macro Avg              0.97        0.88     0.92
Weighted Avg           0.97        0.97     0.97

ROC-AUC Score: 0.88

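The high precision and the ROC-AUC of 0.88 indicate that the model separates fraudulent from non-fraudulent transactions well on this synthetic data. A fraud recall of 0.77, however, means about 23% of the fraudulent transactions in the evaluation set were missed; lowering the decision threshold on the predicted fraud probability can catch more of them at some cost in precision.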
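One detail worth checking: the reported ROC-AUC (0.88) equals the macro-average recall in the table, (1.00 + 0.77) / 2 ≈ 0.88. That is exactly what scikit-learn's roc_auc_score returns when it is given hard 0/1 predictions instead of probabilities, because the ROC curve then has a single interior point and its area reduces to (TPR + TNR) / 2, i.e. balanced accuracy. Whether the notebook did this is not stated. The sketch below demonstrates the difference on a synthetic stand-in dataset from make_classification with a LogisticRegression model; neither is this project's data or model.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced toy problem (~10% positives) standing in for the fraud data
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Threshold-independent AUC: uses the positive-class probabilities
auc_from_scores = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# AUC from hard labels collapses to balanced accuracy
y_pred = model.predict(X_test)
auc_from_labels = roc_auc_score(y_test, y_pred)
bal_acc = balanced_accuracy_score(y_test, y_pred)

print(f"ROC-AUC from probabilities: {auc_from_scores:.3f}")
print(f"ROC-AUC from 0/1 labels:    {auc_from_labels:.3f}")
print(f"Balanced accuracy:          {bal_acc:.3f}")
```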

Model Prediction Example

To make predictions using the trained model, load it, prepare sample transaction data, and interpret the prediction as shown:

# Import necessary libraries
import joblib
import pandas as pd

# Load the saved model
model = joblib.load('fraud_detection_model.pkl')

# Sample input data for prediction (column names must match the features used in training)
new_data = pd.DataFrame({
    'amount': [500.0],
    'distance_from_home': [1],
    'transaction_hour': [13],
    'merchant_category': ['Retail'],
    'merchant_type': ['online'],
    'merchant': ['Amazon'],
    'currency': ['USD'],
    'country': ['United States'],
    'city': ['New York'],
    'city_size': ['large'],
    'card_type': ['debit'],
    'device': ['Chrome'],
    'channel': ['web'],
    'day_of_week': [2],
    'is_weekend': [0],
    'num_transactions_last_hour': [3],
    'total_amount_last_hour': [1500.0],
})

# Make prediction
prediction = model.predict(new_data)

# Interpret prediction
if prediction[0] == 1:
    print("Prediction: Fraudulent Transaction")
else:
    print("Prediction: Non-Fraudulent Transaction")
    
# Get probabilities for each class
proba = model.predict_proba(new_data)

print(f"Probability of Non-Fraudulent Transaction: {proba[0][0]:.2f}")
print(f"Probability of Fraudulent Transaction: {proba[0][1]:.2f}")

Interpretation

Prediction: predict returns the class label for each transaction: fraudulent (1/True) or non-fraudulent (0/False).
Probability: predict_proba returns one probability per class. These express the model's confidence and let you apply risk-based thresholds instead of a single fixed cut-off.
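As a sketch of such risk-based thresholds, the function below maps a fraud probability (the second column of predict_proba, assuming the classes are ordered [0, 1] as in the example above) to an action. The tier names and the 0.30 and 0.80 cut-offs are hypothetical and would need tuning on validation data. Lowering a cut-off flags more transactions, raising recall at the cost of precision.

```python
import numpy as np

# Hypothetical cut-offs; tune them on validation data for your own cost trade-off
REVIEW_THRESHOLD = 0.30
BLOCK_THRESHOLD = 0.80

def risk_decision(fraud_probability: float) -> str:
    """Map a fraud probability to an action tier."""
    if fraud_probability >= BLOCK_THRESHOLD:
        return "block"
    if fraud_probability >= REVIEW_THRESHOLD:
        return "review"
    return "approve"

# Example probabilities; in practice use model.predict_proba(new_data)[:, 1]
fraud_probs = np.array([0.05, 0.42, 0.91])
decisions = [risk_decision(p) for p in fraud_probs]
print(decisions)  # ['approve', 'review', 'block']
```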

Demo

You can test the fraud detection app via this deployed demo: https://fraud-detection-gyhe.onrender.com/

Note: Since the app is hosted on a free server, initial loading may take 1-2 minutes as the server "wakes up."
