Our objective is to build a classifier for credit card fraud detection. To do so, we'll compare classification models built with different families of methods:
- Logistic regression
- Support Vector Machine
- Bagging (Random Forest)
- Boosting (XGBoost)
- Neural Network (tensorflow/keras)
The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions. I decided to use an undersampling strategy to re-balance the classes.
It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, we cannot provide the original features or more background information about the data.
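For context, the data can be loaded with pandas. A minimal sketch, assuming the Kaggle creditcard.csv file (the file name is not given in the original text):

```python
import pandas as pd

# The Kaggle "Credit Card Fraud Detection" dataset (file name assumed).
df = pd.read_csv('creditcard.csv')
print(df.shape)                    # (284807, 31): Time, V1-V28, Amount, Class
print(df['Class'].value_counts())  # 0: 284315 legitimate, 1: 492 frauds
```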
Libraries:
- NumPy
- pandas
- pylab
- matplotlib
- sklearn
- seaborn
- plotly
- tensorflow
- keras
- imblearn
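A typical import block for this stack might look like the following sketch (the exact imports in the original notebook may differ):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import tensorflow as tf
from sklearn.model_selection import train_test_split
from imblearn.under_sampling import RandomUnderSampler
```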
Only 492 (or 0.172%) of the transactions are fraudulent, which means the data is highly unbalanced with respect to the target variable Class.
The dataset is highly imbalanced! This is a big problem: a classifier can simply predict the most common class without performing any analysis of the features and still achieve a high accuracy rate, which is obviously misleading. To address this, I will proceed with random undersampling.
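To see why accuracy alone is misleading here, consider a hedged sketch using scikit-learn's DummyClassifier (hypothetical, not part of the original notebook; it assumes X holds the features and y the Class column):

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Assumption: X = df.drop(columns='Class'), y = df['Class'].
# A "model" that always predicts the majority class (non-fraud).
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X, y)
y_pred = baseline.predict(X)

print(accuracy_score(y, y_pred))  # ~0.998: looks great...
print(recall_score(y, y_pred))    # 0.0: ...but catches zero frauds
```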
The simplest undersampling technique involves randomly selecting examples from the majority class and deleting them from the training dataset. This is referred to as random undersampling.
Although simple and effective, a limitation of this technique is that examples are removed without any concern for how useful or important they might be in determining the decision boundary between the classes. This means it is possible, or even likely, that useful information will be deleted.
For undersampling, we can use the imblearn package with the RandomUnderSampler class.
import imblearn
from imblearn.under_sampling import RandomUnderSampler

# sampling_strategy=0.5: after resampling, minority / majority = 0.5,
# i.e. frauds make up one third of the resampled data.
undersample = RandomUnderSampler(sampling_strategy=0.5)
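A hedged usage note: with sampling_strategy=0.5, the resampled set keeps all 492 frauds and 984 randomly chosen legitimate transactions (assuming X and y as above):

```python
# Assumption: X = df.drop(columns='Class'), y = df['Class'].
X_res, y_res = undersample.fit_resample(X, y)
print(y_res.value_counts())  # 0: 984, 1: 492
```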
Accuracy: 0.94
F1 score: 0.92
AUC: 0.96

Accuracy: 0.94
F1 score: 0.92
AUC: 0.97

Accuracy: 0.95
F1 score: 0.93
AUC: 0.97
Sequential ensemble methods, also known as "boosting", create a sequence of models in which each model attempts to correct the mistakes of the models before it in the sequence. The first model is built on the training data, the second model improves on the first, the third improves on the second, and so on; a sketch follows below.
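As a hedged illustration (the exact hyperparameters used in the notebook are not shown), a boosted classifier on the undersampled data might look like this, assuming X_train/X_test/y_train/y_test come from a train_test_split:

```python
from xgboost import XGBClassifier
from sklearn.metrics import f1_score

# Assumption: X_train/X_test/y_train/y_test are a train/test split of the
# undersampled data; the hyperparameters below are illustrative only.
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=4)
xgb.fit(X_train, y_train)
print(f1_score(y_test, xgb.predict(X_test)))
```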
Accuracy: 0.95
F1 score: 0.93
AUC: 0.97

Accuracy: 0.95
F1 score: 0.94
AUC: 0.98
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(32, input_shape=(29,), activation='relu'))  # input layer: 29 features
model.add(Dropout(0.2))
model.add(Dense(16, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(8, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(4, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))  # output: fraud probability

opt = tf.keras.optimizers.Adam(learning_rate=0.001)  # optimizer
model.compile(optimizer=opt, loss=tf.keras.losses.BinaryCrossentropy(), metrics=['accuracy'])

earlystopper = tf.keras.callbacks.EarlyStopping(monitor='val_accuracy', min_delta=0, patience=15,
                                                verbose=1, mode='auto', baseline=None,
                                                restore_best_weights=False)

history = model.fit(X_train.values, y_train.values, epochs=6, batch_size=5,
                    validation_split=0.15, verbose=0, callbacks=[earlystopper])
The hidden layers use the ReLU activation function. It is a piecewise linear function that outputs the input directly if it is positive; otherwise, it outputs zero. The last node uses a sigmoid function, which maps the output to a value between 0 and 1 that can be thresholded to produce the binary class.
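For intuition, here is a minimal NumPy sketch of the two activations (not part of the original notebook):

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged; clamps negatives to zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

print(relu(np.array([-2.0, 0.0, 3.0])))     # [0. 0. 3.]
print(sigmoid(np.array([-4.0, 0.0, 4.0])))  # ~[0.018 0.5 0.982]
```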
Accuracy: 0.95
F1 score: 0.94
AUC: 0.98
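For reference, a hedged sketch of how these metrics can be computed for the network, assuming a held-out X_test/y_test split:

```python
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

# Assumption: X_test / y_test are a held-out split of the undersampled data.
y_prob = model.predict(X_test.values).ravel()  # sigmoid outputs in (0, 1)
y_pred = (y_prob > 0.5).astype(int)            # threshold at 0.5

print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(f"F1 score: {f1_score(y_test, y_pred):.2f}")
print(f"AUC: {roc_auc_score(y_test, y_prob):.2f}")
```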
If you have any feedback, please reach out at pradnyapatil671@gmail.com
I am an AI enthusiast and a data science & ML practitioner.