---
title: "Credit card fraud analysis"
output:
  pdf_document: default
  html_document: default
date: "2023-12-03"
author: "Ashutosh, Pratiksha, Huichuan"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
```{r,warning=FALSE}
library(tidyverse)  # attaches dplyr and ggplot2, among others
library(gtsummary)
```
```{r}
# Load the simulated transaction data
ccf <- read.csv("ccf_simulation.csv")
```
```{r}
# Treat the fraud flag and the transaction type as categorical variables
ccf <- ccf %>%
  mutate(
    isFraud = factor(isFraud),
    type = factor(type)
  )
```
```{r}
# Drop extreme amounts (8,000,000 or more), then count the remaining missing values
ccf <- ccf %>% filter(amount < 8000000)
sum(is.na(ccf))
```
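Note that `sum(is.na(ccf))` returns a single total over the whole data frame, and that `filter()` drops rows where the condition evaluates to `NA`, so rows with a missing `amount` are removed before the count is taken. A minimal sketch of both points on a hypothetical toy data frame (`toy_na` and its values are illustrative, not taken from `ccf_simulation.csv`):

```{r}
library(dplyr)

# Hypothetical toy data: one missing amount, one missing type, one extreme amount
toy_na <- data.frame(
  amount = c(120.5, NA, 9000000, 300),
  type = c("TRANSFER", "CASH_OUT", "TRANSFER", NA)
)

# Missing values per column, before any filtering
na_per_column <- colSums(is.na(toy_na))
print(na_per_column)

# filter() keeps only rows where the condition is TRUE,
# so the row with the NA amount is dropped along with the extreme amount
toy_na_filtered <- toy_na %>% filter(amount < 8000000)
print(toy_na_filtered)

# Total missing values left after filtering
print(sum(is.na(toy_na_filtered)))
```

Counting per column before filtering shows where values are missing, which the single total cannot.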
```{r}
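# Keep only CASH_OUT and TRANSFER transactions.
# %in% tests membership; `==` against a length-2 vector would recycle it and drop matching rows.
ccf_red <- ccf %>%
  filter(type %in% c("CASH_OUT", "TRANSFER"))
write.csv(ccf_red, "ccf_red.csv")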
```
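When a column is compared against a vector of several values, `==` recycles the shorter vector element by element instead of testing membership, so some matching rows are silently dropped; `%in%` tests membership. A minimal sketch on a hypothetical toy vector (`toy_type` and `toy_keep` are illustrative, not taken from the data):

```{r}
# Hypothetical toy vector of transaction types
toy_type <- c("CASH_OUT", "CASH_OUT", "TRANSFER", "TRANSFER", "PAYMENT", "CASH_OUT")
toy_keep <- c("CASH_OUT", "TRANSFER")

# `==` recycles toy_keep along toy_type: odd positions are compared with
# "CASH_OUT", even positions with "TRANSFER"
recycled_match <- toy_type == toy_keep

# %in% asks, for each element, whether it appears anywhere in toy_keep
member_match <- toy_type %in% toy_keep

print(data.frame(toy_type, recycled_match, member_match))
print(c(recycled = sum(recycled_match), membership = sum(member_match)))
```

Because the toy vector's length is a multiple of two, R recycles without any warning, which is what makes this mistake easy to miss.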
```{r}
# Draw a random sample of up to 1,000 fraudulent and 4,000 non-fraudulent transactions
ccf <- ccf %>%
  filter(isFraud == 1) %>%
  slice_sample(n = 1000, replace = FALSE) %>%
  bind_rows(
    ccf %>%
      filter(isFraud == 0) %>%
      slice_sample(n = 4000, replace = FALSE)
  )

# Summarise transaction type by fraud status
ccf %>%
  select(isFraud, type) %>%
  tbl_summary(
    by = isFraud,
    digits = list(
      all_continuous() ~ c(2, 2)
    ),
    statistic = list(
      all_continuous() ~ "{mean} ({sd})"
    )
  )
```
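Because `slice_sample()` draws rows at random, the sampled data and every result based on it change between knits unless a seed is fixed. A minimal sketch, assuming an arbitrary seed of 123 and a hypothetical toy data frame `toy_tx` (neither comes from the analysis itself):

```{r}
library(dplyr)

# Hypothetical toy data: 80 non-fraud and 20 fraud transactions
toy_tx <- data.frame(
  id = 1:100,
  isFraud = factor(rep(c(0, 1), times = c(80, 20)))
)

# Fix the seed, then sample each class separately and stack the results
draw_toy_sample <- function(seed) {
  set.seed(seed)
  bind_rows(
    toy_tx %>% filter(isFraud == 1) %>% slice_sample(n = 10),
    toy_tx %>% filter(isFraud == 0) %>% slice_sample(n = 40)
  )
}

toy_sample_a <- draw_toy_sample(123)  # 123 is an arbitrary, assumed seed
toy_sample_b <- draw_toy_sample(123)

# Class counts in the sample, and a check that both draws picked the same rows
print(table(toy_sample_a$isFraud))
print(identical(toy_sample_a$id, toy_sample_b$id))
```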
```{r}
# Share of all sampled transactions in each fraud status / transaction type combination
ggplot(ccf, aes(fill = type, x = isFraud, group = type)) +
  geom_bar(
    aes(y = after_stat(count / sum(count) * 100)),
    width = 1, show.legend = TRUE, col = "red", position = "dodge"
  ) +
  labs(
    title = "Distribution of Transaction Types by Fraud Status",
    x = "Fraud Status",
    y = "Percentage"
  ) +
  scale_y_continuous(
    labels = scales::percent_format(scale = 1)
  ) +
  scale_x_discrete(labels = c("Not Fraud", "Fraud"))
```
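The bar heights are each bar's count divided by the total count over all bars, so they are percentages of all sampled transactions, not percentages within each fraud status. A minimal sketch of the difference, using hypothetical counts (`toy_counts` is illustrative, not output from the data):

```{r}
# Hypothetical counts of transaction type (rows) by fraud status (columns)
toy_counts <- matrix(
  c(30, 12,
    40,  0,
    10,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    type = c("CASH_OUT", "PAYMENT", "TRANSFER"),
    isFraud = c("0", "1")
  )
)

# Percentage of all transactions (what count / sum(count) * 100 gives)
overall_pct <- toy_counts / sum(toy_counts) * 100

# Percentage within each fraud status (each column sums to 100)
within_pct <- prop.table(toy_counts, margin = 2) * 100

print(round(overall_pct, 1))
print(round(within_pct, 1))
```

Which of the two is appropriate depends on the question being asked; the within-status version makes the two fraud classes directly comparable when their sample sizes differ.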
Hypothesis Testing

$H_0: \text{Transaction type and fraud status are independent}$

$H_A: \text{Transaction type and fraud status are dependent}$
```{r}
# Cross-tabulate transaction type against fraud status and test for independence
contingency_table <- table(ccf$type, ccf$isFraud)
chi_sq_result <- chisq.test(contingency_table)
print(contingency_table)
print(chi_sq_result)
```
Decision: We reject the null hypothesis.

Conclusion: There is strong evidence that transaction type and fraud status are dependent.
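
A decision of this kind rests on comparing the p-value reported by `chisq.test()` with a chosen significance level, and on the expected counts being large enough for the chi-squared approximation. A minimal sketch, assuming a significance level of 0.05 (an assumption, not stated in this analysis) and a hypothetical table of counts `toy_table`:

```{r}
# Hypothetical counts of transaction type (rows) by fraud status (columns)
toy_table <- matrix(
  c(300, 120,
    400,   5,
    100,  75),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    type = c("CASH_OUT", "PAYMENT", "TRANSFER"),
    isFraud = c("0", "1")
  )
)

toy_test <- chisq.test(toy_table)
toy_alpha <- 0.05  # assumed significance level

# Expected counts under independence; a common rule of thumb asks for at least 5 per cell
print(toy_test$expected)

# Test statistic, degrees of freedom and p-value
print(toy_test$statistic)
print(toy_test$parameter)
print(toy_test$p.value)

# Reject the null hypothesis of independence when the p-value is below toy_alpha
toy_reject_null <- toy_test$p.value < toy_alpha
print(toy_reject_null)
```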