---
title: "SDS 358 Group5 RP#2"
output: word_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## I. Group Members
* Xiaohan Wu (xw4822)
* Ningxin Liu (nl8385)
* Binglin Zhang (bz3856)
## II. Research Topic
This research investigates how Attack, Defense, Special Attack, Special Defense, and Speed relate to HP, with the goal of understanding how the game's designers balance each Pokemon's HP against its other characteristics.
## III. Variables
| Predictor | Description |
|:---------: |:------------------------------------------------------------------------------------:|
| Attack | The base modifier for normal attacks (e.g. Scratch, Punch) |
| Defense | The base damage resistance against normal attacks |
| Sp..Atk | Special Attack, the base modifier for special attacks (e.g. Fire Blast, Bubble Beam) |
| Sp..Def | Special Defense, the base damage resistance against special attacks |
| Speed | Determines which Pokemon attacks first each round |

| Response | Description |
|:--------: |:--------------------------------------------------------------------------------------: |
| HP | Hit points, or health; defines how much damage a Pokemon can withstand before fainting |
## IV. Methods of collecting the data
The original data set comes from Kaggle: [pokemon dataset](https://www.kaggle.com/abcsds/pokemon) [1].
It contains 721 Pokemon, including their number, name, first and second type, and base stats: HP, Attack, Defense, Special Attack, Special Defense, and Speed. The data describe the Pokemon games and were compiled from the following sources [1]:
* pokemon.com
* pokemondb
* bulbapedia
## V. Predictions of the Relationship
The research fits a multiple linear regression of the response (HP) on the five predictors. We expect to see a negative association between each predictor and the response.
## VI. Snippet of data
We keep only the five predictors and the response.
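```{r install, echo=FALSE, results='hide', message=FALSE}
# Install the required packages only if they are missing, so knitting does not reinstall them every time
for (pkg in c("tidyverse", "psych")) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg, repos = "https://cloud.r-project.org")
}
```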
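```{r load-lib-data, echo=FALSE, results='hide', message=FALSE}
library(tidyverse)
library(psych)
# Read the Kaggle file; it is expected in the same folder as this .Rmd
raw_data <- read.csv("Pokemon.csv")
# Keep only the response (HP) and the five predictors
data <- raw_data[c("HP", "Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed")]
```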
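The column names `Sp..Atk` and `Sp..Def` are a side effect of `read.csv()`, which by default (`check.names = TRUE`) passes the file's headers through `make.names()`. The sketch below shows that conversion, assuming the raw headers are `Sp. Atk` and `Sp. Def`; that is an assumption about the CSV file, which is not shown in this report.

```{r column-name-sketch}
# Assumed raw CSV headers (an assumption; the file itself is not shown here)
assumed_headers <- c("HP", "Attack", "Defense", "Sp. Atk", "Sp. Def", "Speed")
# make.names() replaces characters that are invalid in R names (such as spaces) with "."
clean_headers <- make.names(assumed_headers)
print(clean_headers)
```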
## VII. Univariate Analysis
### Summary of Data
```{r data head}
head(data, n=5)
```
```{r summary-stats}
# Mean and SD of Attack, Defense, Special Attack, Special Defense, and Speed
# fast = TRUE restricts psych::describe() to basic statistics such as n, mean, and sd
summary.extended <- data %>%
  select(Attack, Defense, Sp..Atk, Sp..Def, Speed) %>%
  psych::describe(fast = TRUE) %>%
  as_tibble(rownames = "rowname")
print(summary.extended)
```
### Histograms
```{r histogram1}
# histogram for Attack (bins = 30 is the ggplot2 default, stated explicitly to silence the message)
ggplot(raw_data, aes(Attack)) +
  geom_histogram(bins = 30, fill = "lightblue") +
  labs(title = "Histogram of Attack", x = "Attack", y = "Count") +
  theme_classic()
```
The distribution of the predictor Attack is approximately normal, with a skew to the right.
```{r histogram2}
# histogram for Defense
ggplot(raw_data, aes(Defense)) +
  geom_histogram(bins = 30, fill = "lightgreen") +
  labs(title = "Histogram of Defense", x = "Defense", y = "Count") +
  theme_classic()
```
The distribution of the predictor Defense is approximately normal, with a skew to the right and some possible outliers.
```{r histogram3}
# histogram for Special Attack
ggplot(raw_data, aes(Sp..Atk)) +
  geom_histogram(bins = 30, fill = "red") +
  labs(title = "Histogram of Special Attack", x = "Special Attack", y = "Count") +
  theme_classic()
```
The distribution of the predictor Special Attack is approximately normal, with a skew to the right and some possible outliers.
```{r histogram4}
# histogram for Special Defense
ggplot(raw_data, aes(Sp..Def)) +
  geom_histogram(bins = 30, fill = "green") +
  labs(title = "Histogram of Special Defense", x = "Special Defense", y = "Count") +
  theme_classic()
```
The distribution of the predictor Special Defense is approximately normal, with a skew to the right and some possible outliers.
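The histogram chunks above repeat the same call for each variable and do not include Speed. One way to cover all five predictors in a single figure is a faceted plot. The following is a minimal sketch, not part of the original analysis: `plot_stat_histograms` is a hypothetical helper name, and it assumes the `ggplot2` package is installed and that every selected column is numeric.

```{r histogram-helper-sketch}
library(ggplot2)

# Hypothetical helper: draw one histogram panel per selected numeric column
plot_stat_histograms <- function(df, cols, bins = 30) {
  # stack() reshapes the selected columns into two columns: `values` and `ind` (the column name)
  long <- stack(as.data.frame(df[cols]))
  ggplot(long, aes(values)) +
    geom_histogram(bins = bins, fill = "lightblue") +
    facet_wrap(~ ind, scales = "free_x") +
    labs(title = "Histograms of the selected variables", x = "Value", y = "Count") +
    theme_classic()
}
```

With the data loaded earlier, `plot_stat_histograms(raw_data, c("Attack", "Defense", "Sp..Atk", "Sp..Def", "Speed"))` would draw the five panels side by side.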
## VIII. Regression Analysis
```{r regression-all-predictors}
# Fit HP on all five predictors and print the estimated coefficients
reg_function <- lm(HP ~ Attack + Defense + Sp..Atk + Sp..Def + Speed, data = raw_data)
coefficients(reg_function)
```
The fitted regression function is $\widehat{HP} = 31.483 + 0.296\,\text{Attack} - 0.084\,\text{Defense} + 0.110\,\text{SpecialAttack} + 0.264\,\text{SpecialDefense} - 0.094\,\text{Speed}$.
* For every one-unit increase in Attack, we expect an average $0.296$ increase in HP, holding the other predictors constant.
* For every one-unit increase in Defense, we expect an average $0.084$ decrease in HP, holding the other predictors constant.
* For every one-unit increase in Special Attack, we expect an average $0.110$ increase in HP, holding the other predictors constant.
* For every one-unit increase in Special Defense, we expect an average $0.264$ increase in HP, holding the other predictors constant.
* For every one-unit increase in Speed, we expect an average $0.094$ decrease in HP, holding the other predictors constant.
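
The slope interpretations above can be checked numerically: in a linear model, raising one predictor by one unit while leaving the others unchanged shifts the fitted value by exactly that predictor's coefficient. The sketch below is illustrative only; `unit_effect` is a hypothetical helper name, not something used elsewhere in this report.

```{r slope-interpretation-sketch}
# Hypothetical helper: change in the fitted value when `var` rises by one unit
# and every other predictor stays at the values given in `newdata`
unit_effect <- function(model, newdata, var) {
  shifted <- newdata
  shifted[[var]] <- shifted[[var]] + 1
  unname(predict(model, shifted) - predict(model, newdata))
}
```

For the model fitted above, `unit_effect(reg_function, raw_data[1, ], "Attack")` should agree with `coef(reg_function)["Attack"]`, whichever row is used as the starting point.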
```{r sse-mse-rmse}
# Save fitted (or predicted) values
raw_data$predicted <- predict(reg_function)
# Save residuals
raw_data$resids <- residuals(reg_function)
# Square the residuals
raw_data$resid2 <- raw_data$resids^2
# Sum the squared residuals
SSE <- sum(raw_data$resid2)
# Divide by the residual degrees of freedom (n - 6: five slopes plus the intercept) to find the MSE
MSE <- SSE / df.residual(reg_function)
# Square root the MSE to find an estimate of sigma
res_sd_error <- sqrt(MSE)
param <- matrix(c(SSE, MSE, res_sd_error), ncol = 3, byrow = TRUE)
colnames(param) <- c("SSE", "MSE", "RMSE")
param
```
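For reference, the quantities computed in this chunk follow the standard definitions for a multiple linear regression with $p$ predictors fitted to $n$ observations. Here $p = 5$, so the residual degrees of freedom are $n - 6$, assuming every row of the data is used in the fit:

$$
SSE = \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2, \qquad
MSE = \frac{SSE}{n - p - 1}, \qquad
\hat{\sigma} = \sqrt{MSE}
$$

In R, `df.residual()` applied to an `lm` fit returns the denominator $n - p - 1$, and `sigma()` returns $\hat{\sigma}$, so both can serve as a cross-check on the manual calculation.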
## Reference
[1] Barradas, A. (2016, August 29). Pokemon with stats. Retrieved October 13, 2020, from https://www.kaggle.com/abcsds/pokemon