# Ch. 12 Linear Regression {-}
Sections covered: 12.1, 12.2, 12.5
## 12.1 The Simple Linear Regression Model {-}
## 12.2 Estimating Model Parameters {-}
Formulas to know from p. 498:
$b_1 = \dfrac{\sum(x_i -\overline{x})(y_i - \overline{y})}{\sum(x_i - \overline{x})^2} = \frac{S_{xy}}{S_{xx}}$ and $b_0 = \overline{y} - b_1 \overline{x}$
Formula to know from p. 502:
$SSE = \sum(y_i - \hat{y}_i)^2$
Formulas to know from p. 504:
$SST = \sum(y_i - \overline{y})^2$ and $r^2 = 1 - \frac{SSE}{SST}$
Formulas to know from p. 505:
$SSR = \sum(\hat{y}_i - \overline{y})^2$ and $SSE + SSR = SST$
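The p. 505 identity follows by expanding $SST$ around the fitted values; the cross term vanishes because least squares residuals sum to zero and are uncorrelated with the fitted values:

$$\sum(y_i - \overline{y})^2 = \sum\big[(y_i - \hat{y}_i) + (\hat{y}_i - \overline{y})\big]^2 = SSE + SSR + 2\sum(y_i - \hat{y}_i)(\hat{y}_i - \overline{y}) = SSE + SSR$$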
### Resources {-}
Interactive Visualization: [Linear Regression](https://jtr13.github.io/D3/BestFittingLine.html) Try fitting the least squares line to a set of random data and check your answer ([and another one](https://www.geogebra.org/m/xC6zq7Zv)).
Video: [Regression I: What is regression? | SSE, SSR, SST | R-squared | Errors (ε vs. e)](https://www.youtube.com/watch?v=aq8VU5KLmkY) [contributed by Lance J.]
### R {-}
Calculating the slope and intercept for a sample of $(x, y)$ pairs (p. 498 formulas):
```{r}
# Example 12.8, p. 503
x <- c(12, 30, 36, 40, 45, 57, 62, 67, 71, 78, 93, 94, 100, 105)
y <- c(3.3, 3.2, 3.4, 3, 2.8, 2.9, 2.7, 2.6, 2.5, 2.6, 2.2, 2, 2.3, 2.1)
lm(y ~ x) #lm = linear model
```
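As a sanity check, the p. 498 formulas can be applied directly to the same data; the result should match the `lm()` coefficients above.

```{r}
# hand computation of the least squares coefficients (p. 498)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
b1 <- Sxy / Sxx              # slope
b0 <- mean(y) - b1 * mean(x) # intercept
c(b0 = b0, b1 = b1)
```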
**Predicted values:**
```{r}
mod <- lm(y~x)
round(mod$fitted.values, 2)
```
**Residuals:**
```{r}
round(mod$residuals, 2)
```
**SSE, SSR**
```{r}
anova(mod) # anova = analysis of variance
```
The first row under "Sum Sq" is the SSR, and the second row under "Sum Sq" is the SSE:
SSE = `r anova(mod)[2,2]`
SSR = `r anova(mod)[1,2]`
SST = SSE + SSR = `r anova(mod)[2,2]` +
`r anova(mod)[1,2]` = `r round(anova(mod)[2,2] + anova(mod)[1,2], 4)`
**Coefficient of determination $r^2$**
```{r}
# Example 12.4, 12.9
x <- c(132, 129, 120, 113.2, 105, 92, 84, 83.2, 88.4, 59, 80, 81.5, 71, 69.2)
y <- c(46, 48, 51, 52.1, 54, 52, 59, 58.7, 61.6, 64, 61.4, 54.6, 58.8, 58)
mod <- lm(y ~ x)
SSR <- anova(mod)$`Sum Sq`[1]
SST <- anova(mod)$`Sum Sq`[1] + anova(mod)$`Sum Sq`[2]
SSR/SST
```
Or (simply):
```{r}
cor(x,y)^2
```
(See section 12.5)
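The same value can also be recovered from the sums of squares, since $r^2 = 1 - SSE/SST$; both numbers below should agree.

```{r}
# r^2 two ways: squared sample correlation and 1 - SSE/SST
SSE <- sum(mod$residuals^2)
SST <- sum((y - mean(y))^2)
c(cor(x, y)^2, 1 - SSE/SST)
```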
## 12.5 Correlation {-}
Skip: "Inferences About the Population Correlation Coefficient" (p. 530) *to end of section*.
### Resources {-}
Interactive visualization: [Correlation Coefficient](https://1201.info/ch.-5-joint-probability.html#interactive) (add and remove points)
Interactive visualization: [Interpreting Correlations](https://rpsychologist.com/d3/correlation/) [contributed by Dario G.]
### R {-}
Sample correlation coefficient $r$
```{r}
# Example 12.15, p. 528
x <- c(2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1)
y <- c(1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63)
cor(x,y)
```
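For comparison, $r$ can be computed from its defining formula, $r = S_{xy}/\sqrt{S_{xx}S_{yy}}$, which should reproduce the value returned by `cor()`:

```{r}
# r from the defining formula
Sxy <- sum((x - mean(x)) * (y - mean(y)))
Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy / sqrt(Sxx * Syy)
```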
## Practice Exercises {-}
1. **(Least squares line)** Researchers employed a least squares analysis in studying how $Y=$ porosity (%) is related to $X=$ unit weight (pcf) in concrete specimens. Consider the following representative data:
```{r}
x <- c(99.0, 101.1, 102.7, 103.0, 105.4, 107.0, 108.7, 110.8, 112.1, 112.4, 113.6, 113.8, 115.1, 115.4, 120.0)
y <- c(28.8, 27.9, 27.0, 25.2, 22.8, 21.5, 20.9, 19.6, 17.1, 18.9, 16.0, 16.7, 13.0, 13.6, 10.8)
```
(Textbook 12.17)
a. Obtain the equation of the estimated regression line.
```{r}
lm(y~x)
```
$\hat{y} = 118.91 - 0.9047x$
b. Calculate the residuals corresponding to the first two observations.
```{r}
mod <- lm(y~x)
round(mod$residuals, 2)
```
Or alternatively, use R as a calculator, plugging the rounded coefficients from part (a) into the regression equation:
```{r}
pred <- 118.9099 - 0.9047*x
res <- y - pred
res[1]
res[2]
```
c. Calculate a point estimate of $\sigma$.
```{r}
sig2 <- sum((res)^2)/(length(x)-2)
sqrt(sig2)
```
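The same estimate (up to the rounding of the hand-computed coefficients above) is reported by `summary()` as the residual standard error:

```{r}
summary(mod)$sigma # residual standard error = point estimate of sigma
```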
d. What proportion of observed variation in porosity can be attributed to the approximate linear relationship between unit weight and porosity?
```{r}
cor(x, y)^2
```
e. Calculate the SSE and SST.
```{r}
anova(mod) # analysis of variance
SSE <- anova(mod)$`Sum Sq`[2] # second row of "Sum Sq" is the SSE
SST <- anova(mod)$`Sum Sq`[1] + anova(mod)$`Sum Sq`[2]
c(SSE, SST)
```
Or alternatively, use R as a calculator. Notice that the same results are produced.
```{r}
SSE1 <- sum((mod$residual)^2)
SST1 <- sum((y-mean(y))^2)
SSR1 <- sum((mod$fitted.values - mean(y))^2)
c(SSE1, SST1, SSE1+SSR1)
```