---
title: "Is an automatic or manual transmission better for MPG?"
author: "Sergey Bushmanov, 08/21/2014"
output: pdf_document
geometry: margin=0.6in
---
```{r echo=FALSE}
library(knitr)
library(ggplot2)
library(plyr)
opts_chunk$set(
  warning=FALSE,
  message=FALSE,
  echo=FALSE,
  dpi=300,
  fig.width=3,
  fig.height=3,
  fig.caption=TRUE,
  fig.align='center',
  dev="png",
  dev.args=list(type="cairo"),
  error=FALSE
)
```
### Executive summary
To answer the stated question definitively, we would have to collect MPG measurements
on every car model in existence, i.e. the total population.
With the help of regression analysis, however, we can answer it approximately,
with a stated precision, by analyzing a sample. The results obtained by analyzing the
`mtcars` sample in R show that automobiles with automatic transmission appear to deliver much
lower MPG. On average, given the sample at hand and the regression model developed here, cars with automatic
transmission have x MPG, with a 95% CI around this average of ..., and cars
with manual transmission have y MPG, with its own 95% CI. The 95% prediction intervals are ...
and ... respectively. However, this simple regression model with a single predictor
does not predict MPG very well, so an alternative linear model is also shown.
### Exploratory analysis
Let's explore the data visually by plotting MPG for two groups of cars: those with automatic
and those with manual transmission.
```{r echo=FALSE, fig.width=4}
data(mtcars)
# Boxplot of MPG for each transmission type (am: 0 = automatic, 1 = manual)
ggplot(data=mtcars, aes(x=factor(am), y=mpg)) +
  geom_boxplot(aes(group=factor(am), color=factor(am))) +
  labs(x="Type of transmission", y="MPG, miles/(US) gallon") +
  theme(axis.title.x=element_text(size=9), axis.title.y=element_text(size=9)) +
  # Relabel the legend so the 0/1 coding is readable
  scale_color_discrete(name="Type of transmission",
                       labels=c("0 - Automatic", "1 - Manual"))
```
This cursory visual analysis suggests that MPG differs by
type of transmission. Let's validate this conjecture with regression analysis.
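
Before moving to the regression, the visual impression can also be checked numerically. The chunk below is a minimal sketch, not part of the original analysis: it tabulates the count, mean and median of MPG for each transmission type and runs a Welch two-sample t-test, which reports a 95% CI for the difference in group means. The object name `mpg_by_am` is an illustrative choice.

```{r echo=TRUE}
data(mtcars)

# Count, mean and median of MPG per transmission type (am: 0 = automatic, 1 = manual).
# `mpg_by_am` is an illustrative name, not an object from the original report.
mpg_by_am <- aggregate(
  mpg ~ am, data = mtcars,
  FUN = function(x) c(n = length(x), mean = mean(x), median = median(x))
)
mpg_by_am

# Welch two-sample t-test; the output includes a 95% CI for the difference in means
t.test(mpg ~ am, data = mtcars)
```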
### Regression analysis 1.
Let's fit an ordinary least squares (OLS) regression of MPG ("mpg") on type of transmission ("am").
Note that although MPG is a continuous variable, type of transmission is a
categorical variable that takes one of two states: "automatic" or "manual". To perform
OLS I treat "am" as a dummy variable: 0 for "automatic" and 1 for "manual" (explicit conversion to a factor is also possible).
```{r}
ls1 <- lm(mpg~am, data=mtcars)
summary(ls1)
```
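
With a single 0/1 dummy predictor, the intercept is the mean MPG of automatic cars and the "am" coefficient is the difference between the manual and automatic means. The following sketch shows one way to obtain the 95% confidence intervals for the group means and the 95% prediction intervals for individual cars. It assumes the model `ls1` as fitted above; the names `new_cars`, `conf_int` and `pred_int` are illustrative.

```{r echo=TRUE}
ls1 <- lm(mpg ~ am, data = mtcars)

# One row per transmission type; `new_cars` is an illustrative name
new_cars <- data.frame(am = c(0, 1))

# 95% CI for the mean MPG of each group
conf_int <- predict(ls1, newdata = new_cars, interval = "confidence", level = 0.95)
# 95% prediction interval for the MPG of a single car in each group
pred_int <- predict(ls1, newdata = new_cars, interval = "prediction", level = 0.95)
rownames(conf_int) <- rownames(pred_int) <- c("automatic", "manual")

conf_int
pred_int

# 95% CI for the coefficients themselves (intercept and difference in means)
confint(ls1, level = 0.95)
```

The prediction intervals are wider than the confidence intervals because they include the residual variation of individual cars as well as the uncertainty in the estimated means.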
Executive summary
1. Boxplot of MPG on two states of am and exploratory data analysis:
   - mean and median, CI, summary by transmission type.
2. Fitting LS linear regression: preliminary diagnostics, interpretation of coefficients and residual plot. ggplot with additional dimensions, caret::featurePlot
3. Alternative models. Comparison among models.
4. Uncertainty in the best-fit model.
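
Item 3 of the outline could be approached as sketched below. The text does not say which alternative model is intended, so adding car weight ("wt") as a second predictor is purely an assumption made for illustration, as is the name `ls2`. The sketch compares the two nested models with an F-test and adjusted R-squared, then draws a basic residual plot for the larger one.

```{r echo=TRUE}
ls1 <- lm(mpg ~ am, data = mtcars)
# ASSUMPTION: `wt` is an illustrative extra predictor, not the author's stated choice
ls2 <- lm(mpg ~ am + wt, data = mtcars)

# F-test for whether the additional predictor improves on the single-predictor model
anova(ls1, ls2)

# Adjusted R-squared of both models, side by side
c(ls1 = summary(ls1)$adj.r.squared, ls2 = summary(ls2)$adj.r.squared)

# Residuals against fitted values for the larger model
plot(fitted(ls2), resid(ls2), xlab = "Fitted MPG", ylab = "Residuals")
abline(h = 0, lty = 2)
```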