forked from LucaPeg/raisins_and_mushrooms
-
Notifications
You must be signed in to change notification settings - Fork 0
/
PCA.Rmd
86 lines (61 loc) · 1.68 KB
/
PCA.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
---
title: "PCA_raisins"
output:
html_document:
keep_md: true
---
```{r}
data = read.csv('datasets/Raisin_Dataset.csv', sep = ';')
```
We exclude the label variable
```{r}
df = data[-c(8, 9)]
head(df)
```
```{r}
options(scipen = 10)
round((apply(df, 2, mean)), digits = 5); round((apply(df, 2, var)), digits = 5)
```
```{r}
summary(df)
```
```{r}
pr.out = prcomp(df, scale = TRUE)
pr.out
```
```{r}
summary(pr.out)
```
Correct dimensions, show clearly the points but it is not readable
```{r}
plot(pr.out$x[, 1], pr.out$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
points(pr.out$x[, 1], pr.out$x[, 2], col = rgb(1, 0, 0, alpha = 0.5), pch = 16)
arrows(0, 0, pr.out$rotation[, 1], pr.out$rotation[, 2], length = 0.1, angle = 30)
```
Shows both the dimension and the arrows' label, but not the points
```{r}
biplot(pr.out)
```
Compromise, arrows length increased
```{r}
plot(pr.out$x[, 1], pr.out$x[, 2], type = "n", xlab = "PC1", ylab = "PC2")
points(pr.out$x[, 1], pr.out$x[, 2], col = rgb(1, 0, 0, alpha = 0.5), pch = 16)
arrows(0, 0, pr.out$rotation[, 1]*7, pr.out$rotation[, 2]*7, length = 0.1, angle = 30)
text(pr.out$rotation[, 1]*7, pr.out$rotation[, 2]*7, labels = rownames(pr.out[[2]]), pos = 3)
```
Get the components' value for each observation
```{r}
pcadf <- predict(pr.out, newdata = df)
head(pcadf)
```
Perform 2-means clustering on the components' values for each observation
```{r}
km.out2 = kmeans(pcadf, 2)
km.out2
```
Graphical representation of the clusters.
Results are slightly better on PC than on initial features
```{r}
plot (pcadf, col = adjustcolor(km.out2$cluster + 1, alpha.f = 0.5),
main = "K- Means Clustering Results with K = 2", pch = 20)
```