-
Notifications
You must be signed in to change notification settings - Fork 4
/
Copy pathL136_tSNE_Circles_Template.Rmd
100 lines (76 loc) · 2 KB
/
L136_tSNE_Circles_Template.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
title: "t-SNE in Comparison to PCA for Circles"
output:
html_document:
toc: true
toc_float: true
code_folding: hide
number_sections: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = T, message = F, warning = F)
```
# Introduction
```{r}
library(Rtsne)
library(keras)
library(tidyverse)
library(plotly)
```
# Data Preparation
First, we create the data.
```{r}
set.seed(123)
df <- dplyr::tibble(x = runif(n = 1000, min = -10, max = 10),
y = runif(n = 1000, min = -10, max = 10),
z = runif(n = 1000, min = -10, max = 10)) %>%
dplyr::mutate(r = x^2+y^2+z^2) %>%
dplyr::mutate (class = ifelse(r < 20, 1, 0)) %>%
dplyr::filter(r < 100) %>%
dplyr::filter(r < 20 | r > 70) %>%
select(-r)
```
Now we visualise the data.
```{r}
df %>%
filter (z > -2 & z < 2) %>%
ggplot(aes(x, y, col = factor(class))) +
geom_point() +
theme_bw() +
scale_color_discrete(name = "Class")
```
We can also create a three-dimensional plot with plotly.
```{r}
plotly::plot_ly(df, x = ~x, y =~y, z=~z, color = ~class)
```
# t-SNE Model
A t-SNE model is created. The function **Rtsne()** is applied.
```{r}
# code here
```
Now we can create the results plot.
```{r}
ggplot(result_tsne, aes(x=V1, y=V2, col = as.factor(label))) +
geom_point(size=2) +
labs(x = "", y = "", title = "t-SNE") +
theme_bw() +
scale_color_discrete(name = "Label")
```
All classes are nicely separated.
# PCA Model
A PCA model is created with **prcomp()**.
```{r}
pca_model <- prcomp(x = df, center = T, scale. = F)
result_pca <- tibble(PC1 = pca_model$x[, 1],
PC2 = pca_model$x[, 2],
label = df$class)
```
We can now visualise the result.
```{r}
ggplot(result_pca, aes(x=PC1, y=PC2, col = factor(label))) +
geom_point(size=2) +
labs(x = "", y = "", title = "PCA") +
theme_bw() +
scale_color_discrete(name = "Label")
```
The labels are overlapping a lot. They are not as nicely separated as for t-SNE.