---
title: "Apriori: Solution"
author: "Bert Gollnick"
output:
  html_document:
    toc: true
    toc_float: true
    toc_depth: 2
    code_folding: hide
    number_sections: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
```
# Data Preparation
```{r}
suppressPackageStartupMessages(library(tidyverse))
suppressPackageStartupMessages(library(arules))
suppressPackageStartupMessages(library(arulesViz))
```
## Raw Data Import
We use a public dataset of grocery store purchases, which you can download [here](http://www.salemmarafi.com/wp-content/uploads/2014/03/groceries.csv).
The following code downloads the data if the file does not exist yet.
```{r}
# if the file does not exist yet, download it first
file_path <- "./data/groceries.csv"
if (!file.exists(file_path)) {
  # create the target folder; stay quiet if it already exists
  dir.create("./data", showWarnings = FALSE)
  url <- "http://www.salemmarafi.com/wp-content/uploads/2014/03/groceries.csv"
  download.file(url = url,
                destfile = file_path)
}
```
Now the data is imported.
```{r}
grocery <- read_delim(file = "./data/groceries.csv",
                      delim = ",",
                      quote = "\"",
                      skip = 0,
                      col_names = FALSE,
                      na = c("", "NA"),
                      progress = FALSE)
```
## Transformation to Transactions
Now we create a transactions object from this data frame.
We convert the object "grocery" to a matrix, split it into a list with one element per row, and coerce that list to an object of type "transactions".
```{r}
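# one row per transaction, one column per item position
m <- as.matrix(grocery)
# split the matrix into a list of baskets and drop the NA padding of shorter rows
l <- lapply(seq_len(nrow(m)), FUN = function(i) m[i, !is.na(m[i, ])])
# coerce the list of baskets to an arules transactions object
transactions <- as(l, "transactions")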
transactions
```
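To see what the conversion does, here is a minimal sketch on a hand-made toy list. The baskets and the names `toy_baskets` and `toy_transactions` are made up for illustration and are not part of the grocery data. Each list element is one basket, and the coercion yields one transaction per element with one column per distinct item.

```{r}
library(arules)

# three hypothetical baskets, one character vector per transaction
toy_baskets <- list(
  c("milk", "bread"),
  c("milk", "butter", "bread"),
  c("beer")
)

# coerce the list to an arules transactions object
toy_transactions <- as(toy_baskets, "transactions")

# number of transactions and number of items per transaction
length(toy_transactions)
size(toy_transactions)

# show the baskets in readable form
inspect(toy_transactions)
```

The same pattern applies to the list `l` above, only with far more baskets and items.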
# Model
Now we can work with our transactions object.
## Item Frequency
1. Please print the first few item sets.
```{r}
# put your code here
```
2. Please create a graph showing the 10 best-selling items.
```{r}
# put your code here
```
## Cross Table
The cross table shows joint occurrences of items.
3. Please create a cross table for "beef", "bottled beer", and "canned beer".
```{r}
# put your code here
```
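As an illustration of what such a cross table contains, the following base-R sketch counts joint occurrences by hand for four hypothetical baskets (toy data, not the grocery file; all `toy_` names are made up). The diagonal holds how often each item occurs at all, the off-diagonal cells how often two items occur in the same basket. The `crossTable()` function of `arules` produces a table of this kind for a transactions object.

```{r}
# four hypothetical baskets (toy data for illustration only)
toy_ct_baskets <- list(
  c("beef", "bottled beer"),
  c("beef", "canned beer"),
  c("bottled beer", "canned beer", "beef"),
  c("canned beer")
)

# binary incidence matrix: one row per basket, one column per item
toy_items <- sort(unique(unlist(toy_ct_baskets)))
toy_incidence <- t(sapply(toy_ct_baskets,
                          function(basket) as.integer(toy_items %in% basket)))
colnames(toy_incidence) <- toy_items

# t(X) %*% X counts, for every pair of items, the baskets containing both
toy_joint_counts <- crossprod(toy_incidence)
toy_joint_counts
```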
## Generate Rules
4. Generate all rules with a minimum support of 0.001 and a minimum confidence of 0.5.
```{r}
# put your code here
```
5. Show the rules, sorted by confidence.
```{r}
# put your code here
```
The rule with the highest confidence shows that a customer who buys liquor and red/blush wine is most likely to buy bottled beer as well.
6. Show the rules, sorted by lift.
```{r}
# put your code here
```
7. What is the highest achieved lift? Does this mean the rule is relevant (more likely than pure chance)?
```{r}
# put your code here
```
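For orientation, the three measures used in this section can be computed by hand. The sketch below does this in base R for a hypothetical rule {milk} => {bread} on five made-up baskets (toy data; the helper `toy_support()` is defined here for illustration and is not an `arules` function). Support is the share of baskets containing an item set, confidence is the support of both sides together divided by the support of the left-hand side, and lift is confidence divided by the support of the right-hand side. A lift above 1 means the right-hand side shows up together with the left-hand side more often than expected if the two were independent.

```{r}
# five hypothetical baskets (toy data for illustration only)
toy_lift_baskets <- list(
  c("milk", "bread"),
  c("milk", "bread", "butter"),
  c("milk"),
  c("bread"),
  c("beer")
)

# share of baskets that contain all items of an item set
toy_support <- function(item_set, baskets) {
  mean(sapply(baskets, function(basket) all(item_set %in% basket)))
}

toy_supp_rule <- toy_support(c("milk", "bread"), toy_lift_baskets)  # both sides
toy_supp_lhs  <- toy_support("milk", toy_lift_baskets)
toy_supp_rhs  <- toy_support("bread", toy_lift_baskets)

toy_confidence <- toy_supp_rule / toy_supp_lhs
toy_lift       <- toy_confidence / toy_supp_rhs

c(support = toy_supp_rule, confidence = toy_confidence, lift = toy_lift)
```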
## Specific Rules for an Item
8. Please create rules for coffee. You want to know which items led to the purchase of coffee. Use a support of 0.001 and a confidence of 0.1, and sort the results by "confidence".
```{r}
# put your code here
```
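A minimal sketch of how rules can be restricted to one target item, shown on hypothetical toy baskets with "bread" as the target rather than on the grocery data. It relies on the `appearance` argument of `apriori()`: `rhs` names the item allowed on the right-hand side, and `default = "lhs"` lets all other items appear on the left. The thresholds are arbitrary values chosen so that the toy data yields a few rules, and all `toy_` names are made up.

```{r}
library(arules)

# four hypothetical baskets (toy data for illustration only)
toy_target_baskets <- list(
  c("milk", "bread"),
  c("milk", "butter", "bread"),
  c("beer"),
  c("butter", "bread")
)
toy_target_trans <- as(toy_target_baskets, "transactions")

# only keep rules whose right-hand side is "bread";
# minlen = 2 excludes rules with an empty left-hand side
toy_rules <- apriori(
  toy_target_trans,
  parameter  = list(supp = 0.25, conf = 0.5, minlen = 2),
  appearance = list(rhs = "bread", default = "lhs"),
  control    = list(verbose = FALSE)
)

# strongest rules first
inspect(sort(toy_rules, by = "confidence"))
```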
9. Visualise the rules with measure "lift" and shading set to "confidence".
```{r}
# put your code here
```