generated from datasciencecampus/skeletor-public
-
Notifications
You must be signed in to change notification settings - Fork 9
/
gov-uk-rap-materials_intro-to-RMySQL.Rmd
206 lines (153 loc) · 6.89 KB
/
gov-uk-rap-materials_intro-to-RMySQL.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
---
title: "Getting Started with RMySQL"
author: "Joseph Crispell"
date: "`r format(Sys.Date(), '%d %b %Y')`"
output:
html_document:
number_sections: true
toc: yes
toc_depth: '2'
toc_float: yes
params:
password: "ThisIsNotMyPassword"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
# Introduction
Usually a Reproducible Analytical Pipeline will start by retrieving data from a database. Often that database could be a `SQL` database. [`MySQL`](https://www.mysql.com/) is an open-source relational database management system that uses *S*tructured *Q*uery *L*anguage (SQL). We can use `MySQL` to store complex data tables that relate to one another.
`RMySQL` is an R package that we can use to interact with `MySQL`, writing queries, updating data and creating new tables.
The current tutorial aims to give a brief introduction to using `RMySQL` in R. The current content is more-or-less the same as the TutorialsPoint [tutorial](https://www.tutorialspoint.com/r/r_database.htm), with a few updates and edits to match updates to `RMySQL`.
# Installing `MySQL`
To get started, you'll need to install `MySQL`, here are links for the different operating systems:
- Installing on linux systems ([link](https://dev.mysql.com/doc/refman/8.0/en/linux-installation.html)), with extra help [here](https://itsfoss.com/install-mysql-ubuntu/).
- Installing on mac: ([link](https://dev.mysql.com/doc/mysql-osx-excerpt/5.7/en/osx-installation.html)), also, I think it comes ready installed! See [here](https://www.thoughtco.com/installing-mysql-on-mac-2693866))
- Installing on Windows ([link](https://dev.mysql.com/downloads/installer/)), with extra help [here](https://www.wikihow.com/Install-the-MySQL-Database-Server-on-Your-Windows-PC)
For this tutorial, I worked on a Windows machine and it took me a wee while to get MySQL installed properly, so be patient. Also, I had to use the legacy password based authentification to get R to connect properly (see this [question](https://stackoverflow.com/questions/49931541/mysql-changing-authentication-type-from-standard-to-caching-sha2-password) for more information).
# Installing `RMySQL`
Much easier! Just use the following code:
```{r eval=FALSE}
install.packages("RMySQL")
```
Then we can load the library:
```{r}
library(RMySQL)
```
# Working with `RMySQL`
## Create a connection to MySQL
We are going to connect to the "sakila" database, which comes with the `MySQL` installation:
```{r eval=FALSE}
# Open a connection to a MySQL database
connection <- dbConnect(MySQL(),
user = "root",
password = readline(prompt = "Enter password: "), # Password not in code
dbname = "sakila",
host = "localhost"
)
# List the tables available in the sakila database
dbListTables(connection)
```
```{r echo=FALSE}
# Open a connection to a MySQL database
connection <- dbConnect(MySQL(),
user = "root",
password = params$password, # Password not stored in code
dbname = "sakila",
host = "localhost"
)
# List the tables available in the sakila database
head(dbListTables(connection))
```
## Sending a query
We can query a table using the following code:
```{r warning=FALSE}
# Query the "actor" table to get all the rows
# NOTE: SQL syntax is case-insensitive
result <- dbSendQuery(conn = connection, statement = "SELECT * FROM actor")
# Store the first 15 rows of the result in an R data frame object
actors <- dbFetch(result, n = 15)
# Remove the unused results
dbClearResult(result)
```
Note that the last line of code (`dbClearResult(result)`) clears the fetched data and closes the query. It is now preferred to query a table with the following code, which doesn't leave the query open:
```{r warning=FALSE}
# Note - you can use single command below to send query and send result
# as data.frame in one command - doesn't leave query open
actors <- dbGetQuery(conn = connection, statement = "SELECT * FROM actor")
```
Let's take a quick look at the actors table:
```{r}
head(actors)
```
Now, let's try getting the information for actors whose last name is "TEMPLE":
```{r warning=FALSE}
# Note - you can use single command below to send query and send result
# as data.frame in one command - doesn't leave query open
actors <- dbGetQuery(
conn = connection,
statement = "SELECT * FROM actor WHERE last_name = 'TEMPLE'"
)
# Print the table
(actors)
```
## Creating a table
In order to create a `MySQL` table, we first need to set our permissions so that we can input data locally (see [this](https://stackoverflow.com/questions/44288358/is-there-a-faster-way-to-upload-data-from-r-to-mysql) question for more information):
```{r}
dbSendQuery(conn = connection, statement = "SET GLOBAL local_infile = 1")
```
With that done, we are going to store the `mtcars` table on our local `MySQL` server using the following code:
```{r}
# Upload mtcars table onto MySQL server
dbWriteTable(connection, "mtcars", mtcars[, ], overwrite = TRUE)
# Check the table is there
mtcars_mysql <- dbGetQuery(
conn = connection,
statement = "SELECT * FROM mtcars"
)
head(mtcars_mysql)
```
## Updating a table
We can update a table on our `MySQL` server by sending a query, for example we can use the code below to edit the data for cars with a certain horse power:
```{r}
# Update data in the actor info table
dbSendQuery(
conn = connection,
statement = "UPDATE mtcars SET disp = 168.5 WHERE hp = 110"
)
# Check the updated table
mtcars_mysql <- dbGetQuery(
conn = connection,
statement = "SELECT * FROM mtcars"
)
# Take a look to see if the data has changed
head(mtcars_mysql[mtcars_mysql$hp == 110, ])
```
## Inserting a row into a table
Using the code below, we can insert data for a new car into the `mtcars` table on our `MySQL` server:
```{r}
# Insert a new row into table
dbSendQuery(
conn = connection,
statement = paste0(
"INSERT INTO mtcars(row_names, mpg, cyl, disp, hp, drat,",
"wt, qsec, vs, am, gear, carb) values('New Mazda RX4 Wag',",
"21, 6, 168.5, 110, 3.9, 2.875, 17.02, 0, 1, 4, 4)"
)
)
# Check the updated table
mtcars <- dbGetQuery(conn = connection, statement = "SELECT * FROM mtcars")
tail(mtcars)
```
## Removing a table
We can remove the `mtcars` table from our `MySQL` server with the following code:
```{r}
dbSendQuery(connection, "drop table if exists mtcars")
```
## Closing a connection to the `MySQL` server
To wrap up, we can close our connection to the `MySQL` server using the following code:
```{r warning=FALSE}
dbDisconnect(conn = connection)
```
# The end :-)
So, that's me done. This tutorial has hopefully got you started with using the `RMySQL` R package. If you haven't already I would recommend going and doing some more tutorials about `MySQL`. [Here](https://www.tutorialspoint.com/mysql/index.htm) is a good one to start with.
As you'll have noticed, this tutorial has barely scratched the surface of using `MySQL`. It is an extremely powerful language and one that will be found in many Reproducible Analytical Pipelines and well worth learning about.