-
Notifications
You must be signed in to change notification settings - Fork 14
/
89-Intro-to-R.Rmd
265 lines (186 loc) · 13.9 KB
/
89-Intro-to-R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
# (APPENDIX) Appendix {.unnumbered}
# A Very Brief Introduction to R
```{r, eval=FALSE, include=FALSE}
library(bookdown); library(rmarkdown); rmarkdown::render("89-Intro-to-R.Rmd", "pdf_book")
```
```{r Ch89setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
knitr::opts_chunk$set(tidy = FALSE)
```
## Purpose
This Appendix provides a very brief introduction to the basics of R from the perspective of use for optimization modeling. This is not meant to be comprehensive R tutorial. There is an ever-expanding collection of such materials available.
## Getting Started with R
This Appendix helps a reader new to R quickly get started. I strongly recommend using RStudio. I also like books such as *R in a Nutshell* [@AdlerNutshell2012] or *R in Action* [@KabacoffActionDataAnalysis2011].
I usually find the best way for me to learn a new tool is to roll up my sleeves and jump right in. This book can be approached in that way - just be ready for a little more experimentation. This book is available on Github and the R Markdown files are available for use. Code fragments are shown quite liberally for demonstrating how things work. While this may be a bit verbose it is meant to enable readers to jump in at various points of the book.
While code fragments can be copied from the R markdown files for this book, it is often best to physically retype many of the code fragments shown as that gives time to reflect on what each statement is doing.
Let's define some conventions to be used throughout the book. First, let me be clear, the goal of this section is not to provide a comprehensive introduction to R. Entire books are written on that subject. My goal here is to simply make sure that everyone can get started productively in the material covered later.
Since this book is focused on using R for operations research, we will focus on the capabilities that are needed for this area and introduce additional features and functions as needed. If you are already comfortable with R, RStudio, and RMarkdown, you may skip the remainder of this section.
Begin by ensuring that you have access to or installed R and RStudio. Both are available for Windows, Mac, and Linux operating systems as well as being available as web services.
Now, let's assume that you are running RStudio.
In this book, I will frequently show code fragments called chunks to show R code. In general, these code chunks can be run by simply retyping the command(s) in the Console.
```{r Assigning}
a <- 7
```
This command assigns the value of 7 to the variable *a*. It is standard in R to use `<-` as an assignment operator rather than an equals sign. We can then use R to process this information.\
```{r Math-operation}
6*a
```
Yes, not surprisingly, $6*7$ is $42$. Notice that if we don't assign that result to something, it gives an immediate result to the screen.
We will often be using or defining data that has more than one element. In fact, R is designed around more complex data structures. Let's go ahead and define a matrix of data.\
```{r Create-Matrix}
b<-matrix(c(1,2,3,4,5,6,7,8))
```
By default, it is assuming that the matrix has one column which means that every data value is in a separate row. The `c` function is a concatenate operator meaning that it combines all the following items together.
Let's look at *b* now to see what it contains.
```{r Looking-at-an-Object}
b
```
Let's instead define this matrix to have four columns and two rows.
```{r Resizing-Matrix}
b<-matrix(c(1,2,3,4,5,6,7,8), ncol=4)
b
```
Notice that since there are eight elements, we only needed to tell R that the matrix has four columns and it then knew that there would be only two rows. Of course we could have set the number of rows to be two for the same result.
This is still a little ambiguous. Let's give the rows and columns names. For now we will simply name them Row and Col.
```{r Naming-Rows-and-Columns}
b<-matrix(c(1,2,3,4,5,6,7,8), ncol=4,
dimnames=c(list(c("Row1", "Row2")),
list(c("Col1", "Col2","Col3","Col4"))))
```
::: {.remark name="RStudio Console"}
The RStudio console can save typing by pressing the up arrow key to view previous command(s) which can then be further edited.
:::
Okay, this command has a lot more going on. The term `dimnames` \index{dimnames} is a parameter that contain names for rows and columns. One thing to note is that this line fills up more space than a single line so it rolls over to multiple line. The `dimnames` parameter will get get two concatenated (combined) lists. The first list is a combined list of two text strings "Row1" and "Row2". The next line does the same for columns.
Let's confirm that it works.
```{r Displaying-New-Matrix}
b
```
This table is still not that "nice" looking. Let's use a package that does a nicer job of formatting tables. To do this, we will use an extra package. Up to this point, everything that we have done just simply uses standard built in functions of R. The package that we will use is `kable` \index{kable} but there are plenty of others available such as `pander`, `kable`, `xtable`, and `huxtable`. While `kable` is a function of `knitr`, it is significantly enhanced by the `kableExtra` \index{kable!kableExtra} package. For more information on the specifics of creating rich tables, see Appendix D.
Let's start by loading the `kableExtra` package.
```{r Loading-a-Package}
library (kableExtra)
```
If R indicates that you don't have the `kableExtra` package installed on your computer, you can press the "Install" command under the Packages tab and then type in `kableExtra` or use the `install.packages` command from RStudio. The `kbl` \index{kable!kbl} is a shorthand way of entering the `kable` function that is provided by `kableExtra`. A lot of the power of `R` comes from the thousdands of packages developed by other people so you will often be installing and loading packages.
Notice the hash symbol is used to mark a comment in the above code chunk. It is also used to "comment out" a command that I don't need to use at the current time. Using comments to explain what a command is doing can be helpful to anyone that needs to revisit your code in the future, including yourself!
```{r Using-kable}
kbl (b, booktabs=T,
caption="Example using Kable with booktabs") |>
kableExtra::kable_styling(latex_options = "hold_position")
```
The table looks very nice in the book. Let's explain in detail how the table is generated. This code chunk has a lot going and is worth a careful look. \index{kable!caption} \index{kable!booktabs} \index{kable!kable\_styling} \index{kable!hold\_position}
- The first line calls an `kbl` which is an enhanced wrapper function `knitr’s kable` function which includes easier recognition of the PDF format among many other enhancements.
- The `booktabs=T` sets a format for clean published table style by setting the option to `T` for `TRUE`.
- The second line provides a caption.
- The last part of the second line warrants a little attention. It is the new pipe operator \index{Piping} and requires `R` version 4.1.0 or higher. The pipe operator basically says pass this commands output into the beginning of the next command. We'll use it a lot for table generation. Prior to R version 4.1.0, people that used pipes in R often used the `%>%` operator from a separate package such as `magrittr`. For most users, including us, the built-in `|>` will suffice.
- The last command, `kable_styling`, has an option for `hold_position` which holds the placement closer to where it is called, keeping LaTeX from second guessing where the ideal location would be for the table. Note that is also specifying its source package, in this case, `kableExtra`. There are a lot of other options in `kable_styling` that we will use in later chapters as needed.
- For your own use, you could shorten all of this to just `kable (b)`. On the other hand, `kableExtra` has one more small trick up its sleeve. It provides a shorter name for the `kable` function of just `kbl`. When space is at premium, this shortcut is handy. It also serves as a helpful check to ensure that `kableExtra` is loaded.
Let's continue experimenting with operators.
```{r}
c <- a*b
```
```{r Scalar-Multiplication, echo=FALSE}
kbl (c, booktabs=T,
caption= "Scalar Multiplication of Matrix b to Make Matrix c") |>
kable_styling(latex_options = "hold_position")
```
Now let's do a transpose operation on the original matrix. What this means that it converts rows into columns and columns into rows.
```{r}
d <- t(b)
```
```{r Transposing, echo=FALSE}
kbl (d, booktabs=T,
caption = "Transposition of Matrix b") |>
kable_styling(latex_options = "hold_position") |>
footnote (general = "Row and column names were changed at the same time.")
```
Notice that row and column names are also transposed. The result is that we now have rows labeled as columns and columns labeled as rows! This is only a problem given that we used the words rows and columns in the names. If these had been more descriptive such as weeks and product names, it would have been a good thing to change them at the same time.
Now we have done some basic operations within R.
We could try this but but we aren't quite there yet.
```{r Displaying-My-Solution}
c
d
```
Try renaming the rows and columns for `d`. As a hint, you could use `dimnames` or `colnames` and `rownames`.
Recall that we have created a variety of objects now: a, b, c, and d. R provides a lot of tools for slicing, dicing, and combining data.
```{r Original-Matrix, echo=FALSE}
kbl (b, booktabs=T,
caption="Original Matrix b") |>
kable_styling(latex_options = "hold_position")
```
Let's look at the original matrix, b and what we can do with it. Let's grab the second row, third column and last element.
```{r Extracting-Second-Row, echo=FALSE}
temp1 <- as.matrix(b[2,])
kbl (temp1, booktabs=T,
caption = "Second Row of b") |>
kable_styling(latex_options = "hold_position")
```
The `as.matrix` function is used to convert the object into a matrix so that `kable` can display it well.
```{r}
temp2 <- as.matrix(b[,3])
```
```{r Extracting-Third-Column, echo=FALSE}
kbl (temp2, booktabs=T,
caption="Third Column of b") |>
kable_styling(latex_options = "hold_position")
```
Grabbing the third column is not terribly interesting but operations such as this will often be quite useful.
```{r}
temp3 <- as.matrix(b[2,4])
```
```{r Bottom-Right-Element, echo=FALSE}
kbl (temp3, booktabs=T,
caption="Last Element of b") |>
kable_styling(latex_options = "hold_position")
```
Let's take the first row of `b` and combine it with the second row of `c` to form a new matrix, `e`. Since these are rows that are going to be combined, we will use a command, `rbind`, to bind these rows together. \index{rbind function}
```{r}
temp4 <- rbind(b[1,],c[2,])
```
```{r Combined-Matrix, echo=FALSE}
kbl (temp4, booktabs=T,
caption="Combined Matrix") |>
kable_styling(latex_options = "hold_position")
```
Notice that *temp4* has inherited the column names but lost the row names. Let's set the row names for this matrix.
```{r}
rownames(temp4)<-list("From b", "From c")
```
```{r Changing-Row-Names, echo=FALSE }
kbl (temp4, booktabs=T,
caption="Combined Matrix with Explanation of Source") |>
kable_styling(latex_options = "hold_position")
```
We could combine all of matrix `b` and matrix `c` together using row binding or column binding. Table Let's view the results of binding using columns (`cbind`) and rows (`rbind`). \index{cbind function}
```{r}
temp5 <- cbind(b,c)
```
```{r, echo=FALSE }
kbl (temp5, booktabs=T,
caption="Column Binding of Matrices b and c") |>
kable_styling(latex_options = "hold_position")
```
```{r}
temp6 <- rbind(b,c)
```
```{r Row-Binding, echo=FALSE}
kbl (temp6, booktabs=T,
caption="Row Binding of Matrices b and c") |>
kable_styling(latex_options = "hold_position")
```
Data organizing is a less glamorous part of the job for practicing analytics professionals but can consume a majority of the workday. There is a lot more that can be said about data wrestling but scripting the data cleansing in terms R commands will make the work more repeatable, consistent, and in the end save time.\
## Exercises
::: {.exercise name="Creating a Matrix of Daily Demand for Four Weeks"}
Assume that your company's product has a demand of 10 widgets on Monday, increasing by five units for every day through Sunday. Build a matrix of four weeks of demand where each row is a separate week. Each Monday starts over with the same demand. The rows should be named Week1, Week2, etc. The columns should have names corresponding to the day of the week.
:::
::: {.exercise name="Creating a Matrix of Demand for Two Products"}
Assume that your company's product has a demand of 10 widgets on Monday, increasing by five units for every day through Sunday. Gadgets have a demand of 20 on Monday, increasing by 3 units a day through Sunday. Build a matrix showing each product as a separate row. Rows should have names for the products and columns for the days of the week.
:::
::: {.exercise name="Creating a Matrix of Monthly Demand"}
Consider some top selling online products for the year 2021: toys, shoes, pens and pencils, decorative bottles, drills, cutters, and GPS navigation systems. Create your own assumed starting counts per item for throughout year by increasing each item sales by 50 units per month, starting from January till December.
:::
::: {.exercise name="Displaying Selective Data from the Matrix"}
In the above matrix, grab the columns and rows to show the data only for pens and pencils during the months of June, August and October.
:::
::: {.exercise name="Creating Separate Demands for Products"}
In the above matrix, demand for toys and shoes has been increased by 20 and 10 units respectively. Create a matrix for only these products for the first 6 months. Display the matrix twice--once without formatting and a second time using an R package that provides better table formatting (`pander`, `kable` via `knitr`, `huxtable` or similar tool. Be sure to provide a table caption.)
:::