-
Notifications
You must be signed in to change notification settings - Fork 0
/
instructors.html
381 lines (370 loc) · 22.9 KB
/
instructors.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: Programming with R</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">Programming with R</h1></a>
<h2 class="subtitle">Instructor’s Guide</h2>
<h2 id="legend">Legend</h2>
<p>We are using a dataset with records on inflammation from patients following an arthritis treatment. With it we explain <code>R</code> data structure, basic data manipulation and plotting, writing functions and loops.</p>
<h2 id="overall">Overall</h2>
<p>This lesson is written as an introduction to R, but its real purpose is to introduce the single most important idea in programming: how to solve problems by building functions, each of which can fit in a programmer’s working memory. In order to teach that, we must teach people a little about the mechanics of manipulating data with lists and file I/O so that their functions can do things they actually care about. Our teaching order tries to show practical uses of every idea as soon as it is introduced; instructors should resist the temptation to explain the “other 90%” of the language as well.</p>
<p>The secondary goal of this lesson is to give them a usable mental model of how programs run (what computer science educators call a <a href="reference.html#notional-machine">notional machine</a> so that they can debug things when they go wrong. In particular, they must understand how function call stacks work.</p>
<p>The final example asks them to build a command-line tool that works with the Unix pipe-and-filter model. We do this because it is a useful skill and because it helps learners see that the software they use isn’t magical. Tools like <code>grep</code> might be more sophisticated than the programs our learners can write at this point in their careers, but it’s crucial they realize this is a difference of scale rather than kind.</p>
<p>The <code>R</code> novice inflammation contains a lot of material to cover. Remember this lesson does not spend a lot of time on data types, data structure, etc. It is also on par with the similar lesson on Python. The objective is to explain modular programming with the concepts of functions, loops, flow control, and defensive programming (i.e. SWC best practices). Supplementary material is available for R specifics (<a href="01-supp-addressing-data.html">Addressing Data</a>, <a href="01-supp-data-structures.html">Data Types and Structure</a>, <a href="01-supp-factors.html">Understanding Factors</a>, <a href="01-supp-intro-rstudio.html">Introduction to RStudio</a>, <a href="01-supp-read-write-csv.html">Reading and Writing .csv</a>, <a href="03-supp-loops-in-depth.html">Loops in R</a>, <a href="06-best-practices-R.html">Best Practices for Using R and Designing Programs</a>, <a href="07-knitr-R.html">Dynamic Reports with knitr</a>, <a href="08-making-packages-R.html">Making Packages in R</a>).</p>
<p>A typical, half-day, lesson would use the first three lessons:</p>
<ol style="list-style-type: decimal">
<li><a href="01-starting-with-data.html">Analyzing Patient Data</a></li>
<li><a href="02-func-R.html">Creating Functions</a></li>
<li><a href="03-loops-R.html">Analyzing Multiple Data Sets</a></li>
</ol>
<p>An additional half-day could add the next two lessons:</p>
<ol start="4" style="list-style-type: decimal">
<li><a href="04-cond.html">Making choices</a></li>
<li><a href="05-cmdline.html">Command-Line Programs</a></li>
</ol>
<p>Time-permitting, you can fit in one of these shorter lessons that cover bigger picture ideas like best practices for organizing code, reproducible research, and creating packages:</p>
<ol start="6" style="list-style-type: decimal">
<li><a href="06-best-practices-R.html">Best practices for using R and designing programs</a></li>
<li><a href="07-knitr-R.html">Dynamic reports with knitr</a></li>
<li><a href="08-making-packages-R.html">Making packages in R</a></li>
</ol>
<h2 id="analyzing-patient-data"><a href="01-starting-with-data.html">Analyzing Patient Data</a></h2>
<ul>
<li><p>Check learners are reading files from the correct location (set working directory); remind them of the shell lesson</p></li>
<li><p>Provide shortcut for the assignment operator (<code><-</code>) (RStudio: Alt+- on Windows/Linux; Option+- on Mac)</p></li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="st">"data/inflammation-01.csv"</span>, <span class="dt">header =</span> <span class="ot">FALSE</span>)
animal <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"m"</span>, <span class="st">"o"</span>, <span class="st">"n"</span>, <span class="st">"k"</span>, <span class="st">"e"</span>, <span class="st">"y"</span>)
<span class="co"># Challenge - Slicing (subsetting data)</span>
animal[<span class="dv">4</span>:<span class="dv">1</span>] <span class="co"># first 4 characters in reverse order</span></code></pre></div>
<pre class="output"><code>[1] "k" "n" "o" "m"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">animal[-<span class="dv">1</span>] <span class="co"># remove first character</span></code></pre></div>
<pre class="output"><code>[1] "o" "n" "k" "e" "y"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">animal[-<span class="dv">4</span>] <span class="co"># remove fourth character</span></code></pre></div>
<pre class="output"><code>[1] "m" "o" "n" "e" "y"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">animal[-<span class="dv">1</span>:-<span class="dv">4</span>] <span class="co"># remove first to fourth characters</span></code></pre></div>
<pre class="output"><code>[1] "e" "y"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">animal[<span class="kw">c</span>(<span class="dv">5</span>, <span class="dv">2</span>, <span class="dv">3</span>)] <span class="co"># new character vector</span></code></pre></div>
<pre class="output"><code>[1] "e" "o" "n"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Subsetting data</span>
<span class="kw">max</span>(dat[<span class="dv">5</span>, <span class="dv">3</span>:<span class="dv">7</span>])</code></pre></div>
<pre class="output"><code>[1] 3
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">sd_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, sd)
<span class="kw">plot</span>(sd_day_inflammation)</code></pre></div>
<h2 id="addressing-data"><a href="01-supp-addressing-data.html">Addressing Data</a></h2>
<ul>
<li>Note that the data frame <code>dat</code> is not the same set of data as in other lessons</li>
</ul>
<h2 id="data-types-and-structure"><a href="01-supp-data-structures.html">Data Types and Structure</a></h2>
<ul>
<li>Lesson on data types and structures</li>
</ul>
<h2 id="understanding-factors"><a href="01-supp-factors.html">Understanding Factors</a></h2>
<h2 id="introduction-to-rstudio"><a href="01-supp-intro-rstudio.html">Introduction to RStudio</a></h2>
<h2 id="reading-and-writing-.csv"><a href="01-supp-read-write-csv.html">Reading and Writing .csv</a></h2>
<h2 id="creating-functions"><a href="02-func-R.html">Creating Functions</a></h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Create a function</span>
fence <-<span class="st"> </span>function(original, wrapper) {
answer <-<span class="st"> </span><span class="kw">c</span>(wrapper, original, wrapper)
<span class="kw">return</span>(answer)
}</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - A more advanced function</span>
analyze <-<span class="st"> </span>function(filename) {
<span class="co"># Plots the average, min, and max inflammation over time.</span>
<span class="co"># Input is character string of a csv file.</span>
dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="dt">file =</span> filename, <span class="dt">header =</span> <span class="ot">FALSE</span>)
avg_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, mean)
<span class="kw">plot</span>(avg_day_inflammation)
max_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, max)
<span class="kw">plot</span>(max_day_inflammation)
min_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, min)
<span class="kw">plot</span>(min_day_inflammation)
}
<span class="co"># Challenge - rescale</span>
rescale <-<span class="st"> </span>function(v) {
<span class="co"># Rescales a vector, v, to lie in the range 0 to 1.</span>
L <-<span class="st"> </span><span class="kw">min</span>(v)
H <-<span class="st"> </span><span class="kw">max</span>(v)
result <-<span class="st"> </span>(v -<span class="st"> </span>L) /<span class="st"> </span>(H -<span class="st"> </span>L)
<span class="kw">return</span>(result)
}</code></pre></div>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - A function with default argument values</span>
rescale <-<span class="st"> </span>function(v, <span class="dt">lower =</span> <span class="dv">0</span>, <span class="dt">upper =</span> <span class="dv">1</span>) {
<span class="co"># Rescales a vector, v, to lie in the range lower to upper.</span>
L <-<span class="st"> </span><span class="kw">min</span>(v)
H <-<span class="st"> </span><span class="kw">max</span>(v)
result <-<span class="st"> </span>(v -<span class="st"> </span>L) /<span class="st"> </span>(H -<span class="st"> </span>L) *<span class="st"> </span>(upper -<span class="st"> </span>lower) +<span class="st"> </span>lower
<span class="kw">return</span>(result)
}
answer <-<span class="st"> </span><span class="kw">rescale</span>(dat[, <span class="dv">4</span>], <span class="dt">lower =</span> <span class="dv">2</span>, <span class="dt">upper =</span> <span class="dv">5</span>)
<span class="kw">min</span>(answer)</code></pre></div>
<pre class="output"><code>[1] 2
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">max</span>(answer)</code></pre></div>
<pre class="output"><code>[1] 5
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">answer <-<span class="st"> </span><span class="kw">rescale</span>(dat[, <span class="dv">4</span>], <span class="dt">lower =</span> -<span class="dv">5</span>, <span class="dt">upper =</span> -<span class="dv">2</span>)
<span class="kw">min</span>(answer)</code></pre></div>
<pre class="output"><code>[1] -5
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">max</span>(answer)</code></pre></div>
<pre class="output"><code>[1] -2
</code></pre>
<h2 id="analyzing-multiple-data-sets"><a href="03-loops-R.html">Analyzing Multiple Data Sets</a></h2>
<ul>
<li>The transition from the previous lesson to this one might be challenging for a very novice audience. Do not rush through the challenges, maybe drop some.</li>
</ul>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Using loops</span>
print_N <-<span class="st"> </span>function(N) {
nseq <-<span class="st"> </span><span class="kw">seq</span>(N)
for (num in nseq) {
<span class="kw">print</span>(num)
}
}
<span class="kw">print_N</span>(<span class="dv">3</span>)</code></pre></div>
<pre class="output"><code>[1] 1
[1] 2
[1] 3
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">total <-<span class="st"> </span>function(vec) {
<span class="co">#calculates the sum of the values in a vector</span>
vec_sum <-<span class="st"> </span><span class="dv">0</span>
for (num in vec) {
vec_sum <-<span class="st"> </span>vec_sum +<span class="st"> </span>num
}
<span class="kw">return</span>(vec_sum)
}
ex_vec <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">8</span>, <span class="dv">15</span>, <span class="dv">16</span>, <span class="dv">23</span>, <span class="dv">42</span>)
<span class="kw">total</span>(ex_vec)</code></pre></div>
<pre class="output"><code>[1] 108
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">expo <-<span class="st"> </span>function(base, power) {
result <-<span class="st"> </span><span class="dv">1</span>
for (i in <span class="kw">seq</span>(power)) {
result <-<span class="st"> </span>result *<span class="st"> </span>base
}
<span class="kw">return</span>(result)
}
<span class="kw">expo</span>(<span class="dv">2</span>, <span class="dv">4</span>)</code></pre></div>
<pre class="output"><code>[1] 16
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Using loops to analyze multiple files</span>
analyze_all <-<span class="st"> </span>function(pattern) {
<span class="co"># Runs the function analyze for each file in the current working directory</span>
<span class="co"># that contains the given pattern.</span>
filenames <-<span class="st"> </span><span class="kw">list.files</span>(<span class="dt">path =</span> <span class="st">"data"</span>, <span class="dt">pattern =</span> pattern, <span class="dt">full.names =</span> <span class="ot">TRUE</span>)
for (f in filenames) {
<span class="kw">analyze</span>(f)
}
}</code></pre></div>
<h2 id="loops-in-r"><a href="03-supp-loops-in-depth.html">Loops in R</a></h2>
<h2 id="making-choices"><a href="04-cond-colors-R.html">Making Choices</a></h2>
<h2 id="making-choices-1"><a href="04-cond.html">Making Choices</a></h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Using conditions to change behaviour</span>
plot_dist <-<span class="st"> </span>function(x, threshold) {
if (<span class="kw">length</span>(x) ><span class="st"> </span>threshold) {
<span class="kw">boxplot</span>(x)
} else {
<span class="kw">stripchart</span>(x)
}
}
plot_dist <-<span class="st"> </span>function(x, threshold, <span class="dt">use_boxplot =</span> <span class="ot">TRUE</span>) {
if (<span class="kw">length</span>(x) ><span class="st"> </span>threshold &<span class="st"> </span>use_boxplot) {
<span class="kw">boxplot</span>(x)
} else if (<span class="kw">length</span>(x) ><span class="st"> </span>threshold &<span class="st"> </span>!use_boxplot) {
<span class="kw">hist</span>(x)
} else {
<span class="kw">stripchart</span>(x)
}
}
<span class="co"># Challenge - Changing behaviour of the plot command</span>
analyze <-<span class="st"> </span>function(filename, <span class="dt">output =</span> <span class="ot">NULL</span>) {
<span class="co"># Plots the average, min, and max inflammation over time.</span>
<span class="co"># Input:</span>
<span class="co"># filename: character string of a csv file</span>
<span class="co"># output: character string of pdf file for saving</span>
if (!<span class="kw">is.null</span>(output)) {
<span class="kw">pdf</span>(output)
}
dat <-<span class="st"> </span><span class="kw">read.csv</span>(<span class="dt">file =</span> filename, <span class="dt">header =</span> <span class="ot">FALSE</span>)
avg_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, mean)
<span class="kw">plot</span>(avg_day_inflammation, <span class="dt">type =</span> <span class="st">"l"</span>)
max_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, max)
<span class="kw">plot</span>(max_day_inflammation, <span class="dt">type =</span> <span class="st">"l"</span>)
min_day_inflammation <-<span class="st"> </span><span class="kw">apply</span>(dat, <span class="dv">2</span>, min)
<span class="kw">plot</span>(min_day_inflammation, <span class="dt">type =</span> <span class="st">"l"</span>)
if (!<span class="kw">is.null</span>(output)) {
<span class="kw">dev.off</span>()
}
}</code></pre></div>
<h2 id="best-practices-for-using-r-and-designing-programs"><a href="06-best-practices-R.html">Best Practices for Using R and Designing Programs</a></h2>
<h2 id="command-line-programs"><a href="05-cmdline.html">Command-Line Programs</a></h2>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - A simple command line program</span>
cat arith.R</code></pre></div>
<pre class="output"><code>main <- function() {
# Performs addition or subtraction from the command line.
#
# Takes three arguments:
# The first and third are the numbers.
# The second is either + for addition or - for subtraction.
#
# Ex. usage:
# Rscript arith.R 1 + 2
# Rscript arith.R 3 - 4
#
args <- commandArgs(trailingOnly = TRUE)
num1 <- as.numeric(args[1])
operation <- args[2]
num2 <- as.numeric(args[3])
if (operation == "+") {
answer <- num1 + num2
cat(answer)
} else if (operation == "-") {
answer <- num1 - num2
cat(answer)
} else {
stop("Invalid input. Use + for addition or - for subtraction.")
}
}
main()
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">cat find-pattern.R</code></pre></div>
<pre class="output"><code>main <- function() {
# Finds all files in the current directory that contain a given pattern.
#
# Takes one argument: the pattern to be searched.
#
# Ex. usage:
# Rscript find-pattern.R csv
#
args <- commandArgs(trailingOnly = TRUE)
pattern <- args[1]
files <- list.files(pattern = pattern)
cat(files, sep = "\n")
}
main()
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## Challenge - A command line program with arguments
cat check.R</code></pre></div>
<pre class="output"><code>main <- function() {
# Checks that all csv files have the same number of rows and columns.
#
# Takes multiple arguments: the names of the files to be checked.
#
# Ex. usage:
# Rscript check.R inflammation-*
#
args <- commandArgs(trailingOnly = TRUE)
first_file <- read.csv(args[1], header = FALSE)
first_dim <- dim(first_file)
# num_rows <- dim(args[1])[1] # nrow(args[1])
# num_cols <- dim(args[1])[2] # ncol(args[1])
for (filename in args[-1]) {
new_file <- read.csv(filename, header = FALSE)
new_dim <- dim(new_file)
if (new_dim[1] != first_dim[1] | new_dim[2] != first_dim[2]) {
cat("Not all the data files have the same dimensions.")
}
}
}
main()
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Shorter command line arguments</span>
cat readings-usage.R</code></pre></div>
<pre class="output"><code>main <- function() {
args <- commandArgs(trailingOnly = TRUE)
action <- args[1]
filenames <- args[-1]
if (!(action %in% c("--min", "--mean", "--max"))) {
usage()
} else if (length(filenames) == 0) {
process(file("stdin"), action)
} else {
for (f in filenames) {
process(f, action)
}
}
}
process <- function(filename, action) {
dat <- read.csv(file = filename, header = FALSE)
if (action == "--min") {
values <- apply(dat, 1, min)
} else if (action == "--mean") {
values <- apply(dat, 1, mean)
} else if (action == "--max") {
values <- apply(dat, 1, max)
}
cat(values, sep = "\n")
}
usage <- function() {
cat("usage: Rscript readings-usage.R [--min, --mean, --max] filenames", sep = "\n")
}
main()
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Challenge - Implementing wc in R</span>
cat line-count.R</code></pre></div>
<pre class="output"><code>main <- function() {
args <- commandArgs(trailingOnly = TRUE)
if (length(args) > 0) {
for (filename in args) {
input <- readLines(filename)
num_lines <- length(input)
cat(filename)
cat(" ")
cat(num_lines, sep = "\n")
}
} else {
input <- readLines(file("stdin"))
num_lines <- length(input)
cat(num_lines, sep = "\n")
}
}
main()
</code></pre>
<h2 id="dynamic-reports-with-knitr"><a href="07-knitr-R.html">Dynamic Reports with knitr</a></h2>
<h2 id="making-packages-in-r"><a href="08-making-packages-R.html">Making Packages in R</a></h2>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>