<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Data structures</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>To become familiar with the different types of data</li>
<li>To understand the different basic data structures commonly encountered in R</li>
<li>To use R’s built-in functions to extract the type, class, and structure of an R object.</li>
</ul>
</div>
</section>
<h4 id="data-types">Data Types</h4>
<p>Before we can analyse any data, we need to have a strong understanding of the basic data types and structures in which we can store information. This is particularly important for efficient and frustration-free programming.</p>
<p>R has five basic atomic types (meaning they can’t be broken down into anything smaller):</p>
<ul>
<li>logical (e.g., <code>TRUE</code>, <code>FALSE</code>)</li>
<li>numeric, which comes in two kinds:
<ul>
<li>integer (e.g., <code>2L</code>, <code>as.integer(3)</code>)</li>
<li>double (i.e. decimal) (e.g., <code>-24.57</code>, <code>2.0</code>, <code>pi</code>)</li>
</ul>
</li>
<li>complex (i.e. complex numbers) (e.g., <code>1 + 0i</code>, <code>1 + 4i</code>)</li>
<li>text (called “character” in R) (e.g., <code>"a"</code>, <code>"swc"</code>, <code>'This is a cat'</code>)</li>
</ul>
<p>There are a few functions we can use to interrogate data in R to determine its type:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">typeof</span>() <span class="co"># what is its atomic type?</span>
<span class="kw">is.logical</span>() <span class="co"># is it TRUE/FALSE data?</span>
<span class="kw">is.numeric</span>() <span class="co"># is it numeric?</span>
<span class="kw">is.integer</span>() <span class="co"># is it an integer?</span>
<span class="kw">is.complex</span>() <span class="co"># is it complex number data?</span>
<span class="kw">is.character</span>() <span class="co"># is it character data?</span></code></pre></div>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-1-data-types"><span class="glyphicon glyphicon-pencil"></span>Challenge 1: Data types</h4>
</div>
<div class="panel-body">
<p>Use your knowledge of how to assign a value to a variable to create examples of data with the following characteristics:</p>
<ol style="list-style-type: decimal">
<li>Variable name: ‘answer’, Type: logical</li>
<li>Variable name: ‘height’, Type: numeric</li>
<li>Variable name: ‘dog_name’, Type: character</li>
</ol>
<p>For each variable you’ve created, test that it has the data type you intended. Do you find anything unexpected?</p>
</div>
</section>
<h4 id="data-structures">Data Structures</h4>
<p>There are five data structures you will commonly encounter in R. These are:</p>
<ul>
<li>vector</li>
<li>factor</li>
<li>list</li>
<li>matrix</li>
<li>data.frame</li>
</ul>
<h4 id="vectors">Vectors</h4>
<p>A vector is the most common and basic data structure and is pretty much the workhorse of <code>R</code>. It is sometimes referred to as an atomic vector, because it can <strong>only contain one data type</strong>. Vectors are the building blocks of every other data structure.</p>
<p>A vector can contain any of the five types we introduced before:</p>
<ul>
<li>logical (e.g., <code>TRUE</code>, <code>FALSE</code>)</li>
<li>integer (e.g., <code>2L</code>, <code>as.integer(3)</code>)</li>
<li>numeric (real or decimal) (e.g., <code>2</code>, <code>2.0</code>, <code>pi</code>)</li>
<li>complex (e.g., <code>1 + 0i</code>, <code>1 + 4i</code>)</li>
<li>character (e.g., <code>"a"</code>, <code>"swc"</code>)</li>
</ul>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4 id="tip-character-vectors"><span class="glyphicon glyphicon-pushpin"></span>Tip: “Character Vectors”</h4>
</div>
<div class="panel-body">
<p>You will sometimes hear the term “character vector”, especially in warning or error messages. This is a somewhat confusing and unfortunate name. Remember that the type “character” really means some text wrapped in quotation marks.</p>
</div>
</aside>
<p>Create an empty vector with <code>vector()</code> or by using the concatenate function, <code>c()</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">vector</span>()
x</code></pre></div>
<pre class="output"><code>logical(0)
</code></pre>
<p>So by default, it creates an empty vector (i.e. a length of 0) of type “logical”.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">vector</span>(<span class="dt">length =</span> <span class="dv">10</span>) <span class="co"># with a predefined length</span>
x</code></pre></div>
<pre class="output"><code> [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
</code></pre>
<p>If we count the number of FALSEs there should be 10.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">vector</span>(<span class="st">"character"</span>, <span class="dt">length =</span> <span class="dv">10</span>) <span class="co"># with a predefined length and type</span>
x</code></pre></div>
<pre class="output"><code> [1] "" "" "" "" "" "" "" "" "" ""
</code></pre>
<p>Or we can use the concatenate function to combine any values we like into a vector (so long as they’re the same atomic type!).</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">10</span>, <span class="dv">12</span>, <span class="dv">45</span>, <span class="dv">33</span>)
x</code></pre></div>
<pre class="output"><code>[1] 10 12 45 33
</code></pre>
<p>You can also create vectors as a sequence of numbers:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">series <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">10</span>
series</code></pre></div>
<pre class="output"><code> [1] 1 2 3 4 5 6 7 8 9 10
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">seq</span>(<span class="dv">10</span>)</code></pre></div>
<pre class="output"><code> [1] 1 2 3 4 5 6 7 8 9 10
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">seq</span>(<span class="dv">1</span>, <span class="dv">10</span>, <span class="dt">by =</span> <span class="fl">0.1</span>)</code></pre></div>
<pre class="output"><code> [1] 1.0 1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 2.0 2.1 2.2 2.3
[15] 2.4 2.5 2.6 2.7 2.8 2.9 3.0 3.1 3.2 3.3 3.4 3.5 3.6 3.7
[29] 3.8 3.9 4.0 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5.0 5.1
[43] 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6.0 6.1 6.2 6.3 6.4 6.5
[57] 6.6 6.7 6.8 6.9 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9
[71] 8.0 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9.0 9.1 9.2 9.3
[85] 9.4 9.5 9.6 9.7 9.8 9.9 10.0
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4 id="tip-creating-integers"><span class="glyphicon glyphicon-pushpin"></span>Tip: Creating integers</h4>
</div>
<div class="panel-body">
<p>When you combine numbers using the concatenate function <code>c()</code>, the values are stored as type “numeric” (real/decimal numbers) by default. If you specifically want to create a vector of integers (whole numbers only), you need to append an L to each number, i.e. <code>c(10L, 12L, 45L, 33L)</code>.</p>
</div>
</aside>
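<p>As a minimal sketch of the tip above, the following compares the two ways of writing whole numbers. The variable names <code>x_double</code> and <code>x_integer</code> are illustrative only and are not used elsewhere in the lesson.</p>

```r
# Plain numbers are stored as doubles, even when they look like whole numbers.
x_double <- c(10, 12, 45, 33)
typeof(x_double)       # "double"

# The L suffix asks R to store integers instead.
x_integer <- c(10L, 12L, 45L, 33L)
typeof(x_integer)      # "integer"

# Both kinds count as numeric.
is.numeric(x_double)   # TRUE
is.numeric(x_integer)  # TRUE

# The colon operator produces integers when given whole-number endpoints.
typeof(1:10)           # "integer"
```

<p>Note that <code>typeof()</code> reports “double” rather than “numeric” for real/decimal values.</p>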
<p>You can also use the concatenate function to add elements to a vector:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">c</span>(x, <span class="dv">57</span>)
x</code></pre></div>
<pre class="output"><code>[1] 10 12 45 33 57
</code></pre>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-2"><span class="glyphicon glyphicon-pencil"></span>Challenge 2</h4>
</div>
<div class="panel-body">
<p>Vectors can only contain one atomic type. If you try to combine different types, R will create a vector that is the least common denominator: the type that is easiest to coerce to.</p>
<p><strong>Guess what the following do without running them first:</strong></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">xx <-<span class="st"> </span><span class="kw">c</span>(<span class="fl">1.7</span>, <span class="st">"a"</span>)
xx <-<span class="st"> </span><span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="dv">2</span>)
xx <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="ot">TRUE</span>)</code></pre></div>
</div>
</section>
<p>This is called implicit coercion.</p>
<p>The coercion rule goes <code>logical</code> -> <code>integer</code> -> <code>numeric</code> -> <code>complex</code> -> <code>character</code>.</p>
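<p>The coercion order can be checked one step at a time with <code>typeof()</code>. This is a small sketch rather than an exhaustive test; each line combines two neighbouring types from the rule above. The step labelled “numeric” in the rule is reported as “double” by <code>typeof()</code>.</p>

```r
# Each pair is coerced to the type further along the chain.
typeof(c(TRUE, 1L))      # "integer"   (logical -> integer)
typeof(c(1L, 2.5))       # "double"    (integer -> numeric)
typeof(c(2.5, 1+0i))     # "complex"   (numeric -> complex)
typeof(c(1+0i, "a"))     # "character" (complex -> character)

# Mixing several types at once lands on the last type in the chain.
mixed <- c(TRUE, 1L, 2.5, "a")
mixed                    # "TRUE" "1" "2.5" "a"
```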
<p>You can also coerce vectors explicitly using the <code>as.<class_name></code> functions, for example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.numeric</span>()</code></pre></div>
<pre class="output"><code>numeric(0)
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.character</span>()</code></pre></div>
<pre class="output"><code>character(0)
</code></pre>
<p>R will try to do whatever makes the most sense for that value:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.character</span>(x)</code></pre></div>
<pre class="output"><code>[1] "10" "12" "45" "33" "57"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.complex</span>(x)</code></pre></div>
<pre class="output"><code>[1] 10+0i 12+0i 45+0i 33+0i 57+0i
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">0</span>:<span class="dv">6</span>
<span class="kw">as.logical</span>(x)</code></pre></div>
<pre class="output"><code>[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
</code></pre>
<p>It is common in many programming languages for 0 to represent FALSE, while every other number is treated as TRUE.</p>
<p>Some coercions, especially nonsensical ones, won’t work. In those cases R can’t do anything sensible:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>)
<span class="kw">as.numeric</span>(x)</code></pre></div>
<pre class="output"><code>Warning: NAs introduced by coercion
</code></pre>
<pre class="output"><code>[1] NA NA NA
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">as.logical</span>(x)</code></pre></div>
<pre class="output"><code>[1] NA NA NA
</code></pre>
<p>In both cases, a vector of <code>NA</code>s was returned, and in the first case R also issued a warning.</p>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4 id="tip-special-objects"><span class="glyphicon glyphicon-pushpin"></span>Tip: Special Objects</h4>
</div>
<div class="panel-body">
<p>“NA” is a special object in R which denotes a missing value. NA can occur in any type of vector. There are a few other types of special objects: <code>Inf</code> denotes infinity (can be positive or negative), while <code>NaN</code> means “Not a Number”, an undefined value (e.g. <code>0/0</code>). <code>NULL</code> denotes that the data structure doesn’t exist (but can occur in list elements).</p>
</div>
</aside>
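<p>A short sketch of how these special objects behave in practice. The names <code>values</code> and <code>records</code> are made up for illustration.</p>

```r
# NA marks a missing value and propagates through most calculations.
values <- c(1, NA, 3)
is.na(values)                 # FALSE TRUE FALSE
mean(values)                  # NA
mean(values, na.rm = TRUE)    # 2

# Inf and NaN arise from arithmetic.
1 / 0                         # Inf
-1 / 0                        # -Inf
is.nan(0 / 0)                 # TRUE
is.na(0 / 0)                  # TRUE: NaN is also treated as missing

# NULL has length zero, but it can sit inside a list element.
length(NULL)                  # 0
records <- list(a = 1, b = NULL)
length(records)               # 2
is.null(records$b)            # TRUE
```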
<p>You can ask questions about the structure of vectors:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">0</span>:<span class="dv">10</span>
<span class="kw">tail</span>(x, <span class="dt">n=</span><span class="dv">2</span>) <span class="co"># get the last 'n' elements</span></code></pre></div>
<pre class="output"><code>[1] 9 10
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">head</span>(x, <span class="dt">n=</span><span class="dv">1</span>) <span class="co"># get the first 'n' elements</span></code></pre></div>
<pre class="output"><code>[1] 0
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">length</span>(x)</code></pre></div>
<pre class="output"><code>[1] 11
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">str</span>(x)</code></pre></div>
<pre class="output"><code> int [1:11] 0 1 2 3 4 5 6 7 8 9 ...
</code></pre>
<p>Vectors can be named:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
<span class="kw">names</span>(x) <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>, <span class="st">"d"</span>)
x</code></pre></div>
<pre class="output"><code>a b c d
1 2 3 4
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h4 id="advanced-tip-for-programmers"><span class="glyphicon glyphicon-pushpin"></span>Advanced Tip for Programmers</h4>
</div>
<div class="panel-body">
<p>If you’re coming from other programming languages you might recognise this as a useful tool akin to dictionaries and hash tables. This is true for small vectors, but for true hash table functionality, you should use the environment object. See <code>?new.env</code>.</p>
</div>
</aside>
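<p>To make the comparison concrete, here is a minimal sketch of both approaches: a named vector used as a small lookup table, and an environment used as a key-value store. The names <code>ages</code> and <code>lookup</code> and the keys are hypothetical.</p>

```r
# A named vector works as a small lookup table.
ages <- c(alice = 31, bob = 27)
unname(ages["bob"])                                 # 27

# An environment stores key-value pairs and is looked up by key.
lookup <- new.env()
assign("alice", 31, envir = lookup)
lookup[["bob"]] <- 27                               # equivalent to assign()

get("alice", envir = lookup)                        # 31
exists("carol", envir = lookup, inherits = FALSE)   # FALSE
ls(lookup)                                          # "alice" "bob"
```

<p>Setting <code>inherits = FALSE</code> keeps <code>exists()</code> from searching enclosing environments, so only keys stored in <code>lookup</code> itself are reported.</p>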
<h4 id="matrices">Matrices</h4>
<p>Another data structure you’ll likely encounter is the matrix. Under the hood, matrices are really just atomic vectors with added dimension attributes.</p>
<p>We can create one with the <code>matrix</code> function. Let’s generate some random data:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(<span class="dv">1</span>) <span class="co"># make sure the random numbers are the same for each run</span>
x <-<span class="st"> </span><span class="kw">matrix</span>(<span class="kw">rnorm</span>(<span class="dv">18</span>), <span class="dt">ncol=</span><span class="dv">6</span>, <span class="dt">nrow=</span><span class="dv">3</span>)
x</code></pre></div>
<pre class="output"><code> [,1] [,2] [,3] [,4] [,5] [,6]
[1,] -0.6264538 1.5952808 0.4874291 -0.3053884 -0.6212406 -0.04493361
[2,] 0.1836433 0.3295078 0.7383247 1.5117812 -2.2146999 -0.01619026
[3,] -0.8356286 -0.8204684 0.5757814 0.3898432 1.1249309 0.94383621
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">str</span>(x)</code></pre></div>
<pre class="output"><code> num [1:3, 1:6] -0.626 0.184 -0.836 1.595 0.33 ...
</code></pre>
<p>You can use <code>rownames</code>, <code>colnames</code>, and <code>dimnames</code> to set or retrieve the row and column names of a matrix. The functions <code>nrow</code> and <code>ncol</code> will tell you the number of rows and columns, while <code>length</code> will tell you the number of elements.</p>
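<p>A brief sketch of the naming and dimension functions, using a small matrix of zeros called <code>m</code> (an illustrative name) so that it does not overwrite the matrix <code>x</code> created above.</p>

```r
# A 2 x 3 matrix filled with zeros.
m <- matrix(0, nrow = 2, ncol = 3)

nrow(m)    # 2
ncol(m)    # 3
dim(m)     # 2 3

# Set the row and column names, then retrieve them.
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")
rownames(m)    # "r1" "r2"
colnames(m)    # "a" "b" "c"

# dimnames() returns both at once, as a list: row names first, then column names.
dimnames(m)
```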
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-3"><span class="glyphicon glyphicon-pencil"></span>Challenge 3</h4>
</div>
<div class="panel-body">
<p>What do you think will be the result of <code>length(x)</code>? Try it. Were you right? Why / why not?</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-4"><span class="glyphicon glyphicon-pencil"></span>Challenge 4</h4>
</div>
<div class="panel-body">
<p>Make another matrix, this time containing the numbers from 1 to 50, with 5 columns and 10 rows. Did the <code>matrix</code> function fill your matrix by column, or by row, as its default behaviour? See if you can figure out how to change this by reading the documentation for <a href="https://stat.ethz.ch/R-manual/R-devel/library/base/html/matrix.html">matrix</a>.</p>
</div>
</section>
<h4 id="factors">Factors</h4>
<p>Factors are special vectors that represent categorical data. Factors can be ordered or unordered and are important for modeling functions, such as <code>aov()</code>, <code>lm()</code> and <code>glm()</code>, or for plot functions.</p>
<p>Factors can only contain predefined values, and we can create one by calling the function <code>factor</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">"yes"</span>, <span class="st">"no"</span>, <span class="st">"no"</span>, <span class="st">"yes"</span>, <span class="st">"yes"</span>))
x</code></pre></div>
<pre class="output"><code>[1] yes no no yes yes
Levels: no yes
</code></pre>
<p>So we can see that the output is very similar to a character vector, but with an attached levels component. This becomes clearer when we look at its structure:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">str</span>(x)</code></pre></div>
<pre class="output"><code> Factor w/ 2 levels "no","yes": 2 1 1 2 2
</code></pre>
<p>This reveals something important: while factors look (and often behave) like character vectors, they are actually integers under the hood, and here, we can see that “no” is represented by a 1, and “yes” a 2.</p>
<p>In modeling functions, it is paramount to know what the baseline level is. This is the first level of the factor; by default, the levels are ordered alphabetically. You can change this by specifying the levels:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">factor</span>(<span class="kw">c</span>(<span class="st">"case"</span>, <span class="st">"control"</span>, <span class="st">"control"</span>, <span class="st">"case"</span>), <span class="dt">levels =</span> <span class="kw">c</span>(<span class="st">"control"</span>, <span class="st">"case"</span>))
<span class="kw">str</span>(x)</code></pre></div>
<pre class="output"><code> Factor w/ 2 levels "control","case": 2 1 1 2
</code></pre>
<p>In this case, we’ve explicitly told R that “control” should be represented by 1, and “case” by 2. This designation can be very important for interpreting the results of statistical models!</p>
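<p>The mapping between levels and integer codes can be inspected directly. The sketch below uses the illustrative names <code>group</code> and <code>regrouped</code>, and also shows <code>relevel()</code> from the <code>stats</code> package (attached by default in R) as one way to change the baseline of an existing factor.</p>

```r
group <- factor(c("case", "control", "control", "case"),
                levels = c("control", "case"))

levels(group)        # "control" "case": the first level is the baseline
nlevels(group)       # 2
as.integer(group)    # 2 1 1 2: the underlying integer codes

# relevel() moves the chosen level to the front without changing the labels.
regrouped <- relevel(group, ref = "case")
levels(regrouped)        # "case" "control"
as.integer(regrouped)    # 1 2 2 1
as.character(regrouped)  # "case" "control" "control" "case"
```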
<h4 id="lists">Lists</h4>
<p>If you want to combine different types of data, you will need to use lists. Lists act as containers for any type of data structure, even themselves!</p>
<p>Lists can be created using the function <code>list</code> or coerced from other objects using <code>as.list()</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">list</span>(<span class="dv">1</span>, <span class="st">"a"</span>, <span class="ot">TRUE</span>, <span class="dv">1</span>+4i)
x</code></pre></div>
<pre class="output"><code>[[1]]
[1] 1
[[2]]
[1] "a"
[[3]]
[1] TRUE
[[4]]
[1] 1+4i
</code></pre>
<p>Each element of the list is denoted by a <code>[[</code> in the output. Inside each list element is an atomic vector of length one, each containing a different type of data.</p>
<p>Lists can contain more complex objects:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">xlist <-<span class="st"> </span><span class="kw">list</span>(<span class="dt">a =</span> <span class="st">"Research Bazaar"</span>, <span class="dt">b =</span> <span class="dv">1</span>:<span class="dv">10</span>, <span class="dt">data =</span> <span class="kw">head</span>(iris))
xlist</code></pre></div>
<pre class="output"><code>$a
[1] "Research Bazaar"
$b
[1] 1 2 3 4 5 6 7 8 9 10
$data
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
</code></pre>
<p>In this case our list contains a character vector of length one, an integer vector with 10 entries, and a small data frame from one of R’s many preloaded datasets (see <code>?data</code>). We’ve also given each list element a name, which is why you see <code>$a</code> instead of <code>[[1]]</code>.</p>
<p>Lists can also contain themselves:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">list</span>(<span class="kw">list</span>(<span class="kw">list</span>(<span class="kw">list</span>())))</code></pre></div>
<pre class="output"><code>[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
list()
</code></pre>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="challenge-5"><span class="glyphicon glyphicon-pencil"></span>Challenge 5</h4>
</div>
<div class="panel-body">
<p>Create a list of length two containing a character vector for each of the sections in this part of the workshop:</p>
<ul>
<li>Data types</li>
<li>Data structures</li>
</ul>
<p>Populate each character vector with the names of the data types and data structures we’ve seen so far.</p>
</div>
</section>
<p>Lists are extremely useful inside functions. You can “staple” together lots of different kinds of results into a single object that a function can return. In fact many R functions which return complex output store their results in a list.</p>
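<p>As a rough illustration of this pattern, the hypothetical function <code>summarise_values()</code> below bundles several results of different lengths and types into one list. The function and its element names are invented for this sketch and are not part of base R.</p>

```r
# Return several summary results as a single named list.
summarise_values <- function(values) {
  list(
    n = length(values),      # a single integer
    mean = mean(values),     # a single double
    range = range(values)    # a vector of length two
  )
}

result <- summarise_values(c(10, 12, 45, 33))
str(result)      # shows the structure of all three elements
result$n         # 4
result$mean      # 25
result$range     # 10 45
```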
<h2 id="challenge-solutions">Challenge solutions</h2>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="solution-to-challenge-1-data-types"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 1: Data types</h4>
</div>
<div class="panel-body">
<p>Use your knowledge of how to assign a value to a variable to create examples of data with the following characteristics:</p>
<ol style="list-style-type: decimal">
<li>Variable name: ‘answer’, Type: logical</li>
<li>Variable name: ‘height’, Type: numeric</li>
<li>Variable name: ‘dog_name’, Type: character</li>
</ol>
<p>For each variable you’ve created, test that it has the data type you intended. Do you find anything unexpected?</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">answer <-<span class="st"> </span><span class="ot">TRUE</span>
height <-<span class="st"> </span><span class="dv">150</span>
dog_name <-<span class="st"> "Snoopy"</span>
<span class="kw">is.logical</span>(answer)</code></pre></div>
<pre class="output"><code>[1] TRUE
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">is.numeric</span>(height)</code></pre></div>
<pre class="output"><code>[1] TRUE
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">is.character</span>(dog_name)</code></pre></div>
<pre class="output"><code>[1] TRUE
</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="solution-to-challenge-2"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 2</h4>
</div>
<div class="panel-body">
<p>Vectors can only contain one atomic type. If you try to combine different types, R will create a vector that is the least common denominator: the type that is easiest to coerce to.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">xx <-<span class="st"> </span><span class="kw">c</span>(<span class="fl">1.7</span>, <span class="st">"a"</span>)
xx</code></pre></div>
<pre class="output"><code>[1] "1.7" "a"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">typeof</span>(xx)</code></pre></div>
<pre class="output"><code>[1] "character"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">xx <-<span class="st"> </span><span class="kw">c</span>(<span class="ot">TRUE</span>, <span class="dv">2</span>)
xx</code></pre></div>
<pre class="output"><code>[1] 1 2
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">typeof</span>(xx)</code></pre></div>
<pre class="output"><code>[1] "double"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">xx <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"a"</span>, <span class="ot">TRUE</span>)
xx</code></pre></div>
<pre class="output"><code>[1] "a" "TRUE"
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">typeof</span>(xx)</code></pre></div>
<pre class="output"><code>[1] "character"
</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="solution-to-challenge-3"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 3</h4>
</div>
<div class="panel-body">
<p>What do you think will be the result of <code>length(x)</code>?</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">matrix</span>(<span class="kw">rnorm</span>(<span class="dv">18</span>), <span class="dt">ncol=</span><span class="dv">6</span>, <span class="dt">nrow=</span><span class="dv">3</span>)
<span class="kw">length</span>(x)</code></pre></div>
<pre class="output"><code>[1] 18
</code></pre>
<p>Because a matrix is really just a vector with added dimension attributes, <code>length</code> gives you the total number of elements in the matrix.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="solution-to-challenge-4"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 4</h4>
</div>
<div class="panel-body">
<p>Make another matrix, this time containing the numbers 1:50, with 5 columns and 10 rows. Did the <code>matrix</code> function fill your matrix by column, or by row, as its default behaviour? See if you can figure out how to change this. (hint: read the documentation for <code>matrix</code>!)</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">50</span>, <span class="dt">ncol=</span><span class="dv">5</span>, <span class="dt">nrow=</span><span class="dv">10</span>)
x <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">50</span>, <span class="dt">ncol=</span><span class="dv">5</span>, <span class="dt">nrow=</span><span class="dv">10</span>, <span class="dt">byrow =</span> <span class="ot">TRUE</span>) <span class="co"># to fill by row</span></code></pre></div>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h4 id="solution-to-challenge-5"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 5</h4>
</div>
<div class="panel-body">
<p>Create a list of length two containing a character vector for each of the sections in this part of the workshop:</p>
<ul>
<li>Data types</li>
<li>Data structures</li>
</ul>
<p>Populate each character vector with the names of the data types and data structures we’ve seen so far.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">my_list <-<span class="st"> </span><span class="kw">list</span>(
<span class="dt">data_types =</span> <span class="kw">c</span>(<span class="st">"logical"</span>, <span class="st">"integer"</span>, <span class="st">"double"</span>, <span class="st">"complex"</span>, <span class="st">"character"</span>),
<span class="dt">data_structures =</span> <span class="kw">c</span>(<span class="st">"vector"</span>, <span class="st">"matrix"</span>, <span class="st">"factor"</span>, <span class="st">"list"</span>)
)</code></pre></div>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>