---
title: "Introduction to R: Tutorial"
output:
  learnr::tutorial:
    progressive: true
    allow_skip: true
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
library("shiny")
library("learnr")
knitr::opts_chunk$set(echo = FALSE)
tutorial_options(exercise.cap = "Code", exercise.eval = FALSE, exercise.timelimit = NULL, exercise.lines = 3, exercise.checker = FALSE, exercise.completion = TRUE, exercise.diagnostics = TRUE, exercise.startover = TRUE)
```
## INTRODUCTION
### About this tutorial
This Introduction to R e-learning tutorial is part of Natural England's training material on using the R programming language to undertake statistical analysis. This tutorial will help you to understand the syntax and concepts of the R language. After completing the course you should be able to:

* understand the main data types in R: vector, matrix, data frame
* import data from common file formats
* work with functions and know where to find information on new functions
* manipulate data with mathematical and logical operators
* undertake simple statistical analysis on your data
* make basic graphs
This will give you the grounding to follow the statistical case study examples that are being developed and to apply the techniques to your own data.
(Last updated: 26 October 2017)
### Target audience
The course is suitable for anyone who wants to be able to undertake statistical analysis themselves.
## How to use this tutorial
This tutorial is interactive. It includes some explanation, but will also ask you to discover how **R** works by executing **R** code in exercises, or through quizzes. You will be able to try out your quiz answers before you submit your solution.
The content of each section will be progressively revealed: Click **Continue** or **Next topic** to move forward. You will get most out of the training if you complete all the exercises.
Whilst you may wish to [install **R** and **RStudio**](http://trim/HPRMWebClientClassic/download/?uri=3571239) prior to starting this training course so that you can apply what you have learnt as you go along, having the software installed is not essential to running the interactive tutorials.
You may find it helpful to view the introductory video to R and R Studio before you start this tutorial: [Overview of R and R Studio](need link) (Duration 20 minutes?)
## Understanding code formatting
This tutorial includes **R** code which is always formatted like this: `this is what R code looks like` or as a separate chunk of code (which looks more like it does in RStudio):
```{r echo=TRUE, eval=FALSE}
This is what an R code chunk looks like.
```
The output which R generates is presented like this:
```{r}
print("This is what R output looks like.")
```
Some exercises require you to write code. Here's some example code. Pressing **Run Code** will make R read and execute the code. Try it now:
```{r intro, exercise=TRUE}
print("Pirates say R")
```
In **R**, any text to the right of a hash `#` is ignored by the **R** programme. It is used to write **comments** addressed to the human reading the code.
```{r, echo=TRUE}
#This is a comment. R will print it, but ignore it.
#pssst: R is a silly name for a programme. Don't tell R.
```
Click **Next topic** to view the contents of this tutorial.
## CONTENTS
* Section 1: [R data types](#section-section-1-r-data-types) (xx minutes)
* Section 2: [Importing data](#section-tutorial-2-importing-data) (xx minutes)
* Section 3: [Functions and operators](#section-tutorial-3-functions-and-operators) (xx minutes)
* Section 4: [Summary statistics](#section-tutorial-4-summary-statistics) (xx minutes)
* Section 5: [Plots and charts](#section-graphs) (xx minutes)
Click **Next topic** to read on.
## SECTION 1: R DATA TYPES
This session should take about **???** minutes to complete. It introduces the ways in which R handles data, namely:
* Basic types (numeric, character and logical)
* Data structures (vectors, matrices and data frames)
* Handling missing values
###
Let's look at **basic types** first.
## Basic types
The basic object type in **R** is called a **vector**, which contains a list of values (e.g. a measurement, species name, presence/absence). In **R** values will normally be one of the following types:
* Numeric
* Character
* Logical
* Factors
###
### Numeric values
Any value which consists of only numbers is automatically treated as **numeric** data in **R**.
The following code creates a variable called `height` which has a value of `180`.
```{r echo=T}
height <- 180
```
###
Typing the name of a variable displays its value.
```{r echo=T}
height
```
###
In **R** the `c` function is used to **combine** several values into a vector.
```{r numeric.values-setup, echo=T}
bird.counts <- c(15, 700, 300, 120)
```
Run the following code to display the values held in `bird.counts`.
```{r numeric.values, exercise = TRUE}
bird.counts
```
```{r wazzup-setup}
wazzup <- print("wazzup?")
```
```{r wazzup, exercise = TRUE}
wazzup
```
Tip: In **R** the full stop `.` character can be used as a separator in variable names instead of using an underscore `_` character.
### Character values
Any value which contains text is automatically treated as being **character** data. You tell R something is a character by enclosing it in double `"char"` or single `'char'` quotes.
species: `"bellis perennis", "columba livia"`
### Logical values
A logical value can only be either `TRUE` or `FALSE`. **R** automatically recognises those two words as logical values, but can also treat other values as logical (e.g. `0` and `1`, `T` and `F`).
feature present: `TRUE, FALSE, FALSE, TRUE`
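One way to check which basic type **R** has assigned to a value is the base function `class()`. The short sketch below is illustrative only and reuses values from the examples above.

```{r echo=TRUE}
# class() reports the basic type R has assigned to a value
class(180)                 # a number
class("bellis perennis")   # text in quotes
class(TRUE)                # a logical value
class("180")               # a number in quotes is treated as text
```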
### Quiz
```{r quiz1, echo=FALSE}
quiz(caption = "Quiz: Data types",
question("Which type of value is `10`? Tick all that apply.",
answer("character"),
answer("numeric", correct = TRUE),
answer("logical"),
incorrect = "Incorrect: `10` is a numeric value."
),
question("Which of these values does R treat as being character data?",
answer("`\"favourable condition\"`", correct = TRUE),
answer("`2017`"),
answer("`\"2017\"`", correct = TRUE),
answer("`TRUE`"),
incorrect = "Incorrect: **\"favourable condition\"** is treated as character. `2017` is treated as a numeric, however when put in quotes `\"2017\"` is treated as a character."
)
)
```
### Factors
We have not yet encountered **categorical** data. In **R** such data are known as **factors**.
Data are categorical if their values belong to a collection of known, defined and non-overlapping classes (referred to as **levels**).
###
Common examples might be:
* species lists
* habitat types
* survey squares
* etc.
Factors may contain numeric, character or logical values.
>taxonomic group: `"bryophytes", "graminoids", "vascular plants","graminoids"`
Levels: `bryophytes, graminoids, vascular plants`
>species present: `TRUE, FALSE, FALSE, TRUE, FALSE`
Levels: `FALSE, TRUE`
>abundance `5,0,0,1,0`
Levels: `0,1,5`
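As a minimal sketch of the taxonomic group example above, the base functions `factor()` and `levels()` can be used to build a factor and list its levels. The object name `taxon.group` is made up for this illustration.

```{r echo=TRUE}
# Four observations, but only three distinct classes (levels)
taxon.group <- factor(c("bryophytes", "graminoids", "vascular plants", "graminoids"))
taxon.group
# levels() lists the distinct classes
levels(taxon.group)
```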
## Vector
### What are vectors?
In **R** vectors are sequences of values. The values in a vector must all be of the same basic type.
Vectors can be:
* numeric, e.g. `c(9.2, 33.7, 27.4, 14.9)`
* logical, e.g. `c(TRUE, TRUE, FALSE)`
* character, e.g. `c("mud","sand","shingle")`
* factors, e.g. `factor(c("quadrat1","quadrat1","quadrat2","quadrat2"))`
Vectors are the basic building blocks of how data are stored in R. For example, if you import a table of data from Excel into R each column of data is stored as a vector.
You can make a vector by *combining* values with the `c()` function, separating each value with a comma `,`.
###
Click **Run Code** to see how it works with each of the main vector types:
Numeric vector:
```{r vec-eg-1, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
c(9.2, 33.7, 27.4, 14.9)
```
###
Logical vector:
```{r vec-eg-2, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
c(TRUE, TRUE, FALSE)
```
###
Categorical vector:
```{r vec-eg-3, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
factor(c("Male","Female"))
```
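Because all the values in a vector must share one basic type, **R** converts mixed values to a common type without warning. The sketch below illustrates this; the object name `mixed` is invented for the example.

```{r echo=TRUE}
# Mixing a number, text and a logical value: everything becomes character
mixed <- c(9.2, "sand", TRUE)
mixed
class(mixed)
```

If a column you expected to be numeric turns out to be character, a stray piece of text among the values is a likely cause.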
<!--
## Extracting values from vectors
If you want to access specific values from a vector you use square brackets. For example, to access the first value from `water.table.depth` you would use:
```{r echo=T}
#water.table.depth[1]
```
To access a range of values you use a colon `:`. For example, to access the first three values from water.table.depth you would use:
```{r echo=T}
#water.table.depth[1:3]
```
-->
### Generating data: sequence and repeat functions
There are often occasions where you need to generate a sequence of numbers.
###
For example: we want to survey a 500m transect at 100m intervals, going east from a given grid reference. We need:
* five unique plot numbers
* five grid eastings, 100m apart
* five identical grid northings
###
For the plot numbers we can use two methods:
* the colon `:` operator:
```{r vec-eg-4, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
1:5
```
###
* or the sequence `seq()` function (we will learn more about functions later)
```{r vec-eg-5, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
seq(from=1, to=5)
```
###
**R** allows us to drop the `from=` and `to=` parameter labels to shorten the code to:
```{r vec-eg-6, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
seq(1, 5)
```
###
For the grid eastings we can also use the `seq()` function, but this time we have to specify the size of the increments:
```{r vec-eg-7, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
seq(512300, 512700, by=100)
```
###
Or we specify the length of the vector we want:
```{r vec-eg-8, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
seq(512300, 512700, length.out=5)
```
###
Finally, for the grid northings, we can use the repeat `rep()` function, specifying first the value, and second the number of repetitions:
```{r vec-eg-9, exercise=TRUE, exercise.eval=FALSE, exercise.lines=1}
rep(245600, 5)
```
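Putting the three pieces together, the transect could be stored as three vectors of equal length. This is only a sketch: the object names `plot.number`, `easting` and `northing` are assumptions made for the example.

```{r echo=TRUE}
# Hypothetical object names for the three transect vectors
plot.number <- 1:5
easting <- seq(512300, 512700, by = 100)
northing <- rep(245600, 5)
# All three vectors have five values, one per plot
length(plot.number)
length(easting)
length(northing)
```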
### Question
```{r q1}
quiz(
question("How would you create a vector containing a sequence of numbers from one to ten? (tick all answers that apply)",
answer("`1:10`", correct = T),
answer("`c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)`", correct = T),
answer("`rep(1,10)`"),
answer("`seq(1,10)`", correct = T),
incorrect= "Not quite. 1:10, c(1,2,3,4,5,6,7,8,9,10) and seq(1,10) will all create a vector containing a sequence of numbers from one to ten. However, rep(1,10) will create a vector containing the number 1 repeated 10 times."
)
)
```
Before you answer the question, use the empty code block below to try out the options. Press **Run Code** to see the results. Press **Hint** for more information.
```{r block-q1, exercise=T, exercise.eval=T}
```
<div id="block-q1-hint">
**Hint:**
The correct code should give an output that looks like this:
`[1] 1 2 3 4 5 6 7 8 9 10`
</div>
###
End of topic: **Vector**
## Variables
In **R** you give an object a name by using the `<-` assignment operator.
```{r variables1-setup, echo=T}
water.table.depth <- c(12.6, 13.9, 9.4, 7.6, 8.1)
```
The full stop `.` character can be used as a separator in variable names instead of using an underscore `_` character.
If you type the name of an **object** into **R** it will show the content of the object in the **console** window. Click **Run Code** to see how it works:
```{r variables1, exercise=TRUE, exercise.eval=FALSE}
water.table.depth
```
```{r variables2-setup, echo=TRUE}
water.table.depth <- c(12.6, 13.9, 9.4, 7.6, 8.1)
```
###
You can check what type of values the `water.table.depth` vector contains by asking **R** what **mode** it is:
```{r variables2, exercise=TRUE, exercise.eval=FALSE}
mode(water.table.depth)
```
The values are numeric.
###
End of topic: **Variables**
## Matrix
### What is a matrix?
A **matrix** consists of two or more *vectors* which must:
* have the same length; and
* contain the same type of data.
### Example: pitfall traps
We have four [pitfall traps](http://en.wikipedia.org/wiki/Pitfall_trap) and have counted how many invertebrates of two species are found in each. We found the following numbers of individuals of each species in the traps:
*Species A*: 11,19,33,12
*Species B*: 9, 33, 27, 14
### The bind functions
We can **bind** these two **vectors** into a **matrix**:
```{r echo=TRUE}
trap.count <- cbind(c(11,19,33,12), c(9,33,27,14))
```
###
The `cbind` (**c**olumn **bind**) function combines the vectors into a matrix, with each vector becoming a column. You could also use `rbind` (**r**ow **bind**), which makes each vector a row.
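The difference between the two bind functions can be sketched with the pitfall trap counts. The object names below are invented for the illustration, and `dim()` is a base function that reports the number of rows and columns.

```{r echo=TRUE}
# Same pitfall trap counts, bound as columns and then as rows
species.a <- c(11, 19, 33, 12)
species.b <- c(9, 33, 27, 14)
by.column <- cbind(species.a, species.b)
by.row <- rbind(species.a, species.b)
dim(by.column)   # 4 rows (traps), 2 columns (species)
dim(by.row)      # 2 rows (species), 4 columns (traps)
```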
### Matrix function
We can also coerce a single vector into a matrix by using the **matrix** function. For this you need to specify the vector, and either:
* the number of rows `nrow=` or
* the number of columns `ncol=`
which the matrix should have.
### Example: weather records
You have recorded monthly mean temperature at Thursley NNR for five years and have stored them into the vector `temp`:
```{r, echo=TRUE}
temp <- c(1.27, 3.68, 6.43, 9.32, 11.06, 16.14, 18.38, 16.16, 14.03, 10.85, 5.71, -0.01, 4.42, 6.66, 6.98, 12.62, 12.89, 14.72, 15.74, 16.31, 15.74, 12.96, 10.06, 6.44, 6.00, 3.81, 8.50, 7.90, 12.89, 14.52, 16.12, 17.38, 13.73, 10.38, 6.99, 5.35, 3.93, 3.25, 3.53, 7.91, 10.80, 14.50, 19.10, 17.70,
14.33, 12.89, 6.61, 6.26, 6.28, 6.97, 8.16, 10.62, 12.63, 15.94, 18.68, 15.86, 15.74, 13.31, 8.92, 5.40)
```
###
You want to put these into a matrix where each row represents a month and each column a year. You know they are in order, starting with January of the first year.
###
First we will try specifying the number of rows (12 months = 12 rows). Click **Run Code** to see how it works:
```{r weath-ex-1, exercise=TRUE, exercise.eval=FALSE}
matrix(temp, nrow= 12)
```
###
Next we will do it by specifying the number of columns (5 years = 5 columns). Click **Run Code** to see how it works:
```{r weath-ex-2, exercise=TRUE, exercise.eval=FALSE}
matrix(temp, ncol= 5)
```
###
The resulting matrices are identical to each other.
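This can be checked with the base function `identical()`. The sketch below uses a stand-in vector of 60 values called `readings` (an assumption for the example) rather than the temperature data, and also shows that `matrix()` fills one column at a time.

```{r echo=TRUE}
# A stand-in for the temperature records: 60 values in order
readings <- 1:60
by.rows <- matrix(readings, nrow = 12)
by.cols <- matrix(readings, ncol = 5)
identical(by.rows, by.cols)
# The first column holds the first 12 values (the first year)
by.rows[, 1]
```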
### Exercise: another pitfall trap
You have inspected three pitfall traps and are recording the number of spiders and the number of ground beetles in each. You want to create the following matrix:
```{r}
cbind(c(1,2,3),c(4,5,6))
```
###
```{r trap-ex-3}
quiz(
question("How can you combine the data into a matrix like the one above? (tick all answers that apply)",
answer("matrix(c(1:6))", correct = F),
answer("cbind(c(1,2,3),c(4,5,6))", correct = T),
answer("rbind(c(1,4),c(2,5),c(3,6))", correct = T),
answer("matrix(c(1,2,3,4,5,6), nrow=3)", correct = T)
)
)
```
Before you answer the question, use the empty code block below to try out the options. Press **Run Code** to see the results. Press **Hint** for more information.
```{r block-trap-ex-3, exercise=T, exercise.eval=T}
```
<div id="block-trap-ex-3-hint">
Sorry, no hint this time. Try out all the solutions and see which works.
</div>
### Solution
* *Incorrect*: `matrix(c(1:6))` It is right to use the matrix function, but you have to specify the number of rows `nrow` or columns `ncol`.
* *Correct*: `cbind(c(1,2,3),c(4,5,6))` You can use column bind to combine the two column vectors.
* *Correct*: `rbind(c(1,4),c(2,5),c(3,6))` You can use row bind to combine the three row vectors.
* *Correct*: `matrix(c(1,2,3,4,5,6), nrow=3)` You can use the matrix function and specify the number of rows.
###
End of topic **Matrix**
## Data Frame
### What is a data frame?
A data frame consists of two or more **vectors** of the **same length**. Each column can contain **different types of data**. It is the preferred way of storing tabular data in R. It is also the default way data from Excel spreadsheets is imported into **R**.
### Example: vegetation quadrat
We have surveyed a 1m by 1m vegetation quadrat on a heathland. We have estimated the area which each plant species covers in percent. These are our results:
plant | cover
:--- |:----
shrubs | 60%
grasses | 25%
mosses | 30%
flowers | 5%
###
We can't combine this into a matrix because *plants* and *cover* are different types of data.
*Plants* is **character** data
*Cover* is **numeric** data.
###
However, we can bind these two vectors into a **data frame**:
```{r echo=TRUE}
quadrat <- data.frame(plants = c("shrubs", "grasses", "mosses", "flowers"),
cover = c(60, 25, 30, 5))
```
###
The `data.frame` function combines the vectors into a data frame. If you used `cbind` on these two vectors instead, **R** would build a character matrix and convert the cover values to text, so `data.frame` is the right tool here.
###
We have already encountered the **assignment operator** `<-` which tells **R** to save the results of the function `data.frame` into a new object. Here, the object we have created is `quadrat`.
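To confirm that each column of the data frame has kept its own type, you could apply `class()` to every column with the base function `sapply()`. This is a sketch only: the `stringsAsFactors = FALSE` setting is an addition made for this example, because older versions of R (before 4.0.0) store text columns as factors by default.

```{r echo=TRUE}
# stringsAsFactors = FALSE keeps the plant names as plain text
quadrat <- data.frame(plants = c("shrubs", "grasses", "mosses", "flowers"),
                      cover = c(60, 25, 30, 5),
                      stringsAsFactors = FALSE)
# Apply class() to each column in turn
sapply(quadrat, class)
# Number of rows and columns
dim(quadrat)
```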
### Recap: calling objects
You can see the value of an object by typing its name and running the code.
Type the code to show the contents of the data frame we have just created, then press **Run Code** to see if it is right.
```{r veg.ex-setup}
quadrat <- data.frame(plants = c("shrubs", "grasses", "mosses", "flowers"),
cover = c(60, 25, 30, 5))
```
```{r veg.ex, exercise=TRUE}
quadrat
```
```{r veg.ex-hint-1}
Type the name of the object and click Run Code
```
### Accessing data in data frames
You are likely to want to retrieve all or parts of the data in a data frame. There are a number of ways of doing this:
#### All the data
The name of the object returns all the data. E.g.:
```{r, echo=TRUE}
quadrat
```
#### Single columns
The `$` sign is used to reference a column name. It returns a vector.
```{r, echo=TRUE}
quadrat$cover
```
#### Index referencing
Parts of a dataframe can be referenced by their **index**. Think of it as co-ordinates, where the first index is the row and the second the column. The index is enclosed in square brackets `[ROW, COLUMN ]`
###
A **single cell** is referenced by two numbers, and returns a vector of length one.
The following code returns the third row, second column:
```{r, echo=TRUE}
quadrat[3,2]
```
###
A **single row** is referenced by one number, leaving the column reference blank. It returns a data frame.
The following code returns a dataframe of the third row only:
```{r, echo=TRUE}
quadrat[3, ]
```
###
A **single column** is referenced by one number, leaving the row reference blank. It returns a vector.
The following code returns a vector of the second column:
```{r, echo=TRUE}
quadrat[, 2]
```
###
If you leave out the row reference and the comma entirely, **R** returns a data frame containing just that column.
The following code returns a data frame of the second column:
```{r, echo=TRUE}
quadrat[2]
```
#### Referencing by name
Instead of specifying the number of a column, it can also be addressed by name.
The following code returns a data frame of the first column (`plants`):
```{r, echo=TRUE}
quadrat["plants"]
```
#### Referencing ranges
Ranges of rows or columns can also be addressed.
The following code returns the first three rows of the second column as a vector.
```{r, echo=TRUE}
quadrat[1:3,2]
```
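Whether you get a vector or a data frame back depends on how you write the index. The sketch below uses the base function `class()` to compare the forms described above.

```{r echo=TRUE}
quadrat <- data.frame(plants = c("shrubs", "grasses", "mosses", "flowers"),
                      cover = c(60, 25, 30, 5))
# With a comma: all rows of column 2, returned as a vector
class(quadrat[, 2])
# Without a comma: column 2, returned as a one-column data frame
class(quadrat[2])
# The $ sign also returns a vector
class(quadrat$cover)
```

The distinction matters because many functions, such as `mean()`, expect a vector rather than a data frame.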
### Exercise: extracting a column
You are asked to extract a species list from the quadrat you have surveyed.
```{r veg-ex-2}
quiz(
question("Which of the following will return the species names, either as a vector or as a one-column data frame? (tick all answers that apply)",
answer("`quadrat[1]`", correct = T),
answer("`quadrat[,1]`", correct = T),
answer("`quadrat.plants`", correct = F),
answer("`quadrat$plants`", correct = T),
answer("`quadrat[\"plants\"]`", correct = T)
)
)
```
Before you answer the question, use the empty code block below to try out the options. Press **Run Code** to see the results. Press **Hint** for more information.
```{r block.veg.ex.2-setup}
quadrat <- data.frame(plants = c("shrubs", "grasses", "mosses", "flowers"),
cover = c(60, 25, 30, 5))
```
```{r block.veg.ex.2, exercise=TRUE}
```
<div id="block-trap-ex-3-hint">
Sorry, no hint this time. Try out all the solutions and see which works.
</div>
### Solution
* *Correct*: `quadrat[1]` You can index a column by its number, without specifying any rows.
* *Correct*: `quadrat[,1]` You access values by specifying rows and columns: `[row , column]`. Here we left the row index blank, which returns all rows, and the first column.
* *Incorrect*: `quadrat.plants` **R** thinks this is an object called `quadrat.plants`: we haven't defined such an object.
* *Correct*: `quadrat$plants` The `$` character is used to reference a column name.
* *Correct*: `quadrat["plants"]` Here we have used a character string to reference a column by its name.
###
End of topic **Data Frame**
## Missing Values
Missing values occur all too frequently:
* a sensor might fail or a battery run out
* part of a site is temporarily inaccessible
* the surveyor's handwriting is hard to read
* someone forgot to write a measurement down!
In **R** 'Not available' or missing values are specified using `NA`.
```{r echo=TRUE}
veg.heights <- c(44.2,22.7,NA,33.3)
veg.heights
```
###
This can cause problems, for instance when calculating the mean:
```{r echo=TRUE}
mean(veg.heights)
```
###
The mean cannot be calculated because the missing value is included. However, you can tell **R** to ignore missing values using the parameter:
`na.rm = TRUE` (it stands for *remove NA*)
###
```{r echo=TRUE}
mean(veg.heights, na.rm = TRUE)
```
It has calculated the mean.
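Before deciding to ignore missing values, it can help to see how many there are. One possible approach, sketched below, uses the base function `is.na()`.

```{r echo=TRUE}
veg.heights <- c(44.2, 22.7, NA, 33.3)
# is.na() flags each missing value with TRUE
is.na(veg.heights)
# TRUE counts as 1, so sum() gives the number of missing values
sum(is.na(veg.heights))
```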
###
End of topic **Missing Values**
## TUTORIAL 2: IMPORTING DATA
This session should take about **5** minutes to complete. It gives a brief introduction to importing data from a text file.
## The working directory
The working directory is the place R will look for any data, unless you tell it otherwise. So first you need to set the working directory:
###
`setwd("C:\\introR")`
Windows uses backslashes `\`: in **R** these must be written twice `\\` (because a single backslash has a special meaning in **R**).
###
`setwd("C:/introR")`
Alternatively you can use single forward slashes `/` for paths.
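You can check which folder **R** is currently using with the base function `getwd()`. In the sketch below, `tempdir()` stands in for a folder such as `"C:/introR"` so that the code runs on any computer; the object name `old.dir` is invented for the example.

```{r echo=TRUE}
old.dir <- getwd()        # remember the current working directory
setwd(tempdir())          # tempdir() stands in for a folder such as "C:/introR"
getwd()                   # confirm the change
setwd(old.dir)            # go back to where we started
```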
## Importing csv
Data is often collated from sources such as:
* field survey sheets
* sensor measurements
* third party data (e.g. weather records)
### Example
You want to understand the impact of grazing on a site, so you have measured the height of grasses, mosses and shrubs on five occasions in the last ten years. You have saved your measurements as
###
The easiest way of getting data into R is to read it from a text file, for example a file saved from Excel as comma separated values (.csv). To keep things simple for a beginner, the .csv file should have:
* a single header row at the top
* data in columns
* observations in rows
* no summary rows or columns
### The read.csv function
The **read.csv()** function can bring csv file data into **R**. The code needs to have the following components:
###
`read.csv()` The function which will interpret the file and read it into **R**
###
`"veg-heights.csv"` The name of the file you are importing. Note that it is in quotes `""`: this is because we want **R** to read it as a string of characters.
###
`header = TRUE` A **parameter** for the `read.csv()` function, telling it that the first row contains the column names. We will learn more about parameters later.
###
`veg.heights` A name for the **object** we will read the file into. If the object already exists **R** will overwrite it. If it doesn't exist, **R** will create it.
###
Here is the assembled line of code:
```{r, echo=TRUE, eval=FALSE}
veg.heights <- read.csv("veg-heights.csv", header=TRUE)
```
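If you do not have a data file to hand, the round trip can be sketched with a temporary file. Everything in this example is made up for illustration: the measurements, the column names and the object names `csv.file` and `veg.example`.

```{r echo=TRUE}
# Made-up measurements, written to a temporary csv file
csv.file <- tempfile(fileext = ".csv")
writeLines(c("year,grasses,mosses,shrubs",
             "2008,12.5,3.1,40.2",
             "2010,10.8,2.9,42.7"),
           csv.file)
# Read the file back in; the first row supplies the column names
veg.example <- read.csv(csv.file, header = TRUE)
veg.example
names(veg.example)
```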
## Importing other formats
Data in text files is often separated by commas, but you may find other separators, most commonly tab separated values. For these we need a different function:
###
`read.table()` It works much like `read.csv()`, but you need to specify the separator and whether the file has a header row.
`sep = ""` A parameter to specify the separator: here it specifies white space (including tabs).
###
```{r, echo=TRUE, eval=FALSE}
veg.heights <- read.table("veg-heights.txt", sep="", header=TRUE)
```
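A tab can also be named explicitly as the separator with `sep = "\t"`. The sketch below writes a small, made-up tab separated file to a temporary location and reads it back; the object names `txt.file` and `tab.example` are assumptions for the example.

```{r echo=TRUE}
# Made-up measurements, written to a temporary tab separated file
txt.file <- tempfile(fileext = ".txt")
writeLines(c("year\tgrasses\tmosses",
             "2008\t12.5\t3.1",
             "2010\t10.8\t2.9"),
           txt.file)
# sep = "\t" names the tab explicitly; header = TRUE uses the first row as column names
tab.example <- read.table(txt.file, sep = "\t", header = TRUE)
tab.example
```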
###
By installing additional R packages you can read data in other formats, such as Excel workbooks or spatial data from shapefiles.
## Viewing data in R Studio
Once you have read in a file, it will appear in RStudio's Environment pane, which by default is at the top right of the window.
![RStudio's view of data that has been imported](images/IntroR_data_import.png){width="560"}
<!--
### Excercise: import some data
To complete this exercise you must first download the sample data `veg-heights.csv` and `veg-heights.txt` from [....] and save them to your C drive in a folder called "introR".
###
1 In line 1 of of the code box below, write code to set the working directory to C:/introR/
2 In line 2, Write the code to import the birds data. You can copy it from above.
3 In line 3, write code for displaying the data.
Then press **Run Code**. If you have done it correctly you should see the content of the birds dataset.
```{r import.ex, exercise=TRUE}
```
```{r import.ex-hint-1}
setwd("C:/introR")
```
```{r import.ex-hint-2}
veg.heights <- read.csv("veg-heights.csv", header=TRUE)
```
```{r import.ex-hint-3}
veg.heights
```
-->
###
End of topic **Importing data**
## TUTORIAL 3: FUNCTIONS AND OPERATORS
This session should take about **30** minutes to complete. It explains:
* how functions work
* how to find help
* mathematical operators
* operations on R objects
* logical operators
* relational operators
## How do functions work
Functions are crucial tools in any programming language. A function is a part of a computer program that performs some specific action.
We have already encountered some R functions, such as `c()`, `rep()`, `seq()`, `data.frame()`, `read.csv()` and `read.table()`.
Let's look at the components of a function we've already used:
###
![The components of a function](images/IntroR_functions.png){width="560"}
## How to find help
###
The following website provides a useful list of R's built in functions.
[Quick-R: Built-in Functions](http://www.statmethods.net/management/functions.html)
###
R has built in documentation to explain what functions do, and what their default parameters are.
Putting a question mark in front of a function name displays its help.
Try to find help for the `any` function:
```{r help.ex, exercise = TRUE}
```
```{r help.ex-hint}
?any
```
If you've done it correctly the help file will pop up in a new window. Click **Hint** above if you are struggling.
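There is more than one way to ask for help. The sketch below uses `seq` as the example function; `help()` and `args()` are base R functions.

```{r echo=TRUE, eval=FALSE}
?seq            # opens the help page for seq
help("seq")     # does the same thing
args(read.csv)  # shows a function's parameters and their default values
```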
## Mathematical Operators
You can use **R** to carry out mathematical operations:
###
```{r results="asis"}
mathops <- cbind(Operator = c("+", "-", "*", "/", "^", "sqrt", "log"), Description = c("Addition", "Subtraction", "Multiplication", "Division", "Exponent", "Square root", "Natural logarithm"))
knitr::kable(mathops, caption = "Mathematical Operators")
```
###
**R** evaluates mathematical operations:
```{r echo = TRUE}
2*2
```
###
Normal arithmetic rules apply (if you are unsure, see [Order of Operations](http://en.wikipedia.org/wiki/Order_of_operations)):
```{r echo = TRUE}
2*1+2
```
###
... so use brackets if necessary:
```{r echo = TRUE}
2*(1+2)
```
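The remaining entries in the table work in the same way. A few illustrative calculations:

```{r echo=TRUE}
3^2          # exponent: 3 squared
sqrt(16)     # square root
log(1)       # the natural logarithm of 1 is 0
2 + 3^2      # the exponent is evaluated before the addition
(2 + 3)^2    # brackets change the order
```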
### Exercise: mathematical operators
```{r mathops.ex.1}
quiz(
question("Which of these calculations equals **20**? (tick all answers that apply)",
answer("5+5*2", correct = F),
answer("(5+5)*2", correct = T),
answer("5+(5*2)", correct = F),
answer("sqrt(400)", correct = T)
)
)
```
Before you answer the question, use the empty code block below to try out the options. Press **Run Code** to see the results.
```{r block-mathops.ex.1, exercise=T, exercise.eval=T}
```
## Operations on R objects
Objects such as vectors and data frames can be manipulated with mathematical operations.
### Example: water table
We have measured the water table (in cm) on a site once a month for a year:
```{r echo=TRUE}
water.table <- c(20.4, 17.3, 22.5, 11.6, 3.6, 2.2, 4.6, 5.5, 12.4, 25.4, 17.3, 19.2)
```
###
We'd like to convert the readings from cm to metres, so we need to divide each value by 100. We can simply divide the entire object by 100:
```{r echo=TRUE}
water.table / 100
```
###
We repeat our measurements the following year:
```{r echo=TRUE}
water.table.rpt <- c(26, 15, 24.1, 10.1, 5.5, 0.4, 2.3, 5.9, 13.4, 28.1, 19.3, 18.7)
```
###
Now we'd like to know the difference in water table between one year and the next. We can just subtract one object from the other:
```{r echo=TRUE}
water.table.rpt - water.table
```
###
Or we can calculate the difference and convert to metres in one line of code:
```{r echo=TRUE}
(water.table.rpt - water.table)/100
```
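These calculations work because **R** applies the operation to each value in turn. A small sketch with made-up readings (the object names `year.one` and `year.two` are invented for the example) makes the pairing easier to see:

```{r echo=TRUE}
# Made-up readings for three months in two years
year.one <- c(20, 17, 22)
year.two <- c(26, 15, 24)
# Each value is paired with the value in the same position
year.two - year.one
# A single number is applied to every value in the vector
year.one / 100
```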
### Example: pitfall trap
In a previous exercise we collected data from a pitfall trap and saved it into a matrix:
```{r echo=TRUE}
trap.count <- cbind(c(11,19,33,12), c(9,33,27,14))
```
###
Now we'd like to calculate the log of each element in the matrix:
```{r echo=TRUE}
log(trap.count)
```
### Exercise: birds
A school has been recording birds in its neighbourhood. They have asked you to analyse the results. The data have been imported and saved to the variable `birds`.
`birds <- read.csv("birds.csv")`
Have a look at the data they sent you by typing `View(birds)`.
```{r block10-setup}
# data.frame() (rather than cbind) keeps the counts numeric, so birds$farm works
birds <- data.frame(species = c("duck", "gull", "lapwing", "nuthatch", "owl", "robin", "sparrow", "tit"),
                    farm = c(10,5,2,0,0,0,7,2),
                    garden = c(7,0,0,0,0,3,0,6),
                    wood = c(0,0,0,4,2,0,0,8)
)
```
```{r block10, exercise=TRUE}
```
```{r block10-hint}
#Make sure the V of `View(birds)` is capitalised
```
```{r birds.obj.ex.2}
quiz(
question("How would you calculate the total number of birds seen? (one correct answer)",
answer("`sum(birds)`", correct = F),
answer("`birds$farm + birds$garden + birds$wood`", correct = F),
answer("`sum(birds$farm, birds$garden, birds$wood)`", correct = T)
)
)
```
Before you answer the question, use the empty code block below to try out the options. Press **Run Code** to see the results and **Hint** if you are stuck.
```{r block-birds.obj.ex.2-setup}
# data.frame() (rather than cbind) keeps the counts numeric, so birds$farm works
birds <- data.frame(species = c("duck", "gull", "lapwing", "nuthatch", "owl", "robin", "sparrow", "tit"),
                    farm = c(10,5,2,0,0,0,7,2),
                    garden = c(7,0,0,0,0,3,0,6),
                    wood = c(0,0,0,4,2,0,0,8)
)
```
```{r block-birds.obj.ex.2, exercise=TRUE}
```
```{r block-birds.obj.ex.2-hint}
"Have another look at the section Accessing data in data frames"
```
### Solution
* *Incorrect* `sum(birds)` The `sum` function is a good choice because it adds up numbers, but here you are also inadvertently telling R to add up the names of the birds (in the column `species`), which it cannot do.
* *Incorrect* `birds$farm + birds$garden + birds$wood` This adds the three columns value by value, giving a new vector with one total per species rather than a single overall total.
* *Correct* `sum(birds$farm, birds$garden, birds$wood)` `sum` is being told to add all the numbers in the three columns of bird counts.
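The difference between the last two answers can be sketched as follows. The data frame is rebuilt here so that the example is self-contained.

```{r echo=TRUE}
birds <- data.frame(species = c("duck", "gull", "lapwing", "nuthatch", "owl", "robin", "sparrow", "tit"),
                    farm = c(10, 5, 2, 0, 0, 0, 7, 2),
                    garden = c(7, 0, 0, 0, 0, 3, 0, 6),
                    wood = c(0, 0, 0, 4, 2, 0, 0, 8))
# Adding columns with + gives one total per species
birds$farm + birds$garden + birds$wood
# sum() collapses everything into a single overall total
sum(birds$farm, birds$garden, birds$wood)
```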
## Logical Operators
You can use **R** to carry out logical operations:
|Operator |Description |
|:--------|:-----------|
|! |NOT |
|& |AND |
|\|       |OR          |
### Logical vectors
You can create logical vectors:
```{r echo = T}
x <- c(TRUE, FALSE, FALSE, TRUE)
y <- c(FALSE, TRUE, FALSE, TRUE)