---
title: "2-Data Types, Basic Commands and Charting"
#author: DSC / Chloë Farr
#date: '2024-01-18'
output:
  github_document:
    toc: true
    toc_depth: 3
  pdf_document: default
  html_document:
    toc: true
    toc_depth: '3'
    df_print: paged
customjs: http://code.jquery.com/jquery-1.4.2.min.js
layout: default
nav_order: 3
parent: Workshop Activities
editor_options:
  markdown:
    wrap: 72
keep_md: true # Retain YAML front matter in output Markdown
---
```{r setup, include=FALSE}
# This code block is for the benefit of anyone working with the Rmd file and running it straight in R, rather than participating in the workshop
knitr::opts_chunk$set(echo = TRUE)
# Uncomment install.packages() if you haven't installed the 'readxl' package yet
#install.packages("readxl")
library(readxl)
startCodeDetailsBlock <- function(summaryText = "Check Your Code") {
  # Use cat to directly output the string without quotes and avoid unintended newlines
  cat(
    "{::options parse_block_html='true' /}<details><summary markdown='span'>",summaryText,"</summary>"
  )
}
endCodeDetailsBlock <- function() {
  cat("</details>{::options parse_block_html='false'/}")
}
```
<img src="images/rstudio-22.png" alt="rstudio logo" style="float:right;width:220px;"/>
<br>
## 1. Getting familiar with the RStudio Interface
![Code Editor, R console, Workplace and Plots](images/rstudio-01.png)
<br> The RStudio interface is divided into several key areas, each
serving a specific purpose. One of the great features of RStudio is
that you can customize the layout by reorganizing these windows to suit
your workflow.
> Hint: You can rearrange these windows and tabs to fit your personal preference by dragging them around the workspace.<!--chloe make a video on rearranging windows and resetting--> When you rearrange the panes in RStudio on your computer, the layout stays as you set it across future sessions.
*Main components of the RStudio interface:*
1. *Code Editor:* This is where you write and edit your R scripts.
- It features syntax highlighting, code completion, and other
helpful tools to make coding easier.
2. *Console:* The console is where R code is executed.
- You can type commands directly into the console, and it displays
outputs, messages, and errors.
- You might prefer to use the console for immediate execution, or
testing of small code snippets or commands.
3. *Files/Plots/Packages/Help Pane:*
- *Files:* Browse, open, and manage files in your working directory.
- *Plots:* View graphical outputs from your R code, such as plots and
graphs.
- *Packages:* Install, update, and load R packages.
- *Help:* Access R documentation and help files for:
- functions and packages
- example code
- information about datasets built in to R
- information about other general R-related topics.
- *Viewer:* Used to view local web content.
- We won't be covering this
- E.g., web graphics generated using packages like
- googleVis,
- htmlwidgets
- rCharts
- local web application created using Shiny, Rook, or OpenCPU.
- See more at the RStudio [Viewer
Page](https://rstudio.github.io/rstudio-extensions/rstudio_viewer.html){:target="\_blank"}
- *Presentation:* You can use RStudio to create presentations. The presentations can be viewed in this tab.
> We won't be covering this<br>
> See more at the RStudio [Viewer
Page](https://rstudio.github.io/rstudio-extensions/rstudio_viewer.html){:target="\_blank"}<br>
> To learn how, click [here](https://fish497.github.io/website/lectures/week_09/lec_27_presentations.html){:target="\_blank"}
https://rmarkdown.rstudio.com/lesson-11.html
4. *Environment/History Pane:*
- *Environment:* Shows your current workspace, including:
- defined variables
- data frames
- function objects.
- *History:* Records all the commands you've run in the current and
previous sessions.
- *Connections:* Used to manage and configure the integration of data
sources with your projects.
- E.g., Oracle, SQL, Salesforce
- *Tutorial:* Used to run tutorials that will help you learn and
master the R programming language.
- See more at the RStudio [Tutorial
Page](https://rstudio.github.io/rstudio-extensions/rstudio-tutorials.html){:target="\_blank"}
<br>
#### [Task 1.1:]{.underline} Open RStudio and get familiar with the interface by identifying the 4 windows and switching between the tabs.
> Note: This task is just for you to get comfortable. There is no solution for this task.
### Working in the Code Editor
Use the code editor if you want to develop more complex, reusable, and
maintainable code that can be saved in a script and executed later.
- We won't be working in the code editor at this level.
- It will be introduced at the beginning of the Intermediate level
workshop. <br>
### Working in the Console
Each new line of code (aka. command line) begins with the angle bracket
`>` also known as the ‘prompt’ symbol. <br>
You will type the commands into the Console after the most recent angle
bracket `>`.
*command line:* lines of code in your console.
*‘prompt’ symbol* `>` : Each new command starts with this.
*execute:* run your command by pressing the 'enter' or 'return' key on
your keyboard.
- When you are ready to execute ('run') the command, press the ‘enter’ or
‘return’ key on your keyboard.
- The output of the command will appear below your command.
*Things to be mindful of:*
- You cannot execute a command until the previous command has been
completely executed.
- If you don't see the prompt symbol, one of two things is happening:
- R is still processing your previous command, and you must wait
for it to finish.
- You might instead see the plus `+` symbol, which indicates that
you have entered an incomplete command.
- If you see the `+` symbol, you must enter the remainder of the
command before entering a new one.
- An error will occur if you write the `+` symbol into your command.
- Sometimes the output can be extensive and show more information than
you expected
- E.g., when you load in a package (we will discuss packages more
in Activity 3). <br>
For all tasks in this workshop, enter your commands in the Console
(bottom left) (top tabs say 'console', 'terminal', and 'background
jobs').<br>
<br>
#### [Task 1.2:]{.underline} Try getting help!
To do this, you'll run the `help()` function.
- Try getting information on vectors.
<br>
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check Your Code")
```
```{r}
#Get additional information about "vectors" (a data type),
help("vector") # then type 'enter' or 'return'
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br> `help("vector")` will provide you with information about vectors
in R. The help information will be displayed in the Help tab (bottom
right window) rather than in the Console.
<br>
::: {#gif1}
<img src="images/rstudio-02.gif"/> <br>
:::
</details>
<br>
> Note: You can get help on related content by selecting the dropdown list at the top of the Help tab. <!--screenshot-->
<br>
------------------------------------------------------------------------
As you work through these activities, remember to save your workspace.
Save your workspace by clicking on the top menu bar:
- File
- Save
You won't be asked to enter a name when saving.
------------------------------------------------------------------------
<br> <br>
## 2. Creating and Manipulating Vectors and Basic Variables
Remember: Write all of your code in the **Console** tab.
<!--add infographic about the trajectory of this section (broad to narrow)-->
> Note: For the purposes of this workshop, 'variable' and 'data object' are used interchangeably.
To create any data object:
- the command will begin with a name for the new variable
- followed by an assignment operator `<-`
- and then the data or expression that defines the content of the variable.
- This can include direct values, function calls, operations, or other variables. <br>
```
variableName <- "word"
```
<br>
Variable names are never wrapped in quotes. String/character values being assigned to a variable must be wrapped in quotes. Variable names must begin with a letter (or a period that is not followed by a number), and can otherwise contain letters, numbers, periods (`.`), and underscores (`_`). Variable names cannot contain spaces, but string values can.
<br>
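To try the naming rules out, here is a minimal sketch you can type into the Console. The variable names and values are made up for illustration and are not used elsewhere in this workshop.

```r
# Valid names: start with a letter; may contain letters, numbers, '.' and '_'
goat_count <- 5
goat.count2 <- 6

# String values need quotes and may contain spaces; variable names may not
farm_name <- "Sunny Acres Farm"

# make.names() shows how R would repair an invalid name (illustration only)
make.names("2goats")   # "X2goats" (cannot start with a number)
make.names("my goat")  # "my.goat" (cannot contain a space)
```

`make.names()` is only used here to show which names R considers valid; you won't need it for the tasks.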
**Definition - "Function":** A set of instructions defined to perform a specific task.
- E.g., `help()`: 'help' is a function to get information
**Definition - "Function Call":** The act of executing a function with
specific arguments, if required, to produce a result. <br>
- e.g., help("integer")
- This calls the 'help' function with the argument (aka parameter) "integer"
- It will return information about an 'integer' object type.
<br>
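As a small illustration of function calls with one and with several arguments, the sketch below uses two base R functions, `nchar()` and `round()`. They are not part of the tasks; they were chosen here only because their results are easy to check by hand.

```r
# Function call with one argument: count the characters in a string
nchar("integer")              # 7

# Function call with two arguments; the second one is passed by name
round(3.14159, digits = 2)    # 3.14

# The result of a function call can be stored in a variable
name_length <- nchar("Bart")
name_length                   # 4
```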
### 2.1 Variables and Basic Data Types
Let's start by looking at types of variables.
<br>
**Definition - "Basic Data Types":** Types of data representing the simplest forms of data.
**Basic Data Types**:
- *Numeric*: Decimal or floating-point numbers (e.g., 4.5, -3.2).
- *Integer*: Whole numbers (e.g., 1, -5, 20).
- In R, integers are often just treated as numeric unless
explicitly specified.
- *Character*: Text or strings (e.g., "hello", "1234").
- *Logical*: Boolean values, either TRUE or FALSE.
- *Factor*: Categorical data, or data as levels (e.g., "low", "medium", "high").
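If you are curious which type R has assigned to a value, the base R function `class()` will tell you. The following is a quick sketch; the `sizes` variable is a made-up example.

```r
class(4.5)      # "numeric"
class(20)       # "numeric" (whole numbers are numeric by default)
class(20L)      # "integer" (the L suffix explicitly asks for an integer)
class("1234")   # "character" (quotes make it text, even if it looks like a number)
class(TRUE)     # "logical"

# A factor stores categorical data as a fixed set of levels
sizes <- factor(c("low", "high", "medium", "low"))
class(sizes)    # "factor"
levels(sizes)   # "high" "low" "medium" (sorted alphabetically by default)
```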
<br> Here we'll look at basic operations with character variables.
<br>
Whenever you pass a string value as an argument, it must be wrapped in quotes; variable names are not. If a command doesn’t work, check whether quotes are missing or have been added where they don't belong.
#### [Task 2.1.1:]{.underline} Create a variable for a pig's first name. `The first pig's first name is 'Bart'.`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
#assign the first name 'Bart' to the first pig (pig1)
pig1.first_name <- "Bart"
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 2.1.2:]{.underline} Create a variable for Bart's last name. `Bart's last name is 'Smith'.`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
#assign the last name 'Smith' to the first pig (pig1)
pig1.last_name <- "Smith"
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 2.1.3:]{.underline}
Create a variable that equals Bart's first and last name, then display
the full name in the console <br>
The `paste()` function combines strings and, by default, inserts a space
between them. Here, `paste()` takes two arguments, like `paste(string1, string2)`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
#concatenate the first pig's (pig1) first ('Bart') and last name ('Smith')
pig1.full_name <- paste(pig1.first_name, pig1.last_name)
#after pig1.full_name has been created, print (display) Bart's full name...
pig1.full_name
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
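`paste()` is more flexible than this task requires. As a rough sketch (the variable names `first` and `last` are placeholders, not the workshop's variables), it accepts more than two strings, and its `sep` argument changes what is placed between them. The related base R function `paste0()` joins strings with nothing in between.

```r
first <- "Bart"
last <- "Smith"

paste(first, last)               # "Bart Smith" (default separator is a space)
paste(last, first, sep = ", ")   # "Smith, Bart"
paste("Pig:", first, last)       # "Pig: Bart Smith" (three strings)
paste0(first, last)              # "BartSmith" (no separator)
```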
<br>
Now we'll look at basic operations with **numeric and integer
variables**. First we'll create height information for Bart and find out
how much he's grown in height.
<br>
<img src="images/rstudio-basics-Bart.png" alt="Bart as a piglet and adult" style="width:420px;"/>
#### [Task 2.1.4:]{.underline} Create a variable for Bart's height as a piglet: 10
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
#Assign the value of Bart's piglet height
pig1.heightA <- 10
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 2.1.5:]{.underline} Create a variable for Bart's height now: 22.3
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
#Assign the value of Bart's current height
pig1.heightB <- 22.3
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 2.1.6:]{.underline}
Now create a variable expressing the amount he's grown.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
# Find the difference in height using the expression: 'heightB - heightA'
# using the subtraction operator.
pig1.heightGain <- pig1.heightB - pig1.heightA
#after pig1.heightGain has been created, print (display) the value of pig1.heightGain...
pig1.heightGain
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
*Hint:* "Expressing" indicates that the value will require an
expression, in this case, a mathematical operation.
<br>
`pig1.heightA` holds a whole number (10). R stores it as a 'numeric'
value unless you explicitly write it as an integer (`10L`).
`pig1.heightB` is a 'numeric' data type (decimal number).
R can perform operations, like subtraction, on values of different
numeric types (integer and numeric).
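The sketch below checks how R stores whole and decimal numbers, and what type comes out of an operation that mixes them. The names `height_a` and `height_b` are stand-ins for the workshop variables.

```r
height_a <- 10     # a whole number, stored as numeric unless written as 10L
height_b <- 22.3   # a decimal number, stored as numeric

is.integer(height_a)    # FALSE
is.integer(10L)         # TRUE

# Mixing an integer and a numeric value gives a numeric result
class(height_b - 10L)   # "numeric"
height_b - height_a     # 12.3
```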
------------------------------------------------------------------------
Reminder! Save your work
------------------------------------------------------------------------
> Additional: To remove data objects from your environment, execute the 'remove' function in the console: `rm()`.<br>
> e.g., `rm(full_name)`
<br> **Time for logical or boolean values!**
We can denote if Bart is small or large with a boolean value.
<br>
#### [Task 2.1.7:]{.underline} Create two variables (pig1.mini and pig1.large) which indicate that Bart is a large pig and not a mini pig.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
pig1.mini <- FALSE
pig1.large <- TRUE
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
*Hint:* Boolean values are either 'TRUE' or 'FALSE' (case sensitive).
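Logical values don't always have to be typed by hand: a comparison produces one as well. This is a minimal sketch, assuming a made-up cutoff of 20 for what counts as a large pig.

```r
pig_height <- 22.3
large_cutoff <- 20   # hypothetical cutoff, not from the workshop data

# A comparison returns TRUE or FALSE
is_large <- pig_height > large_cutoff
is_large     # TRUE

# The ! operator flips a logical value
!is_large    # FALSE
```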
<br>
### 2.2 Vectors
A vector is a 1-dimensional list of items that are of the same data type
(all text, all whole numbers, etc.)
To create a vector object, you will use the `c()` function. <br>
- The 'c' stands for 'combine'.
- It's used to create a vector by grouping individual values into a
list-like structure.
- Think of it as placing items into a container where each item
remains distinct and can be individually accessed.
- For example, `vector1 <- c(value1, value2)` creates a vector
named 'vector1' containing the elements 'value1' and
'value2' as separate items in a sequence, not as a single
merged item.
- A value in a vector can be accessed by using square brackets and
its index (the value's place in the vector), where **1** is the
first index.
- `vector1[1]` will output: 'value1'
<br>
As you might have seen when you looked up vectors with the `help()`
function, <!--check its output--> many functions and operations in R
are designed to work naturally with vectors.
<br>
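To see what "working naturally with vectors" can look like, here is a short sketch with made-up values. An operation applied to a vector is applied to every element, and values of different types placed in one vector are converted to a single common type.

```r
heights <- c(10, 12.5, 15)   # hypothetical values

heights * 2            # 20 25 30 (each element is doubled)
heights + c(1, 2, 3)   # 11.0 14.5 18.0 (added element by element)
heights[1]             # 10 (indexing starts at 1)

# A vector holds one type only, so mixed values are converted
mixed <- c(1, "two", 3)
class(mixed)           # "character"
```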
#### [Task 2.2.1:]{.underline} Make a vector for the following weight values of miniature goats. Name your variable 'goat.weights'
| `Goat weights: 13.3, 17.2, 14.8, 14.6, 12.4`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
# The period between 'goat' and 'weights' has no special purpose.
# It only shows the person reading the code that 'weights' is information that pertains to the goats
goat.weights <- c(13.3, 17.2, 14.8, 14.6, 12.4)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
The command you just ran has now appeared in your console (bottom left
window).
- The `goat.weights` vector is now listed in the Environment tab
(top right window) under [Values]{.underline}.<br>
<br>
#### [Task 2.2.2:]{.underline} Show the contents of the vector containing the goat weights.
If at any point you want to view the value of a variable, use the
`print()` function with the name of the variable, and press 'enter'
or 'return' to execute.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
print(goat.weights)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 2.2.3:]{.underline} Display the weight of the second goat in the vector.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
goat.weights[2]
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
*Hint:* `data_object_name[indexNumber]`
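Square brackets can do more than pick out a single item. The sketch below re-creates the goat weights so it runs on its own, then shows a few other standard ways of indexing a vector.

```r
goat.weights <- c(13.3, 17.2, 14.8, 14.6, 12.4)

goat.weights[c(1, 3)]             # first and third: 13.3 14.8
goat.weights[2:4]                 # second to fourth: 17.2 14.8 14.6
goat.weights[-1]                  # everything except the first: 17.2 14.8 14.6 12.4
goat.weights[goat.weights > 14]   # only weights above 14: 17.2 14.8 14.6
```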
<br>
You have just worked with numeric vectors. Now let's move to string
vectors.
<img src="images/rstudio-basics-goats.png" alt="a row of goats" style="width:420px;"/>
<br>
#### [Task 2.2.4:]{.underline} Make a vector for the following name values of miniature goats. Name your variable 'goat.name'
`Goat names: baby, pickles, cookie, sparkle, gabbie`
> Note: Text values must be wrapped in quotation marks. You can use double or single quotes, but must be consistent.<br>
> - Good: "text"<br>
> - Good: 'text'<br>
> - Bad: 'text"
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
goat.name <- c("baby", "pickles", "cookie", "sparkle", "gabbie")
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
To get the length of a vector, we can use the <code>length()</code>
function.
<br>
#### [Task 2.2.5:]{.underline} Print (display) the length of the vector of miniature goat names.
Note: In a script (code editor), you often need to use the print() function explicitly to see the output, especially when running multiple lines of code or within functions. However, in the console, R automatically displays the output of expressions upon execution of the command.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
length(goat.name)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### (Additional Information) Lists
A 'list' can hold items of different types (even vectors), while items
in a 'vector' must all be the same type. <br>
To make a list, we'll use the `list()` function.
> Hint: Remember that all items in a vector must be the same type, but
> can be different types if in a list.
> Additional: If you want to create 2D lists, also known as a table, you
will create a matrix using the `matrix()` function.
<br>
>- For more on matrices, [check me
out](https://www.w3schools.com/r/r_matrices.asp){:target="\_blank"}.
<br>- Instead of creating our own matrices, we will be importing data later
on.
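For reference, here is a minimal sketch of a list and a matrix. The goat record is made up for illustration (the workshop does not say which weight belongs to which goat).

```r
# A list can mix types: text, a number, and a logical value
goat <- list(name = "baby", weight = 13.3, is_mini = TRUE)
goat$name             # "baby"
goat[["weight"]]      # 13.3
class(goat$is_mini)   # "logical"

# A matrix is 2-dimensional and, like a vector, holds one type only
m <- matrix(1:6, nrow = 2)   # filled column by column
dim(m)                       # 2 3 (2 rows, 3 columns)
m[2, 3]                      # 6 (row 2, column 3)
```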
------------------------------------------------------------------------
Reminder! Save your work
------------------------------------------------------------------------
<br>
## 3. Descriptive Statistics
Statistics is the science of collecting, analyzing, interpreting, and
presenting data to uncover patterns and trends, and of making informed
decisions based on this data.
If you're unfamiliar with statistics, you can learn more about it from
the [W3Schools Statistics
Tutorial](https://www.w3schools.com/statistics/index.php){:target="\_blank"}
In this section, we'll be focusing on
- Basic statistical measures
- Presenting data in a histogram
- More on presenting data will be covered in [Activity 4-Data Visualization](https://uviclibraries.github.io/rstudio/ggplot2-data.html){:target="\_blank"}
- Importing data
### 3.1 Basic statistical measures
The function names for the following three statistical measures (mean,
median, standard deviation) are quite intuitive.
- It is just the name or abbreviation of the measure,
- where the argument is the object containing the set of values we are analyzing.
- Each function takes the vector as its argument.
These three functions are designed for sets of numeric values (whole or
decimal numbers). They do not give meaningful results for text
(character) values; for example, `mean()` returns `NA` with a warning.
Logical values (TRUE/FALSE) are treated as 1 and 0.
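A quick way to see how `mean()` reacts to different kinds of input is to try it. This is only a sketch with made-up values; `suppressWarnings()` is used to hide the warning R prints for text input, and `na.rm = TRUE` is a standard argument that tells the function to skip missing values.

```r
mean(c(22, 27, 19))                        # numeric input works as expected
mean(c(TRUE, FALSE, TRUE, TRUE))           # 0.75 (TRUE counts as 1, FALSE as 0)
suppressWarnings(mean(c("a", "b", "c")))   # NA (text cannot be averaged)

# A missing value (NA) in the data makes the result NA unless it is removed
mean(c(22, NA, 19))                        # NA
mean(c(22, NA, 19), na.rm = TRUE)          # 20.5
```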
#### [Task 3.1.1:]{.underline}
For this task, we will use a new vector object containing weights for a
set of pigs.
Create a vector object with the weights of a set of pigs. Name your
variable 'pigs.weight'
`Weights of pigs: 22, 27, 19, 25, 12, 22, 18`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 3.1.2:]{.underline}
**Mean:** the average value in a set.
The `mean()` function calculates the sum of the values in the set and
divides the sum by the number of items in the set.
Write and execute a command that outputs the mean value of the pigs'
weights.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
# output the average weight of all of the pigs
mean(pigs.weight)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
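To connect the definition to the function, the sketch below computes the mean "by hand" with `sum()` and `length()` and compares it with `mean()`. The vector is re-created so the example runs on its own; `manual_mean` is a made-up name.

```r
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)

sum(pigs.weight)      # 145 (total weight)
length(pigs.weight)   # 7 (number of pigs)

# Sum divided by the number of items
manual_mean <- sum(pigs.weight) / length(pigs.weight)
manual_mean

# Same value as the built-in function
all.equal(manual_mean, mean(pigs.weight))   # TRUE
```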
<br>
#### [Task 3.1.3:]{.underline}
Write and execute a command that outputs the median value of the pigs'
weights
**Median:** The middle value in a sorted set (e.g. lowest - highest).
`median()`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
median(pigs.weight)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br> The output tells you the weight of the pig that falls between the
lighter half and the heavier half of the pigs. <br>
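The median can also be found by hand: sort the values and take the middle one. The sketch below does this for the pigs' weights. With an even number of values there is no single middle value, so `median()` returns the average of the two middle values; the four-value vector used to show this is made up.

```r
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)

sorted <- sort(pigs.weight)
sorted                  # 12 18 19 22 22 25 27

# With 7 values, the middle one is the 4th
sorted[4]               # 22
median(pigs.weight)     # 22

# Even number of values: the two middle values (18 and 19) are averaged
median(c(12, 18, 19, 22))   # 18.5
```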
<br>
#### [Task 3.1.4:]{.underline}
**Standard deviation:** Describes how spread out the data is. `sd()`
Write and execute a command that outputs the standard deviation of the
pigs' weights
The output tells you how much the weights of the pigs vary from the
average weight.
- A small standard deviation means that most pigs'
weights are close to the average, indicating uniformity in size.
- A large standard deviation suggests a wide range of weights. <br>
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
sd(pigs.weight)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
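For those curious about what `sd()` does behind the scenes, here is a sketch of the calculation. `sd()` computes the sample standard deviation: the squared distances from the mean are added up, divided by the number of items minus one, and the square root is taken. The names `n`, `deviations`, and `manual_sd` are made up for this example.

```r
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)

n <- length(pigs.weight)
deviations <- pigs.weight - mean(pigs.weight)   # distance of each weight from the mean

manual_sd <- sqrt(sum(deviations^2) / (n - 1))
manual_sd

# Same value as the built-in function
all.equal(manual_sd, sd(pigs.weight))   # TRUE
```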
<br>
#### [Task 3.1.5:]{.underline}
Display a summary of values pertaining to the pigs' weights
We can execute a **'summary'** to generate several descriptive
statistics at the same time. `summary()`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
summary(pigs.weight)
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
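The individual values in a summary can also be retrieved on their own. The following sketch stores the summary in a variable (named `s` here for illustration) and picks values out by their labels; `range()` and `quantile()` are base R functions that return some of the same numbers directly.

```r
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)

s <- summary(pigs.weight)
names(s)        # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s[["Median"]]   # 22
s[["Max."]]     # 27

range(pigs.weight)            # 12 27 (smallest and largest value)
quantile(pigs.weight, 0.25)   # 18.5 (the first quartile)
```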
<br>
### 3.2. Histogram Plot for Pig Weights
**Histogram:** A graph used for understanding and analysing the
distribution of values in a vector.
A histogram illustrates:
- Where data points tend to cluster
- The variability of data
- The shape of variability
<br>
#### [Task 3.2.1:]{.underline}
Create a histogram for the pigs' weights using the histogram function
`hist()`
- Parameter: vector of pig weights
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code and see the histogram")
```
```{r}
hist(pigs.weight)
# The histogram will appear in the Plots tab.
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
#### [Task 3.2.2:]{.underline} Create a histogram for the pigs' weights, with axis labels.
We can also pass in additional parameters to control the way our plot
looks.
Some of the frequently used parameters are:
- `main` : The title of the plot
- e.g., `main = "This is the Plot Title"` <br>
- `xlab` : The x-axis label
- e.g., `xlab = "The X Label"` <br>
- `ylab` : The y-axis label
- e.g., `ylab = "The Y Label"`
<br>
In your histogram for the pigs' weights, use:
- X-label: "Weight"
- Y-label: "Frequency"
- This is a default value.
- You don't have to specify it unless you would like a different
label.
- Graph title: "Histogram of Pigs' Weights"
*Hint:* Remember, a parameter is information that goes in the
parentheses of the function.
Multiple parameters: `function_name(parameter1, parameter2)`
- E.g., `hist(dataset, xlab="x-label", ylab = "y-label", main = "main title")`
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r}
# The first parameter is the name of the data (vector) object
# 'main' is the graph title
# 'xlab' is the label of the x-axis
# Label parameters can be in any order, as long as they follow the data object
# The y-label on a histogram defaults to "Frequency". You can add 'ylab = "..."' if you'd like a different label.
hist(pigs.weight, main = "Histogram of Pigs' Weights", xlab = "Weight")
# The histogram will appear in the Plots tab.
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
The histogram will appear in the Plots tab (bottom right quadrant if you
haven't modified your RStudio layout).
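`hist()` accepts other parameters as well. The sketch below is optional and goes slightly beyond the task: `breaks` suggests how many intervals (bars) to use, `col` sets the fill colour, and `plot = FALSE` returns the numbers behind the histogram instead of drawing it. The colour and the number of breaks are arbitrary choices for illustration.

```r
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)

# 'breaks' is a suggestion; R may adjust it to get tidy interval boundaries
hist(pigs.weight,
     breaks = 3,
     main = "Histogram of Pigs' Weights",
     xlab = "Weight",
     col = "lightblue")

# Look at the intervals and counts without drawing a plot
bins <- hist(pigs.weight, plot = FALSE)
bins$breaks        # the interval boundaries R chose
bins$counts        # how many pigs fall into each interval
sum(bins$counts)   # 7 (every pig is counted once)
```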
<br>
## 4. Importing Data
So far, we've created our own objects by manually entering all of the
data in the console. In this section, we'll learn how to create objects
by importing (aka 'reading') data (compiled outside of R) into R and
visualise it with a histogram.
### 4.1 Importing Excel data into R
R can handle multiple file types:
- .csv (comma separated values)
- excel (.xls, .xlsx)
- .txt (and .tsv - tab separated values)
- .json (used for nested data structures)
- These would likely be arrays of more than 2 dimensions.
- SPSS (another specialized statistics software)
- Data scraped from the web or via an API.
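As a small taste of reading a file with code, the sketch below writes a tiny made-up table to a temporary .csv file and reads it back with the base R function `read.csv()`. The data, the column names, and the variable names are invented for this example and are not the workshop dataset.

```r
# A tiny made-up table
example_data <- data.frame(id = 1:3, income = c(100, 250, 175))

# Save it as a .csv file in a temporary location
csv_path <- tempfile(fileext = ".csv")
write.csv(example_data, csv_path, row.names = FALSE)

# Read the .csv file back into R
imported <- read.csv(csv_path)
imported
nrow(imported)   # 3
```

Tab-separated files can be read in a similar way with the base R function `read.delim()`.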
#### [Task 4.1.1:]{.underline} Download and save test data
Download and save [this Excel spreadsheet of Income
data](docs/income.xlsx){:target="\_blank"}
- *Note:* Please remember where the income.xlsx file is saved (usually
in a “downloads” or “desktop” folder).
#### [Task 4.1.2:]{.underline} Import the dataset of Income data
- From the top menu bar, select...
- File
- Import dataset
- From Excel
- In the 'Import Excel Data' window select your file by:
- Entering the file path to the income.xlsx file you just
downloaded.
- Selecting "Browse" on the right side of the path bar and
locating it in the browser.
- Under 'Import Options,' make sure 'Name' is the same text as you
wish for the variable to be named. Ours will be 'income'.
- Click "Import"
- If asked to install the `readxl` package, click **Yes**.
> Note: Don't worry about making a mistake importing this data. You can always remove it using the `rm()` function.
![Browse and import menu and buttons](images/rstudio-15.png)
![Import excel data window](images/rstudio-17.png)
<!-- remove summary command from screenshot-->
<br>
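The same import can be done with code instead of the menus, using the `read_excel()` function from the `readxl` package. The sketch below assumes the file was saved as `docs/income.xlsx` inside your working directory; replace the path with wherever you saved the file. The check around the import is only there so the example fails gently if the package or file is missing.

```r
# Assumption: income.xlsx sits in a 'docs' folder inside your working directory
income_path <- "docs/income.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(income_path)) {
  # Read the spreadsheet into a data frame named 'income'
  income <- readxl::read_excel(income_path)
  print(income)
} else {
  message("Install the 'readxl' package and check the file path before importing.")
}
```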
What you just imported is now stored as a 'data frame' object whose name
is `income`.
**Definition - Data frame:** essentially a table. It is a 2-dimensional
object that can hold different data types.
> Additional: Data frames contain information about a set of objects (e.g., cats).<br>
> - The data frame will contain one or more columns and one or more rows.<br>
> - One column contains related values (column 1 = age, column 2 = eye color).<br>
> - Because the column contains the same type of information, it is equivalent to a vector. <br>
> - i.e., the 'eye color' column will contain characters, not numbers.<br>
> - One row denotes one object from the set. In a data frame of information about a set of cats, each row is information about one specific cat.
A row can contain many different bits of information, like age (numerical), eye color (character), breed (character), whether or not it's spayed/neutered (boolean). Because rows may contain values of different types, one row would most likely not be a vector. It would likely be a list, which can contain values of different types.
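To make the cat example concrete, here is a minimal sketch of a data frame built by hand with the base R function `data.frame()`. The cats, the column names, and the values are made up for illustration.

```r
cats <- data.frame(
  age = c(2, 7, 4),
  eye_color = c("green", "blue", "yellow"),
  spayed_neutered = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep text columns as character
)

# One column holds one type of value, like a vector
cats$age                # 2 7 4
class(cats$eye_color)   # "character"

# One row describes one cat and mixes types
cats[1, ]

nrow(cats)   # 3 (cats)
ncol(cats)   # 3 (pieces of information per cat)
```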
```{r load-data, echo=FALSE, results='hide'}
income <- read_excel("docs/income.xlsx")
```
To see the data in our data frame, simply enter the name of the data
frame in the console and press 'enter' or 'return'.
```{r echo = F, results='asis'}
startCodeDetailsBlock(summaryText = "Check your code")
```
```{r eval=FALSE}
income
```
```{r echo = F, results='asis'}
endCodeDetailsBlock()
```
<br>
The following will be the output:
```{r echo=FALSE}
income
```
> Note: We will explore other ways to view and preview content of our data frames in Activity 3.
> Note: `<chr>` stands for the "character" data type and `<dbl>` stands for the "double-precision floating-point number" data type. <br>
We can see now that our data frame `income` contains 10 objects (rows),
and 4 variables (columns).
- It can be inferred that this data relates to 10 people
- The values recorded for each person are:
- id (in lieu of a name) (dbl)
- gender (chr)
- income (dbl)
- experience (dbl) <br>
#### [Task 4.1.3:]{.underline} Display a summary of statistics for the `income` data.
```{r echo = F, results='asis'}