# Literate programming
You now know about version control, how to collaborate using github.com, and
functional programming. Just by learning about these topics, you have already
made massive steps towards making your projects reproducible, especially by
using Git and GitHub. Even if you're using private repos and work in the private
sector, version control makes reusing your code in future projects much easier.
Auditing is greatly simplified as well.
But this book is still far from over. Let's think about our project up until
now. We have downloaded some data and written code to analyse it. Fair enough.
But usually, we don't really stop there. We now need to write a report, or maybe
a Powerpoint presentation. If you're a researcher, you still need to write a
paper, because just getting the results is not enough; and if you work in the
private sector, you also need to present the results of your analysis to
management.
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/report_draft_loop.png"
alt="The cursed report drafting loop."></img>
<figcaption>The cursed report drafting loop.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "The cursed report drafting loop."
knitr::include_graphics("images/report_draft_loop.png")
```
:::
The problem is that writing code, getting some results, and putting these results
into a document (it doesn’t matter what kind) is often very tedious. The picture
above illustrates this cursed report drafting loop. Get some results, copy and
paste images into Word or Powerpoint, get a change request, or notice a mistake,
and start from scratch again. If you're using LaTeX it'll be easier for
pictures, but you'll still need to update tables by hand each time you need to
touch your analysis code.
Worse, what if you start with a Word or LaTeX document, but then get asked to
make a Powerpoint presentation as well? Then you need to copy and paste
everything again, but this time into Powerpoint... and if you get a change
request after you're done and need to start over, you might seriously consider
raising goats instead of dealing with this again.
But what if we could make the loop look like this instead:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/md_draft_loop.png"
alt="The holy report drafting loop."></img>
<figcaption>The holy report drafting loop.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "The holy report drafting loop."
knitr::include_graphics("images/md_draft_loop.png")
```
:::
In other words, what if everything, from cleaning and analysing to drafting,
were done in one single step? Well, this is what literate programming enables
you to do. And even if you get asked to make a Powerpoint presentation, you can
start from the same source code as the original report, remove everything that
you don't need, and compile it to a Powerpoint (or Beamer) presentation.
## A quick history of literate programming
In literate programming, authors mix code and prose, which makes the output of
their programs not just a series of tables, or graphs or predictions, but a
complete report that contains the results of the analysis directly embedded into
it. Scripts written using literate programming are also very easy to compile, or
render, into a variety of document formats like `html`, `docx`, `pdf` or even
`pptx`. R supports several frameworks for literate programming: Sweave, knitr
and Quarto.
Sweave was the first tool available to R (and S) users, and allowed the mixing
of R and LaTeX code to create a document. Friedrich Leisch developed Sweave and
described it in his 2002 paper [@leisch2002]. As Leisch argues, *the
traditional way of writing a report as part of a statistical data analysis
project uses two separate steps*: running the analysis using some software, and
then copying and pasting the results into a word processing tool (as illustrated
above). To really drive that point home: the problem with this approach is that
much time is wasted copying and pasting things, so experimenting with different
layouts or data analysis techniques is very time-consuming. Copy and paste
mistakes will also happen (it’s not a question of if, but when), and updating
reports (for example, when new data comes in) means that someone will have to
copy and paste the updated results into a new document all over again.
Sweave makes it possible to embed the analysis in the final document itself, by
providing a way to mix LaTeX and R code which gets executed whenever the final
output document gets compiled. This gives practitioners considerable time
savings because it eliminates the copying and pasting of results from R outputs
into a document.
The snippet below shows the example from Leisch’s paper:
```
\documentclass[a4paper]{article}
\begin{document}
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a LaTeX document:
<<>>=
data (airquality)
kruskal.test(Ozone ~ Month, data = airquality)
@
which shows that the location parameter of the Ozone
distribution varies significantly from month to month.
Finally we include a boxplot of the data:
\begin{center}
<<fig=TRUE,echo=FALSE>>=
boxplot(Ozone ~ Month, data = airquality)
@
\end{center}
\end{document}
```
Even if you’ve never seen a LaTeX source file, you should be able to figure out
what’s going on. The first line states what type of document we’re writing. Then
comes `\begin{document}` which tells the compiler where the document starts.
Then comes the content. You can see that it’s a mixture of plain English with R
code defined inside chunks starting with `<<>>=` and ending with `@`. Finally,
the document ends with `\end{document}`. Getting a human-readable PDF from this
source is a two-step process: first, this source gets converted into a `.tex`
file and then this `.tex` file into a PDF. Sweave has been included with every R
installation since version 1.5.0, and still works to this day. For example, we
can test that our Sweave installation works just fine by compiling the example
above. This is what the final output looks like:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/hello_sweave.PNG"
alt="More than 20 years later, the output is still the same."></img>
<figcaption>More than 20 years later, the output is still the same.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "More than 20 years later, the output is still the same."
knitr::include_graphics("images/hello_sweave.PNG")
```
:::
The fact that it is still possible to compile this example more than 20 years
later is an incredible testament to how mature and stable this software is (R,
Sweave, and LaTeX alike). But as impressive as this is, LaTeX has a steep
learning curve, and Leisch even advocated the use of the Emacs text editor to
edit Sweave files, which also has a very steep learning curve (but this is
entirely optional; for example, I’ve edited and compiled the example in the
RStudio IDE).
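The two-step process described above (weave the `.Rnw` file into a `.tex` file, then compile the `.tex` file into a PDF) can also be driven entirely from the R console, without any IDE. The sketch below writes Leisch's example to a file named `leisch_example.Rnw` (an arbitrary name chosen for this illustration) in R's temporary directory, uses `utils::Sweave()` for the first step and `tools::texi2pdf()` for the second. The second step is only attempted if a LaTeX engine is found, and it may still fail if LaTeX packages are missing.

```r
# Work in a temporary directory: Sweave writes the .tex file and the
# figure files into the working directory.
old_wd <- setwd(tempdir())

# Leisch's example as a character vector (backslashes are doubled because
# these are R strings). The file name is an arbitrary choice.
rnw_source <- c(
  "\\documentclass[a4paper]{article}",
  "\\begin{document}",
  "In this example we embed parts of the examples from the",
  "\\texttt{kruskal.test} help page into a LaTeX document:",
  "<<>>=",
  "data(airquality)",
  "kruskal.test(Ozone ~ Month, data = airquality)",
  "@",
  "which shows that the location parameter of the Ozone",
  "distribution varies significantly from month to month.",
  "Finally we include a boxplot of the data:",
  "\\begin{center}",
  "<<fig=TRUE,echo=FALSE>>=",
  "boxplot(Ozone ~ Month, data = airquality)",
  "@",
  "\\end{center}",
  "\\end{document}"
)
writeLines(rnw_source, "leisch_example.Rnw")

# Step 1: run the R chunks and write leisch_example.tex
utils::Sweave("leisch_example.Rnw")

# Step 2: compile the .tex file into a PDF, only if a LaTeX engine is on
# the PATH; missing LaTeX packages can still make this step fail.
if (nzchar(Sys.which("pdflatex"))) {
  tryCatch(
    tools::texi2pdf("leisch_example.tex", clean = TRUE),
    error = function(e) message("PDF compilation failed: ", conditionMessage(e))
  )
}

setwd(old_wd)
```

The intermediate `leisch_example.tex` is plain LaTeX in which the chunks have been replaced by their code and output, which is exactly what gets handed over to the LaTeX compiler.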
The next generation of literate programming tools was provided by a package
called `{knitr}` in 2012. From the perspective of the user, the biggest change
from Sweave is that `{knitr}` is able to use many different formats as source
files. The one that very likely became the most widely used is a flavour of the
Markdown markup language, R Markdown (Rmd). But this is not the only
difference with Sweave: `{knitr}` can also run code chunks for other languages,
such as Python, Perl, Awk, Haskell, bash and more [@xie2014]. Since version
1.18, `{knitr}` uses the `{reticulate}` package to provide a Python engine for
the Rmd format.
::: {.content-visible when-format="pdf"}
\newpage
:::
To illustrate the Rmd format, let’s rewrite the example from
Leisch’s Sweave paper into it:
````{verbatim}
---
output: pdf_document
---
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a LaTeX document:
```{r}
data (airquality)
kruskal.test(Ozone ~ Month, data = airquality)
```
which shows that the location parameter of the Ozone
distribution varies significantly from month to month.
Finally we include a boxplot of the data:
```{r, echo = FALSE}
boxplot(Ozone ~ Month, data = airquality)
```
````
This is what the output looks like:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/hello_rmd.PNG"
alt="It’s very close to the Sweave output."></img>
<figcaption>It’s very close to the Sweave output.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "It’s very close to the Sweave output."
knitr::include_graphics("images/hello_rmd.PNG")
```
:::
Just like a Sweave document, an Rmd source file has a header in which
authors can define a number of general options. Here I’ve only specified that I want a
PDF document as the output. I then copied and pasted the contents of the
Sweave source, but changed the chunk delimiters from `<<>>=` and `@` to ```` ```{r} ```` to start an R chunk and ```` ``` ```` to end it. Remember: we need
to specify the engine in the chunk because `{knitr}` supports many engines. For
example, it is possible to run a bash command by adding this chunk to the
source:
````{verbatim}
---
output: pdf_document
---
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a LaTeX document:
```{r}
data (airquality)
kruskal.test(Ozone ~ Month, data = airquality)
```
which shows that the location parameter of the Ozone
distribution varies significantly from month to month.
Finally we include a boxplot of the data:
```{r, echo = FALSE}
boxplot(Ozone ~ Month, data = airquality)
```
```{bash}
pwd
```
````
(bash’s `pwd` command shows the current working directory.) You may have noticed
that I’ve also kept a LaTeX command in the source Rmd, `\texttt{}`.
This is because, for PDF output, Rmd files first get converted into LaTeX files
and then into a PDF. If you’re using RStudio, this document can be compiled by
clicking a button or using a keyboard shortcut, but you can also use the
`rmarkdown::render()` function. This function does two things transparently: it
first converts the Rmd file into a LaTeX source file, and then converts that
into a PDF. It is of course possible to convert the document to a Word document
as well, but in this case, LaTeX commands will be ignored. Html is another
widely used output format.
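As a sketch of the "one source, many outputs" idea, the snippet below writes a small Rmd file (the name `multi_format.Rmd` and its content are made up for this illustration) and renders it twice with `rmarkdown::render()`: once to Html and once to Word, overriding the format through the `output_format` argument. It assumes `{rmarkdown}` is installed and that Pandoc is available (Pandoc ships with RStudio); PDF output would additionally require LaTeX.

```r
rmd_file <- file.path(tempdir(), "multi_format.Rmd")
writeLines(c(
  "---",
  "title: \"One source, several outputs\"",
  "---",
  "",
  "The airquality dataset has `r nrow(airquality)` rows.",
  "",
  "```{r, echo = FALSE}",
  "kruskal.test(Ozone ~ Month, data = airquality)",
  "```"
), rmd_file)

# Rendering requires Pandoc, which {rmarkdown} can look up for us
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()

if (can_render) {
  # render() returns the path of the file it created
  html_out <- rmarkdown::render(rmd_file, output_format = "html_document",
                                quiet = TRUE)
  # No Word installation is needed to produce a .docx file
  docx_out <- rmarkdown::render(rmd_file, output_format = "word_document",
                                quiet = TRUE)
  # With a LaTeX distribution, output_format = "pdf_document" works as well
}
```

Because the header does not fix the output format, the same source can be sent to any of these formats without being edited.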
If you're a researcher and prefer working with LaTeX directly instead of
switching to Markdown, you can either use Sweave, or use `{knitr}` with the
`Rnw` format instead of R Markdown. The `Rnw` format is basically the same as
Sweave's, but uses `{knitr}` for compilation. Take a look at [this
example](https://github.com/yihui/knitr/blob/master/inst/examples/knitr-minimal.Rnw)^[https://is.gd/Z7VS09]
from the `{knitr}` GitHub repository.
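To get a feel for the `Rnw` workflow with `{knitr}`, here is a small sketch. The document and its file name, `knitr_minimal.Rnw`, are made up for this illustration and are not the linked example. It keeps Sweave's `<<>>=` and `@` chunk delimiters and the `\Sexpr{}` inline syntax, and calls `knitr::knit()`, which only produces the `.tex` file, so no LaTeX installation is needed to run it.

```r
old_wd <- setwd(tempdir())

rnw_source <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  # \Sexpr{} evaluates inline R code, as in Sweave
  "The airquality dataset has \\Sexpr{nrow(airquality)} rows.",
  # Chunk options use knitr's names (echo, fig.height, ...)
  "<<kruskal, echo=TRUE>>=",
  "kruskal.test(Ozone ~ Month, data = airquality)",
  "@",
  "\\end{document}"
)
writeLines(rnw_source, "knitr_minimal.Rnw")

# knit() runs the chunks and writes knitr_minimal.tex
knitr::knit("knitr_minimal.Rnw", quiet = TRUE)

# knitr::knit2pdf("knitr_minimal.Rnw") would additionally compile the
# result to PDF, provided a LaTeX distribution is installed.

setwd(old_wd)
```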
You should know that `{knitr}` makes it possible to author many, many different
types of documents. It is possible to write books, blogs, package documentation
(and even entire packages, as we shall see later in this book), Powerpoint
slides... It is extremely powerful because we can use the same general R
Markdown knowledge to build many different outputs.
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/rmd_to_many.PNG"
alt="One format to rule them all."></img>
<figcaption>One format to rule them all.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F, out.height="300px"}
#| fig-cap: "One format to rule them all."
knitr::include_graphics("images/rmd_to_many.PNG")
```
:::
Finally, the latest in literate programming for R is a new tool developed by
Posit, called Quarto. If you’re an R user and already know `{knitr}` and the Rmd
format, you should be able to use Quarto immediately. So what’s the difference?
In practice, and for R users, not much, but there are some things that Quarto is
able to do out of the box for which you'd need extensions with `{knitr}`. Quarto
has some nice defaults; in fact, this book is written in Quarto's Markdown
flavour and compiled with Quarto instead of `{knitr}` because the default Quarto
output looks nicer than the default `{knitr}` output. However, there may also be
things that Quarto can’t do at all (at least for now) when compared to
`{knitr}`. So why bother switching? Well, Quarto provides sane defaults and some
nice features out of the box, and the cost of switching from the Rmd format to
Quarto’s Qmd format is basically zero. Also, and this is probably the biggest
reason to use Quarto, Quarto is not tied to R. Quarto is actually a standalone
tool that needs to be installed alongside your R installation, and works
completely independently. In fact, you can use Quarto without having R installed
at all, as Quarto, just like `{knitr}`, supports many engines. This means that if
you’re primarily using Python, you can use Quarto to author documents that mix
Python chunks and prose. Quarto also supports the Julia programming language and
Observable JS, making it possible to include interactive visualisations in an
Html document. Let’s take a look at how the example from Leisch’s paper looks as
a Qmd (Quarto’s flavour of Markdown) file:
````{verbatim}
---
format: pdf
---
In this example we embed parts of the examples from the
\texttt{kruskal.test} help page into a LaTeX document:
```{r}
data (airquality)
kruskal.test(Ozone ~ Month, data = airquality)
```
which shows that the location parameter of the Ozone
distribution varies significantly from month to month.
Finally we include a boxplot of the data:
```{r, echo = FALSE}
boxplot(Ozone ~ Month, data = airquality)
```
````
(I’ve omitted the bash chunk from before, not because Quarto does not support
it, but to keep close to the original example from the paper.)
As you can see, it’s exactly the same as the Rmd file from before. The only
difference is in the header. In the Rmd file I specified the output format as:
````
---
output: pdf_document
---
````
whereas in the Qmd file, Quarto expects the `format` key instead of `output`:
````
---
format: pdf
---
````
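Because Quarto is a standalone command-line tool, rendering a Qmd file does not go through an R function at all: in a terminal you would run `quarto render` followed by the file name. The sketch below assumes the Quarto CLI is on your `PATH` and uses a made-up file name, `leisch_example.qmd`; it writes a small Qmd file from R and calls that same command through `system2()`. It targets Html so that no LaTeX is needed.

```r
qmd_file <- file.path(tempdir(), "leisch_example.qmd")
writeLines(c(
  "---",
  "format: html",   # Quarto uses `format`, not `output`
  "---",
  "",
  "```{r}",
  "kruskal.test(Ozone ~ Month, data = airquality)",
  "```"
), qmd_file)

# Only attempt the render if the quarto executable can be found
if (nzchar(Sys.which("quarto"))) {
  # Same as running `quarto render leisch_example.qmd` in a terminal;
  # system2() returns the exit status of the command
  status <- system2("quarto", c("render", shQuote(qmd_file)))
}
```

Since the document contains an R chunk, Quarto will still hand it over to `{knitr}` behind the scenes; a document with only Python chunks would not need R at all.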
While Quarto is the latest option in literate programming, it is also quite
recent, and as such, I feel it might be better to stick with `{knitr}` and the
Rmd format for now, so that’s what we’re going to use going forward. `{knitr}`
and the Rmd format are here to stay, so there's little risk in continuing to use
them, and anyway, as already stated, if switching to Quarto becomes a
necessity, the cost of switching would be very, very low. In what follows, I
won’t focus on anything really `{knitr}`- or Rmd-specific, so should you
want to use Quarto instead, you should be able to follow along without any
problems at all, since the Rmd and Qmd formats have so much overlap. Finally,
Quarto needs to be installed separately, whereas `{knitr}` and R Markdown are
installed like any other R package (rendering does rely on Pandoc, but Pandoc
ships with RStudio).
In the next two sections, I will show you how to set up and use `{knitr}`, and
give you a quick overview of the R Markdown syntax. However, we will very
quickly focus on the templating capabilities of `{knitr}`: expanding text, using
child documents, and parameterised reports. These are advanced topics and not
easy to tackle if you’re not comfortable with R already. Just as functions and
higher-order functions like `lapply()` help you avoid repeating yourself, so
does templating, but for literate programming. The goal is to write functions
that return literal R Markdown code, so that you can loop over these functions
to build entire sections of your documents. The learning curve for these
features is quite steep, but by now, you should have noticed that this book
expects a lot from you. Keep going, and you shall be handsomely rewarded.
## `{knitr}` basics
This section will be a very short introduction to `{knitr}`. I’m going to teach
you just enough to get started writing Rmd files. Most, if not all, of what I’ll
be explaining here is also applicable to the Qmd format. There are many resources
out there that you can use if you want to dig deeper, for instance the [R
Markdown
website](https://rmarkdown.rstudio.com/lesson-1.html)^[https://rmarkdown.rstudio.com/lesson-1.html]
from Posit, or the [R Markdown: The Definitive
Guide](https://bookdown.org/yihui/rmarkdown/)^[https://bookdown.org/yihui/rmarkdown/]
and [R Markdown
Cookbook](https://bookdown.org/yihui/rmarkdown-cookbook/)^[https://bookdown.org/yihui/rmarkdown-cookbook/]
eBooks. I will also not assume that you are using the RStudio IDE; instead, I
will give you the lower-level commands to render documents. If you use RStudio
and want to know how to use it effectively to author Rmd documents, you should
take a look at
[this](https://rmarkdown.rstudio.com/authoring_quick_tour.html)^[https://rmarkdown.rstudio.com/authoring_quick_tour.html]
page. In fact, this section covers basically the same topics, but without
RStudio.
### Set up
The first step is to install the `{knitr}` and the `{rmarkdown}` packages.
That's easy, just type:
```{r, eval = F}
install.packages("rmarkdown")
```
in an R console. Since `{rmarkdown}` depends on `{knitr}`, `{knitr}` gets
installed automatically. If you want to compile PDF documents, you should also
have a working LaTeX distribution. You can skip this next part if you’re only
interested in generating Html and Word files. For what follows in the book, we
will only be rendering Html documents, so there is no need to install LaTeX (by
the way, you don't even need a working Word installation to compile documents to
the `docx` format). However, if you already have a working LaTeX installation,
you shouldn’t have to do anything else to generate PDF documents. If you don’t
have a working LaTeX distribution, then know that Yihui Xie, the creator of
`{knitr}`, created an R package called `{tinytex}` which you can use to install
a working LaTeX distribution very easily. In fact, this is the way I recommend
installing LaTeX even if you’re not an R user (it is possible to use the TinyTeX
distribution without R; it’s just that the `{tinytex}` R package provides many
functions that make installing and maintaining it very easy). Simply run these
commands in an R console to get started:
```{r, eval = F}
install.packages("tinytex")
tinytex::install_tinytex()
```
and that’s it! If you need to install specific LaTeX packages, then refer to the
**Maintenance** section on [tinytex’s](https://yihui.org/tinytex/#maintenance)^[https://yihui.org/tinytex/#maintenance]
website. For example, to compile the example from Leisch’s article on Sweave
discussed previously, the `grfext` LaTeX package needs to be installed (as
explained by the error output in the console when I tried compiling). To install
this package, you can use the `tlmgr_install()` function from `{tinytex}`:
```{r, eval = F}
tinytex::tlmgr_install("grfext")
```
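Before moving on, it can be handy to check what your machine can actually render. The helper below, `check_rendering_setup()`, is a hypothetical convenience function written for this illustration (it is not part of any package). It reports whether `{knitr}` and `{rmarkdown}` are installed, whether Pandoc is available (via `rmarkdown::pandoc_available()`), and whether a LaTeX engine such as `pdflatex` or `xelatex` is on the `PATH`.

```r
# Hypothetical helper (not from any package): which rendering tools are available?
check_rendering_setup <- function() {
  has_rmarkdown <- requireNamespace("rmarkdown", quietly = TRUE)
  c(
    knitr     = requireNamespace("knitr", quietly = TRUE),
    rmarkdown = has_rmarkdown,
    # Pandoc converts the knitted Markdown into Html, Word, PDF, ...
    pandoc    = has_rmarkdown && rmarkdown::pandoc_available(),
    # A LaTeX engine is only needed for PDF output
    latex     = nzchar(Sys.which("pdflatex")) || nzchar(Sys.which("xelatex"))
  )
}

check_rendering_setup()
```

For Html and Word output, the first three entries are what matter; `latex` only needs to be `TRUE` if you want PDF documents.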
After you’ve installed `{knitr}`, `{rmarkdown}` and, optionally, `{tinytex}`,
simply try to compile the following document:
````{verbatim}
---
output: html_document
---
# Document title
## Section title
### Subsection title
This is **bold** text. This is *text in italics*.
My favourite programming language for statistics is ~~SAS~~ R.
````
Save this document into a file called `rmd_test.rmd` using your favourite
text editor. Then render it into an Html file by running the following command
in the R console:
```{r, eval = F}
rmarkdown::render("path/to/rmd_test.rmd")
```
This should create a file called `rmd_test.html`; open it with your web browser
and you should see the following:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/rmd_test.PNG"
alt="This is what formatting looks like once the document is compiled."></img>
<figcaption>This is what formatting looks like once the document is compiled.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F}
#| fig-cap: "This is what formatting looks like once the document is compiled."
knitr::include_graphics("images/rmd_test.PNG")
```
:::
Congratulations, you just *knitted* your first Rmd document!
### Markdown ultrabasics
R Markdown is a flavour of Markdown, which means that you should know some
Markdown to really take full advantage of R Markdown. The example document from
before should have already shown you some basics: titles, sections and
subsections all start with a `#` and the depth level is determined by the number
of `#`s. For bold text, simply put the words in between `**` and for italics use
only one `*`. If you want ***bold and italics***, use `***`. The original
designer of Markdown did not think that underlining text was important, so there
is no *easy* way of doing it, unfortunately. For this, you need to use a
somewhat hidden feature; without going into too many technical details, the
program that converts Rmd files to the final output format is called Pandoc, and
it’s possible to use some of Pandoc's features to format text. For example, for
underlining:
````{verbatim}
[This is some underlined text in a R Markdown document]{.underline}
````
This will underline the text between square brackets.[^1]
[^1]: https://stackoverflow.com/a/68690065/1298051
The next step is to mix code and prose. As you’ve seen from Leisch’s canonical
example, this is quite easily achieved by using R code chunks. The R Markdown
example below shows various code chunks alongside some options. For example, a
code chunk that uses the `echo = FALSE` option will not appear (but the output
of the computation will):
````{verbatim}
---
title: "Document title"
output: html_document
date: "2023-01-28"
---
# R code chunks
This below is an R code chunk:
```{r}
data(mtcars)
plot(mtcars)
```
The code chunk above will appear in the final output.
The code chunk below will be hidden:
```{r, echo = FALSE}
data(iris)
plot(iris)
```
This next code chunk will not be evaluated:
```{r, eval = FALSE}
data(Titanic)
str(Titanic)
```
The last one below runs, but code and output from the code is
not shown in the final document. This is useful for loading
libraries and hiding startup messages:
```{r, include = FALSE}
library(dplyr)
```
````
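You can see the effect of these chunk options without leaving the R console: `knitr::knit()` accepts R Markdown source through its `text` argument and returns the knitted Markdown as a character vector. The snippet below is a small sketch with made-up chunks (and a made-up variable, `hidden`); it only runs `{knitr}`, not Pandoc, so the result is Markdown rather than Html.

```r
rmd <- c(
  "```{r, echo = FALSE}",
  "1 + 1",
  "```",
  "",
  "```{r, eval = FALSE}",
  "stop('this chunk is never run')",
  "```",
  "",
  "```{r, include = FALSE}",
  "hidden <- 'computed but not shown'",
  "```"
)

md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
# echo = FALSE:    only the result "## [1] 2" appears, not the code
# eval = FALSE:    the code is shown, but stop() is never called
# include = FALSE: nothing appears, yet `hidden` now exists
```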
If you use RStudio and create a new R Markdown file from the menu, a template R
Markdown file is generated for you to fill out. The first R chunk is this one:
````{verbatim}
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
````
This is an R chunk named `setup` with the option `include = FALSE` (so neither
the chunk itself, nor the output it produces will be shown in the compiled
document). Naming chunks is optional, but we are going to make use of this later
on. The code that runs in this chunk defines a global option to show the source
code from all the chunks by default (which is the default behaviour). You can
change `TRUE` to `FALSE` if you want to hide every code chunk instead (if you’re
using Quarto, global options are set
[differently](https://quarto.org/docs/computations/execution-options.html
)^[https://quarto.org/docs/computations/execution-options.html]).
Something else you might have noticed in the previous example is that we’ve
added some more content in the header:
````{verbatim}
---
title: "Document title"
output: html_document
date: "2023-01-28"
---
````
There are several other options available that you can define in the header.
Later on, I will show you some more options, for example how to define a table of
contents.
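Going back to the `setup` chunk: a global option set with `knitr::opts_chunk$set()` applies to all the chunks that follow it, while an option set on an individual chunk overrides it for that chunk only. A small sketch, again knitting made-up source with `knitr::knit(text = ...)`:

```r
rmd <- c(
  "```{r setup, include = FALSE}",
  "knitr::opts_chunk$set(echo = FALSE)",  # hide code by default
  "```",
  "",
  "```{r}",
  "mean(1:10)",
  "```",
  "",
  "```{r, echo = TRUE}",                  # override for this chunk only
  "median(c(1, 5, 9))",
  "```"
)

md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

The first computation only shows its result, while the second shows both its code and its result.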
To finish this part on code chunks, you should know about inline code chunks.
Take a look at the following example:
````{verbatim}
---
title: "Document title"
output: html_document
date: "2023-01-28"
---
# R code chunks
```{r, echo = FALSE}
data(iris)
```
The iris dataset has `r nrow(iris)` rows.
````
The last sentence of this example has an inline code chunk. This is quite
useful, as it allows you to parameterise sentences and paragraphs, and thus
avoids needing to copy and paste (and we will go quite far in avoiding copying
and pasting, thanks to more advanced features we will discuss shortly).
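Inline code is evaluated in the same session as the chunks, so it can use any object created earlier in the document. A quick check with the iris example from above, knitted from R with `knitr::knit(text = ...)`:

```r
rmd <- c(
  "```{r, echo = FALSE}",
  "data(iris)",
  "```",
  "",
  "The iris dataset has `r nrow(iris)` rows and `r ncol(iris)` columns."
)

md <- knitr::knit(text = rmd, quiet = TRUE)
cat(md, sep = "\n")
```

If the dataset changes, the sentence updates itself the next time the document is rendered.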
To finish this crash course, you should know that to use footnotes you need to
write the following:
````{verbatim}
This sentence has a footnote.[^1]
[^1]: This is the footnote.
````
or the following (which I prefer):
````{verbatim}
This sentence has a footnote.^[This is the footnote]
````
and that you can write LaTeX formulas as well. For example, add the lines below
to the example from before and render either a PDF or an HTML document (don’t
put the LaTeX formula below inside a chunk, simply paste it as if it were normal
text. This doesn’t work for Word output because Word does not support LaTeX
equations):
```
\begin{align*}
S(\omega)
&= \frac{\alpha g^2}{\omega^5}
e^{[ -0.74\bigl\{\frac{\omega U_\omega 19.5}{g}\bigr\}
^{\!-4}\,]} \\
&= \frac{\alpha g^2}{\omega^5}
\exp\Bigl[ -0.74\Bigl\{\frac{\omega U_\omega 19.5}{g}\Bigr\}
^{\!-4}\,\Bigr]
\end{align*}
```
The LaTeX code above results in this equation:
::: {.content-hidden when-format="pdf"}
<figure>
<img src="images/latex_equation.png"
alt="A rendered LaTeX equation."></img>
<figcaption>A rendered LaTeX equation.</figcaption>
</figure>
:::
::: {.content-visible when-format="pdf"}
```{r, echo = F, out.width="200px"}
#| fig-cap: "A rendered LaTeX equation."
knitr::include_graphics("images/latex_equation.png")
```
:::
## Keeping it DRY
Remember: we never, ever, want to have to repeat ourselves. Copy and pasting is
forbidden. Striving for zero copy and pasting will make our code much more
robust and more likely to be correct.
To stay DRY, we started by using functions, as discussed in the previous
chapter, but we can go much further than that. For example, suppose that we need
to write a document that has the following structure:
- A title
- A section
    - A table inside this section
- Another section
    - Another table inside this section
- Yet another section
    - Yet another table inside this section
Is there a way to automate the creation of such a document by taking advantage
of the repeating structure? Of course there is. The question is not, *is it
possible to do X?*, but *how to do X?*.
### Generating R Markdown code from code
The example below is a fully working minimal example of this. Copy it into a
file named something like `rmd_templating.Rmd` and render it. You will see
that the output contains more sections than defined in the source. This is
because we use templating at the end. Take some time to read the document, as
the text inside explains what is going on:
````{verbatim}
---
title: "Templating"
output: html_document
date: "2023-01-27"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## A function that creates tables
```{r}
create_table <- function(dataset, var){
  table(dataset[var]) |>
    knitr::kable()
}
```
The function above uses the `table()` function to create
frequency tables, and then this gets passed to the
`knitr::kable()` function that produces a good looking table
for our rendered document:
```{r}
create_table(mtcars, "am")
```
Let’s suppose that we want to generate a document that would
look like this:
- first a section title, with the name of the variable of interest
- then the table
So it would look like this:
## Frequency table for variable: "am"
```{r}
create_table(mtcars, "am")
```
We don’t want to create these sections for
every variable by hand.
Instead, we can define a function that
returns the R markdown code required to
create this. Here is this function:
```{r}
return_section <- function(dataset, var){
  a <- knitr::knit_expand(text = c(
    "## Frequency table for variable: {{variable}}",
    create_table(dataset, var)),
    variable = var)
  # end with a blank line so consecutive sections stay separated
  cat(a, "\n\n", sep = "")
}
```
This new function, `return_section()` uses
`knitr::knit_expand()` to generate RMarkdown
code. Words between `{{}}` get replaced by
the provided `var` argument to the function.
So when we call `return_section(mtcars, "am")`,
`{{variable}}` is replaced by `"am"`. `"am"`
then gets passed down to `create_table()`
and the frequency table gets generated.
We can now generate all the section by simply
applying our function to a list of column names:
```{r, results = "asis"}
invisible(lapply(colnames(mtcars), return_section, dataset = mtcars))
```
````
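
To see what `create_table()` actually hands over to `knitr::kable()`, you can
run the `table()` step on its own in the console. This is a small base-R sketch
(no `{knitr}` needed); the names `am_counts`, `counts` and `labels` are only
illustrative.

```r
# The first step of create_table(): a frequency table of one column.
# mtcars["am"] is a one-column data frame, so table() uses the column
# name "am" to label the dimension of the table.
am_counts <- table(mtcars["am"])
print(am_counts)

# The counts and labels that kable() later turns into Markdown rows
counts <- as.vector(am_counts)  # 19 cars with am = 0, 13 with am = 1
labels <- names(am_counts)      # "0" and "1"
```
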

The last function defined in the document, `return_section()`, uses
`knit_expand()`, which does the heavy lifting: it returns literal R Markdown
code. Its template starts with `## Frequency table for variable: {{variable}}`,
which creates a level 2 section title with the text *Frequency table for
variable: xxx*, where `xxx` gets replaced by the variable passed to
`return_section()`. The frequency table produced by `create_table()` comes right
after the title. So calling `return_section(mtcars, "am")` will print the
following in your console:
```
## Frequency table for variable: am
|am | Freq|
|:--|----:|
|0 | 19|
|1 | 13|
```
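
If you want to experiment with `knit_expand()` on its own, outside of a
document, here is a minimal sketch. It assumes `{knitr}` is installed; the
one-line `template` is a simplified, hypothetical version of the one used in
`return_section()`, without the table.

```r
# Assumes the {knitr} package is installed.
template <- "## Frequency table for variable: {{variable}}"

# The code between {{ and }} is evaluated; here it is simply the name
# `variable`, whose value comes from the named argument below.
heading <- knitr::knit_expand(text = template, variable = "am")
cat(heading, sep = "\n")

# Expanding the same template once per variable
headings <- vapply(c("am", "vs"),
                   function(v) knitr::knit_expand(text = template, variable = v),
                   character(1),
                   USE.NAMES = FALSE)
```
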

We now simply need to apply this function to each variable in the `mtcars`
dataset. For this, we are going to use `lapply()`, which applies a function to
every element of a vector or list, much like a for loop would (you could use
`purrr::map()` just as well for this):

```{r, eval = FALSE}
invisible(lapply(colnames(mtcars),
                 return_section,
                 dataset = mtcars))
```
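
To convince yourself that `lapply()` does the same job as a for loop here, the
sketch below runs both and compares what they print. `print_header()` is a
hypothetical stand-in for `return_section()` that only prints the section title,
so the sketch runs in base R without `{knitr}`.

```r
# Hypothetical stand-in for return_section(): it only prints a header.
print_header <- function(dataset, var) {
  cat("## Frequency table for variable: ", var, "\n", sep = "")
}

vars <- colnames(mtcars)[1:3]  # "mpg", "cyl", "disp"

# Version 1: an explicit for loop
out_loop <- capture.output(
  for (v in vars) print_header(mtcars, v)
)

# Version 2: lapply(), as in the text; the extra argument
# `dataset = mtcars` is passed on to print_header() at every step
out_lapply <- capture.output(
  invisible(lapply(vars, print_header, dataset = mtcars))
)

identical(out_loop, out_lapply)  # TRUE: both print the same lines
```
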

This will create, for each variable in `mtcars`, the same R Markdown code as
above. Notice that the chunk containing the call to `lapply()` has the option
`results = "asis"`. This is because the function prints literal Markdown code.
By default, knitr would wrap printed output in a verbatim code block, so the
generated Markdown would show up as raw text in the rendered document. With
`results = "asis"`, we tell knitr "don’t touch this output, it’s already good
Markdown", and it gets rendered like the rest of the document. Notice also that
the call to `lapply()` is wrapped inside `invisible()`. This is because
`return_section()` does not return anything useful: it just prints something to
the console and returns `NULL`. `return_section()` is a function with only a
side effect: it changes something outside its scope (here, what gets printed).
`lapply()`, however, collects these `NULL`s into a list, so if you don’t wrap
the call to `lapply()` inside `invisible()`, that list of `NULL`s will also get
printed. To avoid this, use `invisible()` (and use `purrr::walk()` rather than
`purrr::map()` if you want to use tidyverse packages and functions).
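
You can see these `NULL`s for yourself in the console. In the sketch below,
`announce()` is a hypothetical side-effect-only function playing the role of
`return_section()`; `capture.output()` is only used to record what would be
printed, with and without `invisible()`.

```r
# Hypothetical side-effect-only function standing in for return_section():
# cat() prints text and returns NULL, so announce() returns NULL too.
announce <- function(var) {
  cat("Section for ", var, "\n", sep = "")
}

vars <- c("am", "vs")

# Without invisible(): the printed text, then the auto-printed list
# of NULLs returned by lapply()
visible_out <- capture.output(lapply(vars, announce))

# With invisible(): only the printed text remains
quiet_out <- capture.output(invisible(lapply(vars, announce)))

# What lapply() returns: one NULL per element of `vars`
res <- lapply(vars, announce)
str(res)
```
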
See the output [here](https://jolly-arithmetic-ce29cf.netlify.app/)^[https://is.gd/EzdUtt].
This is not an easy topic, so take the time to play around with the example
above. Try to print another table, try to generate more complex Markdown code,
remove the call to `invisible()` and knit the document and see what happens with
the output, replace the call to `lapply()` with `purrr::walk()` or
`purrr::map()`. Really take the time to understand what is going on.

While extremely powerful, this approach using `knit_expand()` only works if your
template contains nothing but text. If you need to print something more
complicated in the document, you need to use child documents instead. For
example, suppose that instead of a table we wanted to show a plot made using
`{ggplot2}`. This would not work, because a ggplot object is not made of text:
it is a list with many elements, and it is the `print()` method for ggplot
objects that turns this list into an actual plot. If you try to show plots using
`knitr::knit_expand()`, the contents of the list will be shown, not the plot
itself.

This is where child documents come in. Child documents are exactly what you
think they are: smaller documents that get knitted and then embedded into the
parent document. You can define anything within these child documents, and as
such you can even use them to print more complex objects, like a ggplot object.
Let’s go back to the example from before and make use of a child document (for
ease of presentation, we will not use a separate Rmd file, but will inline the
child document into the main document). Read the Rmd example below carefully, as
all the steps are explained:
````{verbatim}
---
title: "Templating with child documents"
output: html_document
date: "2023-01-27"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
```

## A function that creates ggplots

```{r}
create_plot <- function(dataset, aesthetic){
  ggplot(dataset) +
    geom_point(aesthetic)
}
```

The function above takes a dataset and an aesthetic
made using `ggplot2::aes()` to create a plot:

```{r}
create_plot(mtcars, aes(y = mpg, x = hp))
```

Let’s suppose that we want to generate a document
that would look like this:

- first a section title, with the dataset used;
- then a plot.

So it would look like this:

## Dataset used: "mtcars"

```{r}
create_plot(mtcars, aes(y = mpg, x = hp))
```

We don’t want to create these sections for every
aesthetic by hand.
Instead, we can make use of a child document that
gets knitted separately and then embedded in the
parent document. The chunk below makes use of this trick:

```{r, results = "asis"}
x <- list(aes(y = mpg, x = hp),
          aes(y = mpg, x = hp, size = am))

res <- lapply(x,
              function(dataset, x){
                knitr::knit_child(text = c(
                  '\n',
                  '## Dataset used: `r deparse(substitute(dataset))`',
                  '\n',
                  '```{r, echo = FALSE}',
                  'print(create_plot(dataset, x))',
                  '```'
                ),
                envir = environment(),
                quiet = TRUE)
              }, dataset = mtcars)

cat(unlist(res), sep = "\n")
```

The child document is the `text` argument to the
`knit_child()` function. `text` is literal R Markdown
code: we define a level 2 header, and then an R chunk.
This child document gets knitted, so we need to specify
the environment in which it should get knitted. With
`envir = environment()`, the child document gets knitted
in the environment of the anonymous function, where
`dataset` and `x` are defined. Because that function was
created in the parent document, every package that got
loaded and every function or variable that got defined in
the parent document is also available to the child document.

To get the dataset name as a string, we use the
`deparse(substitute(dataset))` trick: `substitute()` returns
the unevaluated expression that was passed as the
`dataset` argument, so the symbol `mtcars`. We don’t want
this expression to get evaluated, or the contents of the
entire dataset would be used in the title of the section.
So we use `deparse()`, which turns unevaluated expressions
into strings.

We then use `lapply()` to loop over two aesthetics with an
anonymous function that encapsulates the child document. So we
get two child documents that get knitted, one per aesthetic.
This gets saved into variable `res`, which is thus a list of
knitted Markdown.

Finally, we need to unlist `res` and print it with `cat()` to
actually merge the Markdown code from the child documents into
the parent document.
````
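
The two R mechanisms the child document relies on, `deparse(substitute())` and
evaluating code in `environment()`, can be tried without `{knitr}`. The sketch
below is a base-R analogue, not what `knit_child()` does internally:
`section_header()` is a hypothetical helper, and `eval(parse(text = ...))` stands
in for the knitting of the child's chunk.

```r
# Hypothetical helper, for illustration only
section_header <- function(i, dataset) {
  # substitute() returns the unevaluated expression bound to `dataset`
  # (here the symbol `mtcars`); deparse() turns it into a string.
  name <- deparse(substitute(dataset))
  # Code stored as text, evaluated in this function's execution
  # environment: it can see `dataset`, as the child document can
  # when knit_child() gets envir = environment().
  n_rows <- eval(parse(text = "nrow(dataset)"), envir = environment())
  paste0("## Dataset used: ", name, " (", n_rows, " rows, pass ", i, ")")
}

# As in the child-document example, `dataset = mtcars` is forwarded
# by lapply() to every call of the function
headers <- lapply(1:2, section_header, dataset = mtcars)
cat(unlist(headers), sep = "\n")
```
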
See the output
[here](https://fastidious-bavarois-3b17b9.netlify.app/)^[https://is.gd/aR2hyz].
Here again, take some time to play with the above example. Change the child
document, try to print other types of output, really take your time to
understand this. To know more about child documents, take a look at [this
section](https://bookdown.org/yihui/rmarkdown-cookbook/child-document.html#child-document)^[https://is.gd/gAqzf9]
of the R Markdown Cookbook [@xie2020r].
By the way, if you wish to add a table of contents to your document, change the
header to this:
````{verbatim}
---
title: "Templating with child documents and TOC"
output:
  html_document:
    toc: true
    toc_float: true
date: "2023-01-27"
---
````

### Tables in R Markdown documents

Getting tables right in Rmd documents is not always an easy task. There are
several packages made specifically for this, and the package that I recommend
ticks the following two important boxes:

- Work the same way regardless of the output format (Word, PDF or HTML);
- Work for any type of table: summary tables, regression tables, two-way tables, etc.

Let's start with the simplest type of table, which simply shows some rows of
data. `{knitr}` comes with the `kable()` function, but this function generates
very plain-looking output. For something publication-worthy, we recommend the
`{flextable}` package, developed by @gohel23:

```{r, eval = FALSE}
library(flextable)
my_table <- head(mtcars)
flextable(my_table) |>
  set_caption(caption = "Head of the mtcars dataset") |>
  theme_booktabs()