UC Davis DataLab, Winter 2021
- Course reader: https://ucdavisdatalab.github.io/adventures_in_data_science/index.html
- Course reader storage: https://datalab.ucdavis.edu/adventures-in-datascience
This repo uses Git Large File Storage (git LFS) for large files. If you don't have git LFS installed, download it and run the installer. Then in the shell (in any directory), run:
```bash
git lfs install
```
Then your one-time setup of git LFS is done. Next, clone this repo with `git clone`. The large files will be downloaded automatically with the rest of the repo.
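If a copy of the repo was cloned before git LFS was set up, the large files may show up as small text "pointer" files instead of their real contents. The following is a minimal bash sketch for spotting them. It assumes standard `find`, `head`, and `grep`, and relies on LFS pointer files beginning with the line `version https://git-lfs.github.com/spec/v1`. `find_lfs_pointers` is a hypothetical helper, not part of this repo.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print files under DIR that are still Git LFS pointer
# stubs (small text files whose first line is the LFS spec URL) instead of
# the real file contents.
find_lfs_pointers() {
  local dir="${1:-.}"
  local spec='version https://git-lfs.github.com/spec/v1'
  # Pointer files are tiny, so only files under 1024 bytes are inspected;
  # the .git directory is skipped.
  find "$dir" -path "$dir/.git" -prune -o -type f -size -1024c -print |
    while IFS= read -r f; do
      # -x: match the whole line; -F: treat the pattern as a fixed string
      if head -n 1 "$f" | grep -qxF "$spec"; then
        printf '%s\n' "$f"
      fi
    done
}
```

Run it from the top of the repo with `find_lfs_pointers .`. If it prints anything, running `git lfs install` followed by `git lfs pull` should replace the pointers with the real files.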
When building the course reader, all of the code used to make all of the chapters will be run in your local R environment. That means you need to be able to run the code for every chapter, including chapters developed by other instructors. In particular, you need to have installed every R package that is used anywhere in the reader. During the site build process, R will quit with an error when it is asked to use a package that isn't installed on your machine. When this happens, you can check which package was called for and install it before attempting a new build. However, this is a slow and frustrating process, so install the following packages before trying to build the site (please add any that your chapter uses):
- bookdown
- cluster
- dplyr
- ggforce
- ggformula
- ggplot2
- kableExtra
- knitr
- mosaic
- mosaicModel -- best installed from GitHub, to avoid a bug in the CRAN version, with `remotes::install_github("ProjectMOSAIC/mosaicModel")`
- mvtnorm
- pdftools
- remotes
- rlang
- rvest
- statnet
- stringr
- tesseract
- tidyr
- tm
- tokenizers
- visNetwork
- wakefield
- xml2
- sf
- mapview
- gdtools
- leafem
- leaflet
The course reader is a live webpage, hosted through GitHub. While you are free to direct students to any readings that you'd like them to complete ahead of class, sometimes it makes more sense to simply explain a concept or demonstrate some code yourself. In this case, the reader is meant to be a space where you can enter this information and post it to a public-facing site for students. Supplementary examples for other readings are also appropriate content for the reader, but please refrain from using it as a space to host slides. External readings should be uploaded directly to Canvas or linked from the syllabus.
To make alterations to the reader:

1. Run `git pull` (if it's your first time editing, first see the "Requirements & Setup" section of this document).
2. At the top level of the repo, create a new R Markdown file (`.Rmd`) for your chapter, or edit an existing one. Enter your text, code, and other information directly into the file. Make sure your file:
    - Follows the naming scheme `##_topic-of-chapter.Rmd` (an exception to this naming scheme is `index.Rmd`, which contains the reader's index page).
    - Begins with a first-level header (like `# This`). This is how your chapter will be named in the navigation sidebar. Further section titles should be nested under this one, so begin with second-level headers (like `## This`) or below (add one more `#` symbol for each level of hierarchy).
    - Uses caching for resource-intensive code (see the "Resource-intensive Code" section of this document).
3. Put any supporting media in the `data/` or `img/` directories. For large files, see the "Large Files" section of this document. Images generated by your R code (such as plots) will automatically be saved in the `docs/` folder when you run step 4, so there's no need to save those separately.
4. Run the script `gen_html.R` to regenerate the HTML files in the `docs/` directory. You can do this in the shell with `./gen_html.R` or in R with `source("gen_html.R")`.
5. When you're finished, `git add`:
    - Any files you edited directly
    - Any supporting media you added to `data/` or `img/`
    - The entire `docs/` directory
    - The entire `_bookdown_files/` directory (contains the R Markdown cache)
    - The `.gitattributes` file (if you added a large file)

   Then `git commit` and `git push`. The live web page will update automatically after a few minutes.
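The file naming and header conventions listed above can be checked mechanically before committing. Below is a minimal bash sketch. `check_chapter` is a hypothetical helper, not part of this repo. It assumes that chapter files have no YAML front matter and that `index.Rmd` does, so it skips `index.Rmd` entirely.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check one chapter file against the naming and header
# conventions. Prints a message to stderr and returns 1 if a check fails.
check_chapter() {
  local file="$1" name first
  local re='^[0-9]{2}_[A-Za-z0-9_-]+\.Rmd$'
  name=$(basename "$file")

  # index.Rmd is the documented exception to the naming scheme; it is also
  # assumed to start with YAML front matter, so it is not checked further.
  [ "$name" = "index.Rmd" ] && return 0

  # Naming scheme: two digits, an underscore, a topic slug, then .Rmd
  if ! [[ $name =~ $re ]]; then
    echo "$name: expected a name like ##_topic-of-chapter.Rmd" >&2
    return 1
  fi

  # The first non-blank line should be a first-level header ("# Title").
  first=$(grep -m 1 -v '^[[:space:]]*$' "$file")
  if [[ $first != "# "* ]]; then
    echo "$name: first line should be a first-level header like '# This'" >&2
    return 1
  fi
  return 0
}
```

For example, `for f in [0-9][0-9]_*.Rmd; do check_chapter "$f"; done` checks every numbered chapter at the top level of the repo.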
If one of your code chunks takes a lot of time or memory to run, consider caching the result, so the chunk won't run every time someone knits the reader. To cache a code chunk, add `cache=TRUE` in the chunk header. It's best practice to label cached chunks, like so:
```{r YOUR_CHUNK_NAME, cache=TRUE}
# Your code...
```
Cached files are stored in the `_bookdown_files/` directory. If you ever want to clear the cache, you can delete this directory (or its subdirectories). The cache will be rebuilt the next time you knit the reader.
Beware that caching doesn't work with some packages, especially packages that use external libraries. Because of this, it's best to leave caching off for code chunks that are not resource-intensive.
If you want to include a large file (say over 1 MB), you should use git LFS. You can register a large file with git LFS with the shell command:
```bash
git lfs track YOUR_FILE
```
This command updates the `.gitattributes` file at the top level of the repo. To make sure the change is saved, you also need to run:

```bash
git add .gitattributes
```
Now that your large file is registered with git LFS, you can add, commit, and push it with git the same way you would any other file, and git LFS will automatically intercede as needed.
GitHub provides 1 GB of storage and 1 GB of monthly bandwidth free per repo for large files. If your large file is more than 50 MB, check with the other contributors before adding it.
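To find which files cross these size thresholds, a minimal bash sketch like the one below may help. `report_large_files` is a hypothetical helper, not part of this repo. It uses `find -size`, which counts in mebibytes, so "1M" here means 1,048,576 bytes (slightly more than 1 MB). It only prints `git lfs track` commands rather than running them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: for each directory given, report files over ~50 MB
# (check with other contributors first) and print, without running, a
# `git lfs track` command for every file over ~1 MB.
report_large_files() {
  local dir
  for dir in "$@"; do
    # Skip arguments that are not existing directories.
    [ -d "$dir" ] || continue
    # Over 50 MiB: flag for discussion before adding.
    find "$dir" -type f -size +50M -exec printf 'CHECK WITH CONTRIBUTORS (>50 MB): %s\n' {} \;
    # Over 1 MiB: candidate for git LFS.
    find "$dir" -type f -size +1M -exec printf 'git lfs track "%s"\n' {} \;
  done
}
```

Running `report_large_files data img` from the top of the repo lists candidates in the two media directories. Review the suggested commands before running them, and remember to `git add .gitattributes` afterward.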
- `docs` -- output HTML files
- `img` -- image files used in chapters
- `_bookdown.yml` -- bookdown settings (mostly where files are)
- `_common.R` -- R code to run at the beginning of the R session for each chapter
- `gen_html.R` -- R script to generate the HTML files
- `index.Rmd` -- index page
- `_output.yml` -- bookdown settings (mostly formatting)
If, as you're teaching, you notice students having trouble with a sequence of commands, a workflow configuration, or the particular setup of their machines, please make note of this. We'd like to keep track of these problems so that the next time this course is taught, they can be used as a reference. Not all issues are appropriate for a public site like this, however, so enter your issues in a private spreadsheet, which you can find here.