
ucdavisdatalab/adventures_in_data_science


IST8X: Adventures in Data Science

UC Davis DataLab, Winter 2021

Common Links

Requirements & Setup

Install LFS

This repo uses Git Large File Storage (git LFS) for large files. If you don't have git LFS installed, download it and run the installer. Then in the shell (in any directory), run:

git lfs install

That completes the one-time setup of git LFS. Next, clone this repo with git clone. The large files will be downloaded automatically with the rest of the repo.

Install Necessary Packages

When you build the course reader, the code for every chapter runs in your local R environment. That means you need to be able to run the code for all chapters, including those developed by other instructors. In particular, you must have installed every R package that is used anywhere in the reader. During the site build, R will quit with an error when it is asked to use a package that isn't installed on your machine. When this happens, you can check which package was called for and install it before attempting a new build. However, this is a slow and frustrating process, so install the following packages before trying to build the site (please add any that your chapter uses):

  • bookdown
  • cluster
  • dplyr
  • ggforce
  • ggformula
  • ggplot2
  • kableExtra
  • knitr
  • mosaic
  • mosaicModel -- best installed from GitHub with remotes::install_github("ProjectMOSAIC/mosaicModel") to avoid a bug in the CRAN version
  • mvtnorm
  • pdftools
  • remotes
  • rlang
  • rvest
  • statnet
  • stringr
  • tesseract
  • tidyr
  • tm
  • tokenizers
  • visNetwork
  • wakefield
  • xml2
  • sf
  • mapview
  • gdtools
  • leafem
  • leaflet
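One way to install everything in a single pass is the shell sketch below. It assumes R and Rscript are on your PATH and uses the cloud CRAN mirror (https://cloud.r-project.org). The helper names to_r_vector and install_reader_packages are hypothetical and are not part of the repo. To use it, source the file and then call install_reader_packages.

```shell
#!/usr/bin/env bash
# Sketch only: installs the reader's R packages from the list above.
# Assumes R and Rscript are installed and on PATH.

# CRAN packages from the list (mosaicModel is installed separately below).
cran_pkgs=(bookdown cluster dplyr ggforce ggformula ggplot2 kableExtra knitr
  mosaic mvtnorm pdftools remotes rlang rvest statnet stringr tesseract tidyr
  tm tokenizers visNetwork wakefield xml2 sf mapview gdtools leafem leaflet)

# Hypothetical helper: turn shell words into an R character vector,
# e.g. `to_r_vector a b` prints c("a","b").
to_r_vector() {
  local joined
  joined="$(printf '"%s",' "$@")"
  printf 'c(%s)\n' "${joined%,}"
}

# Hypothetical helper: install only the CRAN packages that are missing,
# then install mosaicModel from GitHub to avoid the bug in the CRAN version.
install_reader_packages() {
  Rscript -e "pkgs <- $(to_r_vector "${cran_pkgs[@]}")" \
    -e 'missing <- setdiff(pkgs, rownames(installed.packages()))' \
    -e 'if (length(missing) > 0) install.packages(missing, repos = "https://cloud.r-project.org")' \
    -e 'remotes::install_github("ProjectMOSAIC/mosaicModel")'
}
```

The script installs only missing packages, so rerunning it after someone adds a package to the list is quick. The repos argument is set explicitly because a non-interactive Rscript session cannot prompt you to choose a CRAN mirror.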

Protocols

The course reader is a live webpage, hosted through GitHub. While you are free to direct students to any readings that you'd like them to complete ahead of class, sometimes it makes more sense to simply explain a concept or demonstrate some code yourself. In this case, the reader is meant to be a space where you can enter this information and post it to a public-facing site for students. Supplementary examples for other readings are also appropriate content for the reader, but please refrain from using it as a space to host slides. External readings should be uploaded directly to Canvas or linked from the syllabus.

To make alterations to the reader:

  1. Run git pull (if it's your first time editing, first see the "Requirements & Setup" section of this document).

  2. At the top level of the repo, create a new R Markdown file (.Rmd) for your chapter, or edit an existing one. Enter your text, code, and other information directly into the file. Make sure your file:

    • Follows the naming scheme ##_topic-of-chapter.Rmd (an exception to this naming scheme is index.Rmd, which contains the reader's index page).
    • Begins with a first-level header (like # This). This is how your chapter will be named in the navigation sidebar. Further section titles should be hierarchically under this one, so begin with second-level headers (like ## This) or below (just keep adding a # symbol for each level of hierarchy).
    • Uses caching for resource-intensive code (see the "Resource-intensive Code" section of this document).
  3. Put any supporting media in the data/ or img/ directories. For large files, see the "Large Files" section of this document. Images generated by your R code (such as plots) will automatically be saved in the docs/ folder when you run step 4, so there's no need to save those yourself.

  4. Run the script gen_html.R to regenerate the HTML files in the docs/ directory. You can do this in the shell with ./gen_html.R or in R with source("gen_html.R").

  5. When you're finished, git add:

    • Any files you edited directly
    • Any supporting media you added to data/ or img/
    • The entire docs/ directory
    • The entire _bookdown_files/ directory (contains the R Markdown cache)
    • The .gitattributes file (if you added a large file)

    Then git commit and git push. The live web page will update automatically after a few minutes.
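Before building in step 4, you could check chapter file names against the scheme in step 2 with a small shell function like the hypothetical check_chapter_names below. It assumes that "topic-of-chapter" means lowercase letters and digits joined by hyphens. Adjust the pattern if the reader uses a looser convention.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): report top-level .Rmd files
# whose names don't follow ##_topic-of-chapter.Rmd. index.Rmd is exempt.
# Assumption: "topic-of-chapter" is lowercase letters/digits joined by hyphens.
check_chapter_names() {
  local dir="${1:-.}" status=0 f name
  local re='^[[:digit:]]{2}_[[:lower:][:digit:]]+(-[[:lower:][:digit:]]+)*\.Rmd$'
  for f in "$dir"/*.Rmd; do
    [ -e "$f" ] || continue              # the glob matched nothing
    name="$(basename "$f")"
    if [ "$name" = "index.Rmd" ]; then
      continue
    fi
    if ! [[ $name =~ $re ]]; then
      echo "bad chapter name: $name"
      status=1
    fi
  done
  return "$status"
}
```

Run check_chapter_names . from the top level of the repo. It prints each offending file name and returns a nonzero status if any name fails the pattern.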

Resource-intensive Code

If one of your code chunks takes a lot of time or memory to run, consider caching the result, so the chunk won't run every time someone knits the reader. To cache a code chunk, add cache=TRUE in the chunk header. It's best practice to label cached chunks, like so:

```{r YOUR_CHUNK_NAME, cache=TRUE}
# Your code...
```

Cached files are stored in the _bookdown_files/ directory. If you ever want to clear the cache, you can delete this directory (or its subdirectories). The cache will be rebuilt the next time you knit the reader.
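If you clear the cache often, a guarded shell function like the hypothetical clear_reader_cache below helps avoid deleting the wrong directory. It assumes the top level of the repo is the directory that contains _bookdown.yml, and it removes _bookdown_files/ only when that file is present.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): delete the R Markdown cache so
# it is rebuilt on the next knit. Refuses to run unless _bookdown.yml is in
# the given directory (assumed to mark the top level of the repo).
clear_reader_cache() {
  local root="${1:-.}"
  if [ ! -f "$root/_bookdown.yml" ]; then
    echo "no _bookdown.yml in $root; run this from the top level of the repo" >&2
    return 1
  fi
  rm -rf "$root/_bookdown_files"
}
```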

Beware that caching doesn't work with some packages, especially packages that use external libraries. Because of this, it's best to leave caching off for code chunks that are not resource-intensive.

Large Files

If you want to include a large file (say over 1 MB), you should use git LFS. You can register a large file with git LFS with the shell command:

git lfs track YOUR_FILE

This command updates the .gitattributes file at the top level of the repo. To make sure the change is saved, you also need to run:

git add .gitattributes

Now that your large file is registered with git LFS, you can add, commit, and push it the same way you would any other file, and git LFS will automatically intercede as needed.
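To find files that may still need registering, the hypothetical untracked_large_files below lists files over 1 MiB that git LFS is not yet tracking. The 1 MiB cutoff comes from find's +1M threshold and is close to the 1 MB rule of thumb. The function relies on git check-attr, which reports "filter: lfs" for paths matched by an LFS rule in .gitattributes (the kind of rule git lfs track writes).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repo): list files over 1 MiB that are
# not yet matched by a git LFS rule in .gitattributes. Must be run on a git
# working tree; the .git directory itself is skipped.
untracked_large_files() {
  local root="${1:-.}" f rel filter
  find "$root" -path "$root/.git" -prune -o -type f -size +1M -print |
    while IFS= read -r f; do
      rel="${f#"$root"/}"                      # path relative to the repo root
      # check-attr prints "<path>: filter: <value>"; keep only <value>.
      filter="$(git -C "$root" check-attr filter -- "$rel" | awk '{print $NF}')"
      if [ "$filter" != "lfs" ]; then
        echo "$f"
      fi
    done
}
```

Run untracked_large_files . at the top level of the repo, then register each file it lists with git lfs track before committing.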

GitHub provides 1 GB of storage and 1 GB of monthly bandwidth free per repo for large files. If your large file is more than 50 MB, check with the other contributors before adding it.

Repository Layout

  • docs -- output HTML files
  • img -- image files used in chapters
  • _bookdown.yml -- bookdown settings (mostly where files are)
  • _common.R -- R code to run at the beginning of the R session for each chapter
  • gen_html.R -- R script to generate the HTML files
  • index.Rmd -- index page
  • _output.yml -- bookdown settings (mostly formatting)

Issue Tracking

If, as you're teaching, you notice students having trouble with a sequence of commands, a workflow configuration, or the particular setup of their machines, please make note of this. We'd like to keep track of these problems so that the next time this course is taught, they can be used as a reference. Not all issues are appropriate for a public site like this, however, so enter your issues in a private spreadsheet, which you can find here.