---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# explorecourses <img src="man/figures/logo-explorecourses.png" align="right" alt="Logo: a person looking at all the courses to choose from" width="150"/>
<!-- badges: start -->
[![R-CMD-check](https://github.com/coatless-rpkg/explorecourses/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/coatless-rpkg/explorecourses/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
> [!IMPORTANT]
>
> This package is part of a homework exercise for STATS 290 regarding data mining
> and web APIs.

The goal of `explorecourses` is to automatically retrieve course information from
Stanford University's [ExploreCourses](https://explorecourses.stanford.edu/) API.
## Installation
You can install the development version of explorecourses from [GitHub](https://github.com/) with:
``` r
# install.packages("remotes")
remotes::install_github("coatless-rpkg/explorecourses")
```
## Usage
First, load the package:
```{r}
#| eval: false
library(explorecourses)
```
The package contains three main functions:
1. `fetch_all_courses()`: Fetches all courses from the ExploreCourses API for a set of departments (Default: all).
2. `fetch_department_courses()`: Fetches the courses for a specific department.
3. `fetch_departments()`: Fetches the list of departments from the ExploreCourses API.

By default, we'll retrieve all courses across all departments for the current
academic year using:
```{r}
#| eval: false
all_courses <- fetch_all_courses()
```
We can also request courses for a specific set of departments in a given academic year. For example, to retrieve all courses for the "STATS" and "MATH" departments for the 2023-2024 academic year, we can use:
```{r}
#| eval: false
stats_and_math_courses <- fetch_all_courses(c("STATS", "MATH"), year = "20232024")
```
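In the call above, the `year` value is the two calendar years of the academic year written back to back ("2023-2024" becomes `"20232024"`). As a small sketch, assuming this pattern holds for other years, a helper can build the string for us. Note that `academic_year_string()` is a hypothetical name and is not part of the package:

```r
# Hypothetical helper (not part of explorecourses): build the `year` string
# from the calendar year in which the academic year starts.
# Assumes the "start year + end year" pattern seen in "20232024".
academic_year_string <- function(start_year) {
  stopifnot(is.numeric(start_year), length(start_year) == 1)
  start_year <- as.integer(start_year)
  paste0(start_year, start_year + 1L)
}

academic_year_string(2023)
#> [1] "20232024"
```

The result can then be passed along, e.g. `fetch_all_courses("STATS", year = academic_year_string(2023))`.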
`fetch_all_courses()` is well suited to retrieving course information across multiple departments in a given academic year because it can process the departments in parallel.
For a single department, we can use the `fetch_department_courses()` function to
retrieve the courses for that department in any academic year. This function's
overhead is lower as it does not support parallel processing. For example, to
retrieve all courses for the "STATS" department, we can use:
```{r}
#| eval: false
department_courses <- fetch_department_courses("STATS")
```
To determine possible department shortcodes, we can use:
```{r}
#| eval: false
departments <- fetch_departments()
```
This will return a data frame with the department short name, long name, and school
the department is associated with.
### Cache
To cache the data, we can use the `cache_dir` parameter in the `fetch_all_courses()`,
`fetch_department_courses()`, and `fetch_departments()` functions. This
will cause the XML data downloaded from the API to be stored in the specified
directory and reused on subsequent calls.
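To picture the "download once, reuse later" behavior, here is a minimal base R sketch of file-based caching. It is only an illustration of the idea and not the package's actual implementation: `cached_fetch()`, `fake_download()`, and the `.xml` file naming are all assumptions made for this example.

```r
# Hypothetical helper: return the cached XML text for `key` if it exists,
# otherwise call `downloader(key)` and store the result in `cache_dir`.
cached_fetch <- function(key, cache_dir, downloader) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  path <- file.path(cache_dir, paste0(key, ".xml"))
  if (!file.exists(path)) {
    # Cache miss: download and write the raw XML text to disk
    writeLines(downloader(key), path)
  }
  # Cache hit (or freshly written file): read the stored text back
  paste(readLines(path), collapse = "\n")
}

# Stand-in for a real API request; counts how often it is called
calls <- 0
fake_download <- function(key) {
  calls <<- calls + 1
  sprintf("<department name=\"%s\"/>", key)
}

# Use a fresh temporary directory so the demo starts with an empty cache
cache_dir <- tempfile("explorecourses_cache_demo_")
first  <- cached_fetch("STATS", cache_dir, fake_download)
second <- cached_fetch("STATS", cache_dir, fake_download)

identical(first, second) # Same content on both calls
calls                    # The downloader ran only once
```

The second call never reaches the downloader because the file is already on disk, which is the effect `cache_dir` is described as providing for repeated fetches.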
We can list the current cache contents using the `list_cache()` function:
```{r}
#| eval: false
list_cache() # List current cache
```
```r
# Cache contents:
#
# Found 256 cached files
# Directory: explorecourses_cache
#
# AA ACCT AFRICAAM ALP AMELANG
# AMHRLANG AMSTUD ANES ANTHRO APPPHYS
# ARABLANG ARCHLGY ARMELANG ARTHIST ARTSINST
# ...
```
### Parallel Processing
We can speed up the process of fetching and transforming course data
by using parallel processing. For the `fetch_all_courses()` function, we've
set up parallel processing using the `furrr` package, which provides `purrr`'s
functional interface to the `future` parallel processing library. As a result,
we will be able to download and process all courses for every department in
parallel. Moreover, we've set up progress reporting using the `progressr`
package to track the progress of the parallel processing.
```{r}
#| eval: false
library(explorecourses)
library(future)
library(progressr)
# Set up parallel processing
plan(multisession)
# Set up progress reporting
handlers(handler_progress())
# Show progress bar for fetching all courses
with_progress({
  # Fetch all courses for the departments in parallel
  all_courses <- fetch_all_courses()
})
# Reset to sequential processing
plan(sequential)
```
Please note, we need to ensure we deactivate the `multisession` plan by resetting
it to `sequential` after we've finished using it.
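One way to make sure the plan is always reset, even if a fetch fails part-way through, is to scope the plan change inside a function with `on.exit()`. The sketch below assumes the `future` package is installed. `with_multisession()` is a hypothetical wrapper, not something `explorecourses` provides, and the stand-in task only asks a worker for its process ID.

```r
library(future)

# Hypothetical wrapper: run `task()` under a multisession plan and restore
# whichever plan was active before, even if `task()` throws an error.
with_multisession <- function(task, workers = 2) {
  # plan() returns the previously active plan when a new one is set
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)
  task()
}

# Stand-in task; with the package loaded this could instead be
# function() fetch_all_courses()
worker_pid <- with_multisession(function() value(future(Sys.getpid())))
```

With this pattern, the caller does not need to remember the trailing `plan(sequential)` call, and a previously configured plan is not overwritten.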
## License
AGPL (>= 3)