-
Notifications
You must be signed in to change notification settings - Fork 31
/
Copy pathdocument.qmd
132 lines (109 loc) · 3.42 KB
/
document.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
---
title: "ETL Step 1 - Raw Data Refresh"
date: "`r lubridate::now(tzone = 'EST')` EST"
output: html_document
editor_options:
chunk_output_type: console
---
This notebook uses the [bikeHelpR](https://pkg.current.posit.team/client/#/repos/internal-r/packages/bikeHelpR) package to update a database with the latest bike share data. The package pulls data from <https://capitalbikeshare.com> which provides an API to access bike share data. The raw data is written to the *Content DB* database to the *bike_raw_data* table.
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
library(dplyr)
library(dbplyr)
library(odbc)
library(DBI)
```
## Get data from API
Use the the `bikeHelpR` package to get the latest data from <https://capitalbikeshare.com>.
### Station status
```{r station_status_data}
feeds_station_status <-
bikeHelpR::feeds_urls() %>%
filter(name == "station_status") %>%
pull(url) %>%
bikeHelpR::get_data()
station_status <-
feeds_station_status %>%
magrittr::extract2("data") %>%
dplyr::mutate(time = feeds_station_status$last_updated) %>%
dplyr::select(
is_installed,
num_bikes_available,
last_reported,
is_renting,
eightd_has_available_keys,
num_docks_available,
num_docks_disabled,
is_returning,
station_id, num_ebikes_available,
num_bikes_disabled,
time
)
glimpse(station_status)
```
### Station info
```{r station_info_data}
# The station information endpoint.
station_information_url <-
bikeHelpR::feeds_urls() %>%
filter(name == "station_information") %>%
pull(url)
# Call the endpoint to obtain the JSON data.
request <- httr2::request(station_information_url)
response <- httr2::req_perform(request)
json_data <- httr2::resp_body_json(response)
# Convert the JSON data into a tibble.
station_info <-
json_data$data %>%
as_tibble() %>%
tidyr::unnest_wider(stations) %>%
select(station_id, name, lat, lon) %>%
distinct() %>%
mutate(
last_updated = as.POSIXct(
json_data$last_updated,
origin = "1970-01-01 00:00:00 UTC"
)
)
glimpse(station_info)
```
## Update database
Write the new data from the API to the database.
```{r write_to_db}
con <- dbConnect(
odbc::odbc(),
Driver = "postgresql",
Server = Sys.getenv("DB_SERVER"),
Port = "5432",
Database = "soleng",
UID = Sys.getenv("DB_USER"),
PWD = Sys.getenv("DB_PASSWORD"),
BoolsAsChar = "",
timeout = 10
)
dbWriteTable(con,
name = DBI::Id(schema="content", name="bike_raw_data"),
station_status, append = TRUE)
dbWriteTable(con,
name = DBI::Id(schema="content", name="bike_station_info"),
station_info, overwrite = TRUE)
dbDisconnect(con)
print("Raw data updated 🎉")
```
## Update the pin
Station info will also be written to a pin. This pin will be accessed by the shiny app so that it can easily get the bike station info without connecting to the database.
```{r pin_station_info}
board <- pins::board_connect()
# Write the model to the board.
pins::pin_write(
board,
x = station_info,
type = "csv",
name = "bike-predict-r-station-info-pin",
title = "Bikeshare Prediction - ETL output - Pinned Station Info",
description = "Bike station info from https://capitalbikeshare.com."
)
# prune to save only the last 10 versions
pins::pin_versions_prune(board, "bike-predict-r-station-info-pin", 10)
print("Pin 🎉")
```