-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Three monitoring functions (for use in dashboards) #92
Conversation
The three functions in action: library(vetiver)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(parsnip)
data(Chicago, package = "modeldata")
Chicago <- Chicago %>% select(ridership, date, one_of(stations))
training_data <- Chicago %>% filter(date < "2009-01-01")
testing_data <- Chicago %>% filter(date >= "2009-01-01", date < "2011-01-01")
monitoring <- Chicago %>% filter(date >= "2011-01-01", date < "2012-12-31")
lm_fit <- linear_reg() %>% fit(ridership ~ ., data = training_data)
library(pins)
b <- board_temp()
## before starting monitoring, initiate the metrics and pin
## (for example, with the testing data):
original_metrics <-
augment(lm_fit, new_data = testing_data) %>%
vetiver_compute_metrics(date, "week", ridership, .pred, .every = 4L) %>%
vetiver_pin_metrics(date, b, "lm_fit_metrics", initiate = TRUE)
#> Guessing `type = 'rds'`
#> Creating new version '20220511T031342Z-082d5'
#> Writing to pin 'lm_fit_metrics'
original_metrics
#> # A tibble: 81 × 5
#> date n .metric .estimator .estimate
#> <date> <int> <chr> <chr> <dbl>
#> 1 2009-01-01 7 mae standard 5.25
#> 2 2009-01-01 7 rmse standard 6.78
#> 3 2009-01-01 7 rsq standard 0.154
#> 4 2009-01-08 28 mae standard 2.98
#> 5 2009-01-08 28 rmse standard 4.61
#> 6 2009-01-08 28 rsq standard 0.576
#> 7 2009-02-05 28 mae standard 1.17
#> 8 2009-02-05 28 rmse standard 1.90
#> 9 2009-02-05 28 rsq standard 0.916
#> 10 2009-03-05 28 mae standard 0.937
#> # … with 71 more rows
## to continue monitoring with new data, compute metrics and update pin:
new_metrics <-
augment(lm_fit, new_data = monitoring) %>%
vetiver_compute_metrics(date, "week", ridership, .pred, .every = 4L) %>%
vetiver_pin_metrics(date, b, "lm_fit_metrics")
#> Guessing `type = 'rds'`
#> Replacing version '20220511T031342Z-082d5' with '20220511T031342Z-23a00'
#> Writing to pin 'lm_fit_metrics'
new_metrics
#> # A tibble: 162 × 5
#> date n .metric .estimator .estimate
#> <date> <int> <chr> <chr> <dbl>
#> 1 2009-01-01 7 mae standard 5.25
#> 2 2009-01-01 7 rmse standard 6.78
#> 3 2009-01-01 7 rsq standard 0.154
#> 4 2009-01-08 28 mae standard 2.98
#> 5 2009-01-08 28 rmse standard 4.61
#> 6 2009-01-08 28 rsq standard 0.576
#> 7 2009-02-05 28 mae standard 1.17
#> 8 2009-02-05 28 rmse standard 1.90
#> 9 2009-02-05 28 rsq standard 0.916
#> 10 2009-03-05 28 mae standard 0.937
#> # … with 152 more rows
library(ggplot2)
vetiver_plot_metrics(new_metrics, date) +
scale_size(range = c(2, 4)) Created on 2022-05-10 by the reprex package (v2.0.1) |
@@ -42,6 +42,9 @@ Suggests: | |||
callr, | |||
caret, | |||
covr, | |||
curl, | |||
dplyr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR adds A LOT of packages here. I put all the packages for monitoring in Suggests
because I am up against 20 packages in Imports already and I can sort of make the argument that people may want to deploy a model but not monitor it. I am very open to ideas for reducing this somehow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm only using dplyr in examples and tests FWIW 🤷♀️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments are mainly around two themes:
- Lots of yardstick specific tweaks to
vetiver_compute_metrics()
. I decided it was easier to dump in a rewrite of this function (you'll see that below), and I added comments about what exactly I was changing - I think
vetiver_pin_metrics()
might be two different functions:vetiver_initialize_metrics()
andvetiver_update/append_metrics()
, so you can get rid of the special "first time" behavior you have to do withinitialize = TRUE
, which I think is a little hard to discover
date_var, | ||
.period, | ||
truth, estimate, ..., | ||
metric_set = yardstick::metrics, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be an issue if yardstick is Suggests and you have this here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with the R-CMD-check-hard
action like in broom and it was fine, so I'm going to risk it (vs. the NOTE).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked again (with CRAN back up this time) and it still looks OK to have yardstick::metrics()
here. 🤞
.estimate = .estimate, | ||
.metric = .metric, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.estimate = .estimate, | |
.metric = .metric, | |
estimate = .estimate, | |
metric = .metric, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because I think we only ever add .
to arguments if they have the potential to conflict with named ...
(I realize it is a little weird here because the column name is .estimate
, so if you want to keep this then I wouldn't mind)
Co-authored-by: Davis Vaughan <davis@rstudio.com>
Thank you so much for the thorough review @DavisVaughan! Looks like this now: library(vetiver)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(parsnip)
data(Chicago, package = "modeldata")
Chicago <- Chicago %>% select(ridership, date, all_of(stations))
training_data <- Chicago %>% filter(date < "2009-01-01")
testing_data <- Chicago %>% filter(date >= "2009-01-01", date < "2011-01-01")
monitoring <- Chicago %>% filter(date >= "2011-01-01", date < "2012-12-31")
lm_fit <- linear_reg() %>% fit(ridership ~ ., data = training_data)
library(pins)
b <- board_temp()
## before starting monitoring, initiate the metrics and pin
## (for example, with the testing data):
original_metrics <-
augment(lm_fit, new_data = testing_data) %>%
vetiver_compute_metrics(date, "week", ridership, .pred, .every = 4L)
pin_write(b, original_metrics, "lm_fit_metrics")
#> Guessing `type = 'rds'`
#> Creating new version '20220523T032938Z-d51cc'
#> Writing to pin 'lm_fit_metrics'
## to continue monitoring with new data, compute metrics and update pin:
new_metrics <-
augment(lm_fit, new_data = monitoring) %>%
vetiver_compute_metrics(date, "week", ridership, .pred, .every = 4L) %>%
vetiver_pin_metrics(b, "lm_fit_metrics")
#> Guessing `type = 'rds'`
#> Replacing version '20220523T032938Z-d51cc' with '20220523T032938Z-73a5d'
#> Writing to pin 'lm_fit_metrics'
library(ggplot2)
vetiver_plot_metrics(new_metrics) +
scale_size(range = c(2, 4)) Created on 2022-05-22 by the reprex package (v2.0.1) I have one outstanding question that I would still appreciate any ideas on. You recommended taking out the
|
I mentioned it in the comments, but I'll add it again here: For the specific case of |
This PR adds three functions:
vetiver_compute_metrics()
vetiver_pin_metrics()
vetiver_plot_metrics()