Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

consider allowing user to toggle whether their adjust_predictions_custom() requires non-trivial fit #62

Open
simonpcouch opened this issue Dec 11, 2024 · 2 comments

Comments

@simonpcouch
Copy link
Contributor

# todo: should there be a user interface to tell tailor whether this
# adjustment requires fit?
requires_fit = FALSE

@simonpcouch
Copy link
Contributor Author

Took a moment to implement this but wasn't able to think of how a user could currently supply an expression that requires estimation. As a hypothetical, how could a user take the mean-center predictions based on training data using this interface? e.g. this does not work:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

d_calibration
#> # A tibble: 100 × 2
#>         y y_pred
#>     <dbl>  <dbl>
#>  1 -0.626 -0.934
#>  2  0.184  0.134
#>  3 -0.836 -1.33 
#>  4  1.60   0.956
#>  5  0.330 -0.490
#>  6 -0.820  1.36 
#>  7  0.487  0.960
#>  8  0.738  1.28 
#>  9  0.576  0.672
#> 10 -0.305  1.53 
#> # ℹ 90 more rows

tlr <-
  tailor() %>%
  adjust_predictions_custom(y_pred = y_pred - mean(y_pred))

tlr_fit <- fit(tlr, outcome = y, estimate = y_pred, .data = d_calibration)

d_test_res <- predict(tlr_fit, d_test)

# the test data mean is subtracted rather than the calibration data
all(d_test_res$y_pred == d_test$y_pred - mean(d_calibration$y_pred))
#> [1] FALSE
all(d_test_res$y_pred == d_test$y_pred - mean(d_test$y_pred))
#> [1] TRUE

Created on 2024-12-11 with reprex v2.1.1

Do we want to:

  1. Try to figure out how to support arbitrary expressions that could require separate estimation for training/testing
  2. Leave this as-is, always setting requires_fit = FALSE, and document that calculations shouldn't require estimation on separate sets of data, or
  3. Do 2) and additionally try to add some logic to detect that a data-dependent operation is happening and warn about it?

cc @topepo

@topepo
Copy link
Member

topepo commented Dec 11, 2024

Regarding the third point, here's a test case:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

lm_cal <- lm(y ~ y_pred, data = d_calibration)

# tesing things out
predict(lm_cal, d_test[1:3, "y_pred"])
#>          1          2          3 
#> 0.49822957 0.02988542 1.09795293

tlr <-
  tailor() %>%
  adjust_predictions_custom(y_pred = predict(lm_cal, data.frame(y_pred = y_pred)))

tlr_fit <- fit(tlr, outcome = y, estimate = y_pred, .data = d_calibration)

d_test_res <- predict(tlr_fit, d_test[1:3,])
d_test_res
#> # A tibble: 3 × 2
#>       y y_pred
#>   <dbl>  <dbl>
#> 1 0.409 0.498 
#> 2 1.69  0.0299
#> 3 1.59  1.10

# ... but 
tlr_fit$adjustments[[1]]$arguments
#> $commands
#> <list_of<quosure>>
#> 
#> $y_pred
#> <quosure>
#> expr: ^predict(lm_cal, data.frame(y_pred = y_pred))
#> env:  0x12d91ca20
#> 
#> 
#> $pkgs
#> character(0)

Created on 2024-12-11 with reprex v2.1.0

and then, in another session:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

load("~/tmp/tlr_fit.RData")
tlr_fit$adjustments[[1]]$arguments
#> $commands
#> <list_of<quosure>>
#> 
#> $y_pred
#> <quosure>
#> expr: ^predict(lm_cal, data.frame(y_pred = y_pred))
#> env:  0x12b7c04a8
#> 
#> 
#> $pkgs
#> character(0)

predict(tlr_fit, d_test[1:3,])
#> Error in `dplyr::mutate()`:
#> ℹ In argument: `y_pred = predict(lm_cal, data.frame(y_pred = y_pred))`.
#> Caused by error:
#> ! object 'lm_cal' not found

Created on 2024-12-11 with reprex v2.1.0

(EDIT to show results in new session)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants