consider allowing user to toggle whether their `adjust_predictions_custom()` requires non-trivial fit #62

simonpcouch · 2024-12-11T15:28:06Z

Lines 45 to 47 in 09995d6

    
           # todo: should there be a user interface to tell tailor whether this 
        
           # adjustment requires fit? 
        
           requires_fit = FALSE

simonpcouch · 2024-12-11T19:42:11Z

Took a moment to implement this but wasn't able to think of how a user could currently supply an expression that requires estimation. As a hypothetical, how could a user take the mean-center predictions based on training data using this interface? e.g. this does not work:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

d_calibration
#> # A tibble: 100 × 2
#>         y y_pred
#>     <dbl>  <dbl>
#>  1 -0.626 -0.934
#>  2  0.184  0.134
#>  3 -0.836 -1.33 
#>  4  1.60   0.956
#>  5  0.330 -0.490
#>  6 -0.820  1.36 
#>  7  0.487  0.960
#>  8  0.738  1.28 
#>  9  0.576  0.672
#> 10 -0.305  1.53 
#> # ℹ 90 more rows

tlr <-
  tailor() %>%
  adjust_predictions_custom(y_pred = y_pred - mean(y_pred))

tlr_fit <- fit(tlr, outcome = y, estimate = y_pred, .data = d_calibration)

d_test_res <- predict(tlr_fit, d_test)

# the test data mean is subtracted rather than the calibration data
all(d_test_res$y_pred == d_test$y_pred - mean(d_calibration$y_pred))
#> [1] FALSE
all(d_test_res$y_pred == d_test$y_pred - mean(d_test$y_pred))
#> [1] TRUE

^{Created on 2024-12-11 with reprex v2.1.1}

Do we want to:

Try to figure out how to support arbitrary expressions that could require separate estimation for training/testing
Leave this as-is, always setting requires_fit = FALSE, and document that calculations shouldn't require estimation on separate sets of data, or
Do 2) and additionally try to add some logic to detect that a data-dependent operation is happening and warn about it?

cc @topepo

topepo · 2024-12-11T20:05:58Z

Regarding the third point, here's a test case:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

lm_cal <- lm(y ~ y_pred, data = d_calibration)

# tesing things out
predict(lm_cal, d_test[1:3, "y_pred"])
#>          1          2          3 
#> 0.49822957 0.02988542 1.09795293

tlr <-
  tailor() %>%
  adjust_predictions_custom(y_pred = predict(lm_cal, data.frame(y_pred = y_pred)))

tlr_fit <- fit(tlr, outcome = y, estimate = y_pred, .data = d_calibration)

d_test_res <- predict(tlr_fit, d_test[1:3,])
d_test_res
#> # A tibble: 3 × 2
#>       y y_pred
#>   <dbl>  <dbl>
#> 1 0.409 0.498 
#> 2 1.69  0.0299
#> 3 1.59  1.10

# ... but 
tlr_fit$adjustments[[1]]$arguments
#> $commands
#> <list_of<quosure>>
#> 
#> $y_pred
#> <quosure>
#> expr: ^predict(lm_cal, data.frame(y_pred = y_pred))
#> env:  0x12d91ca20
#> 
#> 
#> $pkgs
#> character(0)

^{Created on 2024-12-11 with reprex v2.1.0}

and then, in another session:

library(tailor)
library(tibble)

set.seed(1)
d_calibration <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))
d_test <- tibble(y = rnorm(100), y_pred = y/2 + rnorm(100))

load("~/tmp/tlr_fit.RData")
tlr_fit$adjustments[[1]]$arguments
#> $commands
#> <list_of<quosure>>
#> 
#> $y_pred
#> <quosure>
#> expr: ^predict(lm_cal, data.frame(y_pred = y_pred))
#> env:  0x12b7c04a8
#> 
#> 
#> $pkgs
#> character(0)

predict(tlr_fit, d_test[1:3,])
#> Error in `dplyr::mutate()`:
#> ℹ In argument: `y_pred = predict(lm_cal, data.frame(y_pred = y_pred))`.
#> Caused by error:
#> ! object 'lm_cal' not found

^{Created on 2024-12-11 with reprex v2.1.0}

(EDIT to show results in new session)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

consider allowing user to toggle whether their `adjust_predictions_custom()` requires non-trivial fit #62

consider allowing user to toggle whether their `adjust_predictions_custom()` requires non-trivial fit #62

simonpcouch commented Dec 11, 2024

simonpcouch commented Dec 11, 2024

topepo commented Dec 11, 2024 •

edited

Loading

consider allowing user to toggle whether their adjust_predictions_custom() requires non-trivial fit #62

consider allowing user to toggle whether their adjust_predictions_custom() requires non-trivial fit #62

Comments

simonpcouch commented Dec 11, 2024

simonpcouch commented Dec 11, 2024

topepo commented Dec 11, 2024 • edited Loading

consider allowing user to toggle whether their `adjust_predictions_custom()` requires non-trivial fit #62

consider allowing user to toggle whether their `adjust_predictions_custom()` requires non-trivial fit #62

topepo commented Dec 11, 2024 •

edited

Loading