3. Sample Workflow
Here we will go over some sample workflows that show how things work.
In its simplest form, the fast_regression() function will create 39 different model specifications (provided the required packages are installed and loaded) and make predictions on the data. The function is referred to as fast because all of the model parameters are left at their defaults, so no model tuning takes place.
Let's take a look at a sample fast regression workflow in its simplest form.
library(recipes)
library(dplyr)
library(purrr)   # provides map(), used in the broom examples below
library(tidyAML)

rec_obj <- recipe(mpg ~ ., data = mtcars)

frt_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_eng = c("lm", "glm", "gee"),
  .parsnip_fns = "linear_reg"
)
glimpse(frt_tbl)
#> Rows: 3
#> Columns: 8
#> $ .model_id <int> 1, 2, 3
#> $ .parsnip_engine <chr> "lm", "gee", "glm"
#> $ .parsnip_mode <chr> "regression", "regression", "regression"
#> $ .parsnip_fns <chr> "linear_reg", "linear_reg", "linear_reg"
#> $ model_spec <list> [~NULL, ~NULL, NULL, regression, TRUE, NULL, lm, TRUE]…
#> $ wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ fitted_wflw <list> [cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb, mp…
#> $ pred_wflw <list> [<tbl_df[8 x 1]>], <NULL>, [<tbl_df[8 x 1]>]
> frt_tbl
# A tibble: 3 × 8
.model_id .parsnip_engine .parsnip_mode .parsnip_fns model_spec wflw fitted_wflw pred_wflw
<int> <chr> <chr> <chr> <list> <list> <list> <list>
1 1 lm regression linear_reg <spec[+]> <workflow> <workflow> <tibble>
2 2 gee regression linear_reg <spec[+]> <NULL> <NULL> <NULL>
3 3 glm regression linear_reg <spec[+]> <workflow> <workflow> <tibble>
Here we see that nothing was created for the gee parsnip engine. What this means is that, in its present state, the fundamental structure needed to build that model is flawed. Fortunately, these functions use purrr::safely() behind the scenes, so where something fails, it does so with a modicum of grace. This does not mean, however, that the lm and glm models are not useful; in fact, as we see, they were generated successfully. Given this, let us examine each part of those models. Let's first check all of the model specs.
> frt_tbl |> pull(model_spec)
[[1]]
Linear Regression Model Specification (regression)
Computational engine: lm
[[2]]
! parsnip could not locate an implementation for `linear_reg` regression model specifications using
the `gee` engine.
ℹ The parsnip extension package multilevelmod implements support for this specification.
ℹ Please install (if needed) and load to continue.
Linear Regression Model Specification (regression)
Computational engine: gee
[[3]]
Linear Regression Model Specification (regression)
Computational engine: glm
The reason the gee method failed is that the multilevelmod library was not loaded. There are a few helper functions that can be used for this, like load_deps().
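A minimal sketch of the fix, following the hint parsnip printed above (this assumes multilevelmod is installed; load_deps() is the tidyAML helper mentioned above, and rec_obj is the recipe created earlier):

```r
# Option 1: load the extension package parsnip asked for
# install.packages("multilevelmod")  # if not already installed
library(multilevelmod)

# Option 2: let tidyAML load its known model dependencies
load_deps()

# Re-running fast_regression() afterwards should allow the
# gee specification to build alongside lm and glm
frt_tbl <- fast_regression(
  .data = mtcars,
  .rec_obj = rec_obj,
  .parsnip_eng = c("lm", "glm", "gee"),
  .parsnip_fns = "linear_reg"
)
```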
> frt_tbl |> pull(wflw)
[[1]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: lm
[[2]]
NULL
[[3]]
══ Workflow ═══════════════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Linear Regression Model Specification (regression)
Computational engine: glm
Again, because of the previous failure for gee, no workflow was created.
> frt_tbl |> pull(fitted_wflw)
[[1]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Call:
stats::lm(formula = ..y ~ ., data = data)
Coefficients:
(Intercept) cyl disp hp drat wt qsec
42.72540 -1.99677 -0.02254 0.03581 1.90888 -0.35753 -0.14563
vs am gear carb
0.23074 3.58125 -2.93809 -1.26310
[[2]]
NULL
[[3]]
══ Workflow [trained] ═════════════════════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: linear_reg()
── Preprocessor ───────────────────────────────────────────────────────────────────────────────────────
0 Recipe Steps
── Model ──────────────────────────────────────────────────────────────────────────────────────────────
Call: stats::glm(formula = ..y ~ ., family = stats::gaussian, data = data)
Coefficients:
(Intercept) cyl disp hp drat wt qsec
42.72540 -1.99677 -0.02254 0.03581 1.90888 -0.35753 -0.14563
vs am gear carb
0.23074 3.58125 -2.93809 -1.26310
Degrees of Freedom: 23 Total (i.e. Null); 13 Residual
Null Deviance: 936.9
Residual Deviance: 59.11 AIC: 113.7
Again, gee fails for the aforementioned reason.
Let's get the predictions:
> frt_tbl |> pull(pred_wflw)
[[1]]
# A tibble: 8 × 1
.pred
<dbl>
1 30.2
2 18.4
3 28.9
4 16.2
5 17.3
6 14.7
7 27.4
8 29.6
[[2]]
NULL
[[3]]
# A tibble: 8 × 1
.pred
<dbl>
1 30.2
2 18.4
3 28.9
4 16.2
5 17.3
6 14.7
7 27.4
8 29.6
Again, we see that gee failed.
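Because failed engines simply leave NULL entries behind, one convenient pattern (a sketch, not a tidyAML API) is to drop them before doing any downstream work:

```r
# Keep only the rows whose prediction workflow succeeded;
# the gee row has a NULL pred_wflw, so it is filtered out
good_tbl <- frt_tbl |>
  filter(!purrr::map_lgl(pred_wflw, is.null))

# good_tbl now contains just the lm and glm rows
good_tbl |> pull(.parsnip_engine)
```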
Since this package is based on and built off of parsnip, it fits nicely within the tidymodels ecosystem. This means we can use tools like broom on the fitted models. Let's take a look:
> frt_tbl |> pull(fitted_wflw) |> map(broom::tidy)
[[1]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 42.7 20.6 2.07 0.0589
2 cyl -2.00 1.20 -1.67 0.120
3 disp -0.0225 0.0174 -1.30 0.218
4 hp 0.0358 0.0246 1.46 0.169
5 drat 1.91 1.66 1.15 0.272
6 wt -0.358 1.89 -0.189 0.853
7 qsec -0.146 0.773 -0.188 0.853
8 vs 0.231 2.02 0.114 0.911
9 am 3.58 2.09 1.71 0.111
10 gear -2.94 1.66 -1.77 0.100
11 carb -1.26 0.738 -1.71 0.111
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 11 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 42.7 20.6 2.07 0.0589
2 cyl -2.00 1.20 -1.67 0.120
3 disp -0.0225 0.0174 -1.30 0.218
4 hp 0.0358 0.0246 1.46 0.169
5 drat 1.91 1.66 1.15 0.272
6 wt -0.358 1.89 -0.189 0.853
7 qsec -0.146 0.773 -0.188 0.853
8 vs 0.231 2.02 0.114 0.911
9 am 3.58 2.09 1.71 0.111
10 gear -2.94 1.66 -1.77 0.100
11 carb -1.26 0.738 -1.71 0.111
> frt_tbl |> pull(fitted_wflw) |> map(broom::glance)
[[1]]
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 0.937 0.888 2.13 19.3 3.35e-6 10 -44.9 114. 128. 59.1 13 24
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 1 × 8
null.deviance df.null logLik AIC BIC deviance df.residual nobs
<dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int>
1 937. 23 -44.9 114. 128. 59.1 13 24
> frt_tbl |> pull(fitted_wflw) |> map(\(x) x |> broom::augment(new_data = mtcars))
[[1]]
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb .pred
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 22.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21.8
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 30.2
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 20.9
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 15.9
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 20.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.1
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 22.6
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.9
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
[[2]]
# A tibble: 0 × 0
[[3]]
# A tibble: 32 × 12
mpg cyl disp hp drat wt qsec vs am gear carb .pred
* <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 22.0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 21.8
3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 30.2
4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 20.9
5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 15.9
6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 20.7
7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16.1
8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 22.6
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 23.9
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 18.4
# ℹ 22 more rows
# ℹ Use `print(n = ...)` to see more rows
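To close the loop, the augmented tibbles above make it easy to score each successful model. Here is a sketch using yardstick (a tidymodels package, assumed to be installed) to compute RMSE for each fitted workflow; note this scores the training data, so it is an in-sample, optimistic error estimate:

```r
library(yardstick)

frt_tbl |>
  pull(fitted_wflw) |>
  # skip the NULL entry left by the failed gee engine
  purrr::discard(is.null) |>
  purrr::map(\(wf) {
    wf |>
      broom::augment(new_data = mtcars) |>
      rmse(truth = mpg, estimate = .pred)
  })
```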