Add ptype information to recipes #1329

Merged: 10 commits, Jun 4, 2024
1 change: 1 addition & 0 deletions NAMESPACE
@@ -602,6 +602,7 @@ export(recipes_extension_check)
export(recipes_names_outcomes)
export(recipes_names_predictors)
export(recipes_pkg_check)
export(recipes_ptype)
export(recipes_remove_cols)
export(remove_original_cols)
export(remove_role)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -2,6 +2,8 @@

* New `extract_fit_time()` method has been added that returns the time it took to train the recipe. (#1071)

* Developer helper function `recipes_ptype()` has been added, returning expected input data for `prep()` and `bake()` for a given recipe object. (#1329)

* The `prefix` argument of `step_dummy_multi_choice()` is now properly documented. (#1298)

* `step_dummy()` now gives an informative error on attempt to generate too many columns to fit in memory. (#828)
5 changes: 5 additions & 0 deletions R/developer.R
@@ -139,6 +139,11 @@
#'
#' # Interacting with recipe objects
#'
#' [recipes_ptype()] returns the ptype (the expected variables and their types)
#' that a recipe object expects at `prep()` and `bake()` time. The `stage`
#' argument controls which of the two is returned. Functions that interact with
#' recipes can use it to verify that data is correct before passing it to
#' `prep()` and `bake()`.
#'
#' [detect_step()] returns a logical indicator to determine if a given step or
#' check is included in a recipe.
#'
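As a rough illustration of the use case described in the documentation above, a function that wraps `bake()` might compare incoming data against the bake-stage ptype first. The helper name `missing_bake_cols()` is hypothetical and not part of recipes, and the sketch assumes an installed recipes version that includes `recipes_ptype()`.

```r
library(recipes)

# Hypothetical helper (not part of recipes): names of the columns that
# bake() needs but `new_data` does not supply.
missing_bake_cols <- function(rec, new_data) {
  ptype <- recipes_ptype(rec, stage = "bake")
  setdiff(names(ptype), names(new_data))
}

rec <- recipe(mpg ~ ., data = mtcars)

# `mpg` is the outcome, so it should not be reported even though it is absent.
absent <- missing_bake_cols(rec, mtcars[, c("cyl", "disp")])
absent
```

A fuller check would probably compare column types as well as names. Only names are checked here to keep the sketch short.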
113 changes: 113 additions & 0 deletions R/ptype.R
@@ -0,0 +1,113 @@
#' Prototype of recipe object
#'
#' This helper function returns the prototype of the input data set expected by
#' the recipe object.
#'
#' @param x A `recipe` object.
#' @param ... currently not used.
#' @param stage A single character string. Must be one of `"prep"` or `"bake"`.
#' See details for more. Defaults to `"prep"`.
#'
#' @details
#' The returned ptype is a zero-row tibble describing the data set that the
#' recipe object expects. Which columns it contains depends on the `stage`.
#'
#' At `prep()` time, when `stage = "prep"`, the ptype is the data passed to
#' `recipe()`. The following code chunk represents a possible recipe scenario.
#' `recipes_ptype(rec_spec, stage = "prep")` and
#' `recipes_ptype(rec_prep, stage = "prep")` both return a ptype tibble
#' corresponding to `data_ptype`. This information is used internally in
#' `prep()` to verify that `data_training` has the right columns with the right
#' types.
#'
#' ```r
#' rec_spec <- recipe(outcome ~ ., data = data_ptype) %>%
#'   step_normalize(all_numeric_predictors()) %>%
#'   step_dummy(all_nominal_predictors())
#'
#' rec_prep <- prep(rec_spec, training = data_training)
#' ```
#'
#' At `bake()` time, when `stage = "bake"`, the ptype represents the data
#' that are required for `bake()` to run.
#'
#' ```r
#' data_bake <- bake(rec_prep, new_data = data_testing)
#' ```
#'
#' What this means in practice is that, unless otherwise specified, everything
#' but outcomes and case weights is required. These requirements can be changed
#' with `update_role_requirements()`, and `recipes_ptype()` respects those
#' changes.
#'
#' Note that the order of the columns isn't guaranteed to align with
#' `data_ptype`, as the data is ordered internally according to roles.
#'
#' @return A zero-row tibble.
#' @keywords internal
#'
#' @seealso [developer_functions]
#'
#' @examples
#' training <- tibble(
#'   y = 1:10,
#'   id = 1:10,
#'   x1 = letters[1:10],
#'   x2 = factor(letters[1:10]),
#'   cw = hardhat::importance_weights(1:10)
#' )
#' training
#'
#' rec_spec <- recipe(y ~ ., data = training)
#'
#' # outcomes and case_weights are not required at bake time
#' recipes_ptype(rec_spec, stage = "prep")
#' recipes_ptype(rec_spec, stage = "bake")
#'
#' rec_spec <- recipe(y ~ ., data = training) %>%
#'   update_role(x1, new_role = "id")
#'
#' # outcomes and case_weights are not required at bake time
#' # "id" column is assumed to be needed
#' recipes_ptype(rec_spec, stage = "prep")
#' recipes_ptype(rec_spec, stage = "bake")
#'
#' rec_spec <- recipe(y ~ ., data = training) %>%
#'   update_role(x1, new_role = "id") %>%
#'   update_role_requirements("id", bake = FALSE)
#'
#' # update_role_requirements() is used to specify that "id" isn't needed
#' recipes_ptype(rec_spec, stage = "prep")
#' recipes_ptype(rec_spec, stage = "bake")
#'
#' @export
recipes_ptype <- function(x, ..., stage = "prep") {
  check_dots_empty0(...)

  if (is.null(x$ptype)) {
    cli::cli_abort(
      c(
        x = "Doesn't work on recipes created prior to version 1.1.0.",
        i = "Please recreate recipe."
      )
    )
  }

  ptype <- x$ptype

  stage <- rlang::arg_match(stage, values = c("prep", "bake"))

  if (stage == "bake") {
    required_roles <- compute_bake_role_requirements(x)

    var_info <- x$var_info
    roles <- var_info$role
    roles <- chr_explicit_na(roles)

    required_var <- var_info$variable[required_roles[roles]]

    ptype <- ptype[names(ptype) %in% required_var]
  }

  ptype
}
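The bake-stage branch of the function above boils down to a lookup by role name followed by a column filter. A base R sketch of that step, using made-up stand-ins for `x$var_info`, `x$ptype`, and the result of `compute_bake_role_requirements()`, might look like this. The sketch assumes the requirements come back as a named logical vector with one entry per role, and it omits the `chr_explicit_na()` handling of missing roles.

```r
# Made-up stand-in for x$var_info: one row per variable with its role.
var_info <- data.frame(
  variable = c("y", "id", "x1", "x2", "cw"),
  role = c("outcome", "predictor", "predictor", "predictor", "case_weights")
)

# Assumed shape of the role requirements: a named logical vector.
required_roles <- c(outcome = FALSE, predictor = TRUE, case_weights = FALSE)

# Made-up stand-in for x$ptype: a zero-row data frame.
ptype <- data.frame(
  y = integer(),
  id = integer(),
  x1 = character(),
  x2 = factor(character()),
  cw = double()
)

# Look up each variable's role in the requirements, keep the required ones.
required_var <- var_info$variable[required_roles[var_info$role]]

# Keep only the required columns of the prototype.
bake_ptype <- ptype[names(ptype) %in% required_var]
names(bake_ptype)
```

Because the filter indexes `ptype` by its own names, the surviving columns keep the order they have in `ptype`, not the order of `var_info`.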
3 changes: 2 additions & 1 deletion R/recipe.R
@@ -184,7 +184,8 @@ recipe.data.frame <-
template = x,
levels = NULL,
retained = NA,
- requirements = requirements
+ requirements = requirements,
+ ptype = vctrs::vec_ptype(x)
)
class(out) <- "recipe"
out
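The new `ptype` field relies on `vctrs::vec_ptype()`, which returns a zero-size prototype of its input. A small sketch with a made-up data frame (not taken from the PR) shows what ends up stored:

```r
library(vctrs)

# Made-up input data, for illustration only.
df <- data.frame(
  x = 1:3,
  f = factor(c("a", "b", "a"))
)

ptype <- vec_ptype(df)

nrow(ptype)      # no rows are kept
names(ptype)     # column names are preserved
class(ptype$x)   # column types are preserved
levels(ptype$f)  # factor levels are part of the type, so they are kept too
```

Storing only the prototype keeps the column names and types available for later checks without holding on to the rows themselves.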
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -165,6 +165,7 @@ reference:
- prepper
- recipes_eval_select
- recipes_extension_check
- recipes_ptype
- recipes-role-indicator
- update.step
- title: Tidy Methods
5 changes: 5 additions & 0 deletions man/developer_functions.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

93 changes: 93 additions & 0 deletions man/recipes_ptype.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions tests/testthat/_snaps/ptype.md
@@ -0,0 +1,9 @@
# recipes_ptype errors on old recipes

Code
recipes_ptype(rec)
Condition
Error in `recipes_ptype()`:
x Doesn't work on recipes created prior to version 1.1.0.
i Please recreate recipe.
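
The condition in this snapshot can be reproduced approximately by stripping the `ptype` field from a fresh recipe. Setting `rec$ptype <- NULL` is an assumption about how to mimic a recipe created before version 1.1.0, and is not necessarily what the test file does.

```r
library(recipes)

rec <- recipe(mpg ~ ., data = mtcars)

# Assumption: removing the field mimics a recipe saved by an older version.
rec$ptype <- NULL

# Capture the condition instead of letting it stop the script.
res <- tryCatch(
  recipes_ptype(rec),
  error = function(e) e
)

inherits(res, "error")
conditionMessage(res)
```

In practice, the remedy suggested by the message is to rebuild the recipe with `recipe()` under the newer version.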
