-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ptype information to recipes #1329
Conversation
@juliasilge is this enough for what you need in {vetiver}? |
In the original, seeing from Davis:
And then from Emil:
A bit confused here. From what I can tell in the reprex in #793 (comment), workflows does error in this situation but does so uninformatively. That same uninformative error still occurs with this recipes PR installed—here's the same linked reprex with this PR: library(recipes)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step traning <- tibble(outcome = rnorm(1000), x = rnorm(1000), y = sample(letters, 1000, T))
rec1 <- recipe(outcome ~ ., traning) %>%
step_bin2factor(where(is.numeric), - all_outcomes())
# Original template `traning` has a character `y`
rec2 <- recipe(outcome ~ ., traning) %>%
step_bin2factor(all_numeric_predictors())
# But when we prep we have a numeric `y`
testing <- tibble(outcome = rnorm(1000), x = rnorm(1000), y = rnorm(1000))
prep(rec1, training = testing, fresh = TRUE)
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 2
#>
#> ── Training information
#> Training data contained 1000 data points and no incomplete rows.
#>
#> ── Operations
#> • Dummy variable to factor conversion for: x and y | Trained prep(rec2, training = testing, fresh = TRUE)
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 2
#>
#> ── Training information
#> Training data contained 1000 data points and no incomplete rows.
#>
#> ── Operations
#> • Dummy variable to factor conversion for: x | Trained prep(rec1, training = testing, fresh = FALSE)
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 2
#>
#> ── Training information
#> Training data contained 1000 data points and no incomplete rows.
#>
#> ── Operations
#> • Dummy variable to factor conversion for: x and y | Trained prep(rec2, training = testing, fresh = FALSE)
#>
#> ── Recipe ──────────────────────────────────────────────────────────────────────
#>
#> ── Inputs
#> Number of variables by role
#> outcome: 1
#> predictor: 2
#>
#> ── Training information
#> Training data contained 1000 data points and no incomplete rows.
#>
#> ── Operations
#> • Dummy variable to factor conversion for: x | Trained rec1$var_info$type
#> [[1]]
#> [1] "double" "numeric"
#>
#> [[2]]
#> [1] "string" "unordered" "nominal"
#>
#> [[3]]
#> [1] "double" "numeric" rec2$var_info$type
#> [[1]]
#> [1] "double" "numeric"
#>
#> [[2]]
#> [1] "string" "unordered" "nominal"
#>
#> [[3]]
#> [1] "double" "numeric" library(tidymodels)
wf1 <- workflow(rec1, linear_reg())
wf2 <- workflow(rec2, linear_reg())
fit(wf1, testing)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels fit(wf2, testing)
#> Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]): contrasts can be applied only to factors with 2 or more levels Created on 2024-06-03 with reprex v2.1.0 |
Could you add some context here on how you imagine this helper being integrated into recipes/workflows? |
right now we have in short, this PR doesn't close 793, but adds the tools needed for it to be closed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great and it will help with some S3 generics that use workflows (it's hard to know what the original data was in a workflow).
I wanted to verify that we computed the ptype after the formula was used to subset the columns in the recipe 👍
That is happening, but I see I forgot to add a test to verify. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In vetiver, we currently require users to pass in their "input prototype data" explicitly, as shown in this test. With the change in this PR, we'd be able to get the ptype
out automatically from the recipe itself, which is great!
We would change vetiver_ptype.recipe()
so it reads the value off the recipe instead of looking for it from the user in the dots.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation is super well written.
I think the fact that this PR can be so simple is a testament to how well this package is engineered. Heck yeah!
Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue. |
This looks at how we could add ptype information to the recipe.
I think the best plan of action is to take a ptype slice of the entire dataset passed to
recipe()
and then have helper functions to pull out the needed information.Because the
ptype
information that the user/functions need will depend:prep()
should match the data passed torecipe()
bake()
shouldn't match against outcome rolesupdate_role_requirements()
Revdepcheck
I ran revdepcheck on the change to add a
ptype
field to the recipe object and it came back with no issues