-
Notifications
You must be signed in to change notification settings - Fork 28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for deploying recipes #179
Conversation
test_that("can print recipe", { | ||
expect_snapshot(v) | ||
}) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would normally have a test here like this:
test_that("can predict recipe", {
preds <- predict(v, mtcars)
expect_equal(<<blah blah blah>>)
})
But I don't think that's possible for recipes. The predict
method for a vetiver model does bundle::unbundle()
and then calls predict
on what is inside. I guess we could add a bake
method for a vetiver model if needed? This is separate from the API where we can say exactly what to do at the endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For more clarity, this is also separate from calling predict()
on a remote vetiver endpoint, which would also work. What we don't have a way to do right now is read the recipe back into memory from remote storage (a pin) and then call bake()
on it, without the user manually getting out the recipe object themselves and unbundling it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@isabelizimm do you mind summing up here what the situation is for unsupervised models from scikit-learn as deployed by vetiver? These models typically have a predict
method so this is not a problem in Python, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at just the clustering algorithms from scikit-learn, most of them have a predict method. You can use these in a Pipeline (similar to workflow), same as other models. Vetiver Python doesn't look for supervised/unsupervised models, only if it is coming from scikit-learn, so it will return the outputs of the predict method as expected.
If one of the unsupervised learning models that do NOT have a predict method are used as the last element in a Pipeline, there will be an error along the lines of model has no predict method
.
FWIW: (clustering algorithms with predict: k-means, bisecting k-means, affinity propagation, mean shift, BIRCH, Gaussian mixture. do NOT have predict: spectral clustering, agglomerative clustering, DBSCAN, OPTIC)
step_ns(wt) %>% | ||
prep(retain = FALSE) | ||
|
||
v <- vetiver_model(trained_rec, "car-splines", prototype_data = mtcars[c("disp", "wt")]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that we are requiring the user to pass in some prototype_data
(check out the vetiver_ptype.recipe
method). This is what we have to do for ranger because the info on the training data isn't in there anywhere. If I was understanding Max correctly, this is what he was recommending.
I want to note, though, that the original column names and types are stored in a list, at trained_rec$var_info
. Would there be a way to reconstruct the needed info (i.e. a ptype
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it stands right now, there isn't a foolproof way of going from trained_rec$var_info
to ptype
s, since there is no guarantee that a 1-1 mapping can be found. This is much clearly seen since the type will be listed as other
for any classes we don't currently specify.
I do however wish that this information was in recipes, as it is useful, even if we don't force the input checking. I will note and see if we can add such information in a future version.
Which is another thing. The variable checking in recipes is done on a optional per-step basis, and can at times be quite loose. many steps doesn't care if input is double or integer. step_dummy()
as a gross outlier doesn't do any type checking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good (in so far that I only looked at the recipes side of the PR). I think the main struggle right now is that a recipe object doesn't include a reliable way to generate a ptype
like object.
Co-authored-by: Emil Hvitfeldt <emilhhvitfeldt@gmail.com>
After |
Closes #177
This PR adds support in vetiver for deploying standalone recipes (not as part of workflows).
Created on 2023-02-22 with reprex v2.0.2