Add support for deploying recipes #179

Merged
merged 7 commits into from
Mar 2, 2023

juliasilge
Member

Closes #177

This PR adds support in vetiver for deploying standalone recipes (not as part of workflows).

library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step
library(embed)

split <- seq.int(1, 150, by = 9)
tr <- iris[-split, ]
te <- iris[split, ]

set.seed(11)
supervised <-
    recipe(Species ~ ., data = tr) %>%
    step_center(all_predictors()) %>%
    step_scale(all_predictors()) %>%
    step_umap(all_predictors(), outcome = vars(Species), num_comp = 2) %>%
    prep(training = tr)

library(vetiver)
v <- vetiver_model(supervised, "iris-umap", prototype_data = te[, -5])

library(plumber)
pr() %>%
    vetiver_api(v) ## next pipe to `pr_run()`
#> # Plumber router with 2 endpoints, 4 filters, and 1 sub-router.
#> # Use `pr_run()` on this object to start the API.
#> ├──[queryString]
#> ├──[body]
#> ├──[cookieParser]
#> ├──[sharedSecret]
#> ├──/logo
#> │  │ # Plumber static router serving from directory: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/library/vetiver
#> ├──/ping (GET)
#> └──/predict (POST)

Created on 2023-02-22 with reprex v2.0.2

test_that("can print recipe", {
    expect_snapshot(v)
})

Member Author

I would normally have a test here like this:

test_that("can predict recipe", {
    preds <- predict(v, mtcars)
    expect_equal(<<blah blah blah>>)
})

But I don't think that's possible for recipes. The predict method for a vetiver model does bundle::unbundle() and then calls predict on what is inside. I guess we could add a bake method for a vetiver model if needed? This is separate from the API where we can say exactly what to do at the endpoint.

Member Author

For more clarity, this is also separate from calling predict() on a remote vetiver endpoint, which would also work. What we don't have a way to do right now is read the recipe back into memory from remote storage (a pin) and then call bake() on it, without the user manually getting out the recipe object themselves and unbundling it.
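
A minimal sketch of that manual route, assuming (based on the description above, not on a check of vetiver internals) that the `model` element of a vetiver model holds the bundled recipe. The small recipe and the name `car-normalize` are stand-ins for illustration only.

```r
library(recipes)
library(vetiver)

# A small prepped recipe standing in for the one deployed in this PR
rec <- recipe(mpg ~ disp + wt, data = mtcars) %>%
    step_normalize(all_predictors()) %>%
    prep()

v <- vetiver_model(rec, "car-normalize", prototype_data = mtcars[c("disp", "wt")])

# Assumption: `v$model` is the bundled recipe, so it must be unbundled by hand
unbundled <- bundle::unbundle(v$model)

# With the recipe back in memory, bake() can be called directly
baked <- bake(unbundled, new_data = mtcars[1:3, c("disp", "wt")])
baked
```

A `bake()` method for vetiver models, if added later, would presumably wrap these two steps.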

Member Author

@isabelizimm do you mind summing up here what the situation is for unsupervised models from scikit-learn as deployed by vetiver? These models typically have a predict method so this is not a problem in Python, right?

Looking at just the clustering algorithms from scikit-learn, most of them have a predict method. You can use these in a Pipeline (similar to a workflow), the same as other models. Vetiver for Python doesn't check whether a model is supervised or unsupervised, only whether it comes from scikit-learn, so it returns the output of the predict method as expected.

If one of the unsupervised learning models that does NOT have a predict method is used as the last element in a Pipeline, there will be an error along the lines of "model has no predict method".

FWIW, clustering algorithms with predict: k-means, bisecting k-means, affinity propagation, mean shift, BIRCH, Gaussian mixture. Those that do NOT have predict: spectral clustering, agglomerative clustering, DBSCAN, OPTICS.

step_ns(wt) %>%
prep(retain = FALSE)

v <- vetiver_model(trained_rec, "car-splines", prototype_data = mtcars[c("disp", "wt")])
Member Author

Notice that we are requiring the user to pass in some prototype_data (check out the vetiver_ptype.recipe method). This is what we have to do for ranger because the info on the training data isn't in there anywhere. If I was understanding Max correctly, this is what he was recommending.

I want to note, though, that the original column names and types are stored in a list, at trained_rec$var_info. Would there be a way to reconstruct the needed info (i.e. a ptype)?
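
As a rough illustration of what a prototype carries, a zero-row slice of the data keeps the column names and types while dropping the values. This is a base R sketch, not vetiver's implementation; the only assumption is that a ptype behaves like a zero-row data frame.

```r
# Zero-row slice: column names and types survive, values do not
ptype <- mtcars[0, c("disp", "wt")]

nrow(ptype)
names(ptype)
vapply(ptype, class, character(1))
```

This is the information a user supplies through prototype_data, since nothing equivalent can be read reliably from the trained recipe.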

Collaborator

@EmilHvitfeldt EmilHvitfeldt Feb 23, 2023

As it stands right now, there isn't a foolproof way of going from trained_rec$var_info to ptypes, since there is no guarantee that a 1-1 mapping can be found. This is most clearly seen in the fact that the type is listed as other for any classes we don't currently specify.

I do, however, wish that this information was in recipes, as it is useful even if we don't force the input checking. I will make a note and see if we can add such information in a future version.

That brings up another point. The variable checking in recipes is done on an optional, per-step basis, and can at times be quite loose. Many steps don't care whether the input is double or integer. step_dummy(), as a gross outlier, doesn't do any type checking.
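
The lack of a 1-1 mapping can be sketched with a toy labeler. The function `coarse_type()` below is hypothetical and is not how recipes assigns types; it only shows that once several classes collapse into one label, the original class cannot be recovered from the label alone.

```r
# Hypothetical coarse labeler, NOT the recipes implementation:
# it collapses column classes into a few labels, with "other" as the fallback
coarse_type <- function(x) {
    if (is.numeric(x)) {
        "numeric"
    } else if (is.factor(x) || is.character(x)) {
        "nominal"
    } else if (inherits(x, "Date")) {
        "date"
    } else {
        "other"
    }
}

cols <- list(
    int = 1L,
    dbl = 1.5,
    fct = factor("a"),
    chr = "a",
    cpl = 1i,
    lst = list(1)
)

# Integer and double share a label, as do complex and list columns
labels <- vapply(cols, coarse_type, character(1))
labels
```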

Collaborator

@EmilHvitfeldt EmilHvitfeldt left a comment

Overall looks good (insofar as I only looked at the recipes side of the PR). I think the main struggle right now is that a recipe object doesn't include a reliable way to generate a ptype-like object.

Co-authored-by: Emil Hvitfeldt <emilhhvitfeldt@gmail.com>
@juliasilge
Member Author

After bake() is added to generics, we can come back and add in some methods.

@juliasilge juliasilge merged commit 0df86e1 into main Mar 2, 2023
@juliasilge juliasilge deleted the add-recipes branch March 2, 2023 20:40
Successfully merging this pull request may close these issues.

Using Vetiver for UMAPs
3 participants