Support for luz #187

Merged · 38 commits · Apr 14, 2023
Conversation

@dfalbel (Member) commented Mar 10, 2023

Hi 👋!

This is a first pass at supporting luz in vetiver.

There are a few things where I'd like to ask about the best way to proceed:

  1. Calling predict on the result of vetiver_model doesn't yield the same structure as calling predict on an endpoint containing a luz model, which can be confusing. I wonder if we should enforce consistency somehow; in this case, I think it would be nice if the same validations that happen in handler_predict.luz_module_fitted also happened for predict.vetiver_model.

  2. Outputs of luz models can be arrays with arbitrary dimensions, while vetiver enforces a data frame output. To handle this, we return a data frame with an array column, which helps preserve the output dimensions. However, the JSON serializer and de-serializer break the array column:

x <- tibble::tibble(.pred = array(1, dim = c(5, 2,2,2)))
str(x)
#> tibble [5 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ .pred: num [1:5, 1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1 1 1 ...

y <- jsonlite::fromJSON(jsonlite::toJSON(x))
str(y)
#> 'data.frame':    5 obs. of  1 variable:
#>  $ .pred:List of 5
#>   ..$ : int [1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1
#>   ..$ : int [1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1
#>   ..$ : int [1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1
#>   ..$ : int [1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1
#>   ..$ : int [1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1

I wonder if there's a way to safely override the de-serializer for those models so the original structure is preserved.
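
As a rough sketch of what preserving the structure could look like, the array column can be rebuilt from the list of slices that fromJSON() produces above, assuming the slice dimensions are known up front (illustration only, not a proposed implementation):

# Sketch: recombine the five 2x2x2 slices into one array, dims 2 x 2 x 2 x 5
slices <- simplify2array(y$.pred)
# move the observation index back to the first dimension: 5 x 2 x 2 x 2
restored <- aperm(slices, c(4, 1, 2, 3))
str(restored)
#>  int [1:5, 1:2, 1:2, 1:2] 1 1 1 1 1 1 1 1 1 1 ...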

@juliasilge (Member):

Thank you so much for this contribution @dfalbel!

  1. The design here for what goes into a handler_predict method is about common failure modes for deployed models, for example, new data arriving with problems or in a different format. The use case for predict is intended to be much narrower: basically calling predict on the contained model object, as a user would expect to be able to do.
  2. We have been thinking about non-rectangular data for a while (see "Consider options for more flexible ptype specification" #55), but until we get some of that worked out, we are letting folks choose between rectangular data with strict, robust checking and turning the checking off (see the Details section of the docs), as sketched below. We'll want to take the same approach for luz as for keras in this respect.
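
A sketch of turning the checking off when creating the vetiver model (argument names may differ slightly across vetiver versions; older releases used save_ptype):

# Sketch: skip the input data prototype for non-rectangular inputs
v <- vetiver::vetiver_model(
    model = fitted_luz_model,    # hypothetical fitted luz module
    model_name = "cars-luz",
    save_prototype = FALSE       # turn off strict input checking
)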

@juliasilge (Member) left a review:

Thanks again for contributing this! Let me know if you have any questions on this feedback, and I would very much welcome your input and ideas on #55.

Review thread on R/luz.R (outdated):
#' @rdname vetiver_create_meta
#' @export
vetiver_create_meta.luz_module_fitted <- function(model, metadata) {
pkgs <- c("luz", model$model$required_pkgs)
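(The hunk is truncated by the review view. Presumably the method finishes by recording pkgs in the metadata, something along the lines of vetiver_meta(metadata, required_pkgs = pkgs), following the pattern of vetiver's other vetiver_create_meta methods; the verbatim remainder isn't shown here.)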
@juliasilge (Member):

Can you tell me more about how this works in luz? I have trained some of the example models, and they do require me to have, for example, torch and/or torchvision loaded, but those packages are not stored in this slot. Instead, I see:

model$model$required_pkgs
#> NULL

@dfalbel (Member, Author):

In luz we don't try to be smart about capturing the packages a model uses, but users can optionally set this field in the nn_module so it's available as metadata. E.g., one can do:

module <- torch::nn_module(
    initialize = function(in_features, out_features) {
        self$linear <- torch::nn_linear(in_features, out_features)
    },
    forward = function(x) {
        self$linear(x)
    },
    required_pkgs = c("torch", "torchvision")
)

We could try to traverse the forward expression and find function calls that come from other packages, but I feel that could still have many edge cases and would be error-prone.

@juliasilge (Member):

Would they always need torch? Should we include that there? I think this is about what needs to be installed and attached for predictions to work. Getting the right packages installed for the deployment is a big part of what vetiver aims to do.

@dfalbel (Member, Author):

torch doesn't necessarily need to be attached, but it definitely needs to be installed; it should already be, since it's a hard dependency of luz. In the example above, torch wouldn't need to be attached for predictions to work.

Review thread on R/luz.R:
#' @rdname handler_startup
#' @export
handler_predict.luz_module_fitted <- function(vetiver_model, ...) {
force(vetiver_model)
@juliasilge (Member):

Can you tell me a little more about this?

@dfalbel (Member, Author):

Since we return the closure directly, without evaluating vetiver_model, it will not be in the function environment until the first call of the closure; at that point vetiver_model could potentially have been garbage collected already. This might not actually be an issue for vetiver, but it feels like good practice to force() before returning the closure.

I'm trying to avoid something like this:

f <- function(a, force) {
    if (force) force(a)
    function() {
        a + 1
    }
}

b <- 1
fun_f <- f(a = b, force = TRUE)
fun_nf <- f(a = b, force = FALSE)
rm(b);gc()
#>           used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells  674744 36.1    1413787 75.6         NA   710975 38.0
#> Vcells 1201226  9.2    8388608 64.0      32768  1888973 14.5

fun_f()
#> [1] 2
fun_nf()
#> Error in fun_nf(): object 'b' not found

@juliasilge (Member):

Thanks for the details!

Review thread on tests/testthat/test-luz.R (outdated):
expect_error(predict(v, as.array(torch::torch_randn(10, 2))), regex = "dim error")
})

test_that("can call endpoints", {
@juliasilge (Member):

You can check out the approach for testing against a local plumber session that I have already set up in the package, and use that instead of this. Unfortunately it's not practical to set up APIs for all the model types in CI (it just takes too long for the API to come up on some architectures), so a test like this will need to skip on CI (as well, of course, as on CRAN).

@dfalbel (Member, Author):

I removed that test for now. I wasn't able to use local_plumber_session because an unbundled model is passed to the callr session, and that breaks the luz model. We could perhaps 'unbundle' on the first call to the API instead? But that should probably be part of another PR.

@dfalbel (Member, Author) commented Mar 16, 2023

@juliasilge Thank you very much for the review and suggestions. I simplified the PR to make the luz support very similar to the keras support. I'll think about multi-output support and post on #55.

@juliasilge (Member):

I made a little example for returning higher dimensional tensors and put it in inst/mtcars_luz.R. The output now looks like this, which I think is a pretty nice option:

library(vetiver)
endpoint <- vetiver_endpoint("http://127.0.0.1:8080/predict")
scaled_cars <- scale(as.matrix(mtcars))
x_test  <- scaled_cars[26:32, 2:ncol(scaled_cars)]
predict(endpoint, data.frame(x_test[1:2,]))
#> # A tibble: 2 × 1
#>   preds              
#>   <list>             
#> 1 <dbl [3 × 64 × 64]>
#> 2 <dbl [3 × 64 × 64]>

Created on 2023-04-03 with reprex v2.0.2

@dfalbel would you mind taking a look at this again and seeing if you have any feedback (other than, of course, how to extend the prototype checking to non-rectangular data, which we can handle separately)?

@juliasilge (Member):

Ah, I went to deploy one of these models on Connect and realized that we haven't set up the torch installation for the API. 🙈

What do you think is the best way to go about this, @dfalbel? The way we handle installing keras is via a requirements.txt file that gets bundled along to Connect (see how this is handled for keras elsewhere in the package).

What would be a good way to handle this for torch? What do you all do for installing torch on Connect typically?

@dfalbel (Member, Author) commented Apr 3, 2023

In theory, just setting the env var TORCH_INSTALL=1 is enough for the installation to succeed; it allows downloading the external files without a prompt. There are no Python dependencies or anything else involved. Is it possible to set an env var? Or, e.g., send a .Renviron file that would set it?
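
A sketch of how this could look in the API's startup code, using torch's exported helpers (this is not something the PR adds, just an illustration):

# Sketch: only download the libtorch/lantern binaries when they are missing,
# so a deployed API doesn't re-download them on every start
if (!torch::torch_is_installed()) {
    torch::install_torch()
}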

@juliasilge (Member):

Does that mean it will install torch every time the content starts, i.e. the API starts up? That's not ideal.

How do you all typically install torch into content when you are deploying on Connect? Do you have an example I can look at? We would want the install to happen when the content deploys, not each time it starts up.

@juliasilge juliasilge merged commit bc207a6 into rstudio:main Apr 14, 2023
juliasilge added a commit that referenced this pull request Apr 14, 2023