add sparse argument to step_dummy() #1392

Merged
merged 4 commits into from
Nov 14, 2024

Conversation

EmilHvitfeldt
Member

This PR lets step_dummy() create sparse dummy variables through a new sparse argument.

@EmilHvitfeldt
Member Author

EmilHvitfeldt commented Nov 13, 2024

The argument defaults to FALSE because it should only be used when you need it. In workflows, however, we plan to detect whether the model accepts sparse data and then turn sparsity on in the steps that support it.

Example helper function below. We don't yet have the infrastructure to determine whether the sparse columns produced by step_dummy() will make it through the remaining steps unharmed, though we might in the future. If step_normalize() were placed after step_dummy() in this example, the sparse dummies would be ruined. For that reason I think the helper should live in {recipes} instead of {workflows}. Any objections @simonpcouch?

```r
library(recipes)
library(modeldata)

# Turn on sparse output for every step that exposes a `sparse` argument.
recipes_all_sparse <- function(x) {
  for (i in seq_along(x$steps)) {
    if (!is.null(x$steps[[i]]$sparse)) {
      x$steps[[i]]$sparse <- TRUE
    }
  }
  x
}

# step_normalize() comes before step_dummy(), so the dummies are left untouched.
rec_spec <- recipe(Sale_Price ~ ., data = ames) |>
  step_unknown(all_nominal_predictors()) |>
  step_impute_mean(all_numeric_predictors()) |>
  step_normalize(all_numeric_predictors()) |>
  step_dummy(all_nominal_predictors())

# Dense dummies (default, sparse = FALSE)
rec_spec |>
  prep() |>
  bake(NULL) |>
  lobstr::obj_size()
#> 7.46 MB

# Sparse dummies
rec_spec |>
  recipes_all_sparse() |>
  prep() |>
  bake(NULL) |>
  lobstr::obj_size()
#> 1.71 MB
```

Created on 2024-11-12 with reprex v2.1.0
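
To illustrate the point about step_normalize() ruining sparse dummies, here is a minimal base R sketch. The toy vector and variable names are hypothetical and not part of this PR, and the centering and scaling is done by hand rather than through {recipes}. It only shows why a mostly-zero column stops being mostly zero once it is normalized, which is what removes the benefit of sparse storage.

```r
# Hypothetical toy dummy column (not taken from the ames data).
dummy <- c(0, 0, 1, 0, 0, 0, 1, 0, 0, 0)

# Share of exact zeros before normalization.
zero_share_before <- mean(dummy == 0)

# Center and scale by hand, the kind of transformation step_normalize() applies.
normalized <- (dummy - mean(dummy)) / sd(dummy)

# Share of exact zeros after normalization: every former zero has been shifted.
zero_share_after <- mean(normalized == 0)

zero_share_before
#> [1] 0.8
zero_share_after
#> [1] 0
```

Because no zeros survive the centering, a sparse representation of the normalized column would have to store every element, so the ordering of steps matters when sparsity is toggled on.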

Contributor

@simonpcouch left a comment

Super clean!

No objections for that helper living in recipes rather than workflows.

Good move on scoping this PR to just implementing step_dummy(sparse) and holding off on the "smart" toggling of the argument in the backend. Your thoughts about what that interface might look like sound reasonable!

Resolved review threads: R/dummy.R (outdated), tests/testthat/test-dummy.R, NEWS.md (outdated)
@EmilHvitfeldt merged commit 59345e1 into main on Nov 14, 2024
13 checks passed
@EmilHvitfeldt deleted the sparse-dummy branch on November 14, 2024 18:13