Dummy encode character and factor columns #207

Closed
Tracked by #205
spsanderson opened this issue Mar 14, 2022 · 1 comment
Labels
enhancement New feature or request function A new function

Comments

@spsanderson
Owner

No description provided.

@spsanderson
Owner Author

spsanderson commented Apr 27, 2022

Function:

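hai_knn_data_prepper <- function(.data, .recipe_formula){
  
  # Recipe ----
  # Build the recipe from the .data argument rather than a global object
  rec_obj <- recipes::recipe(.recipe_formula, data = .data) %>%
    # Reserve a level for categories not seen during training
    recipes::step_novel(recipes::all_nominal_predictors()) %>%
    # One-hot encode character and factor predictors (one column per level)
    recipes::step_dummy(recipes::all_nominal_predictors(), one_hot = TRUE) %>%
    # Drop predictors with zero variance
    recipes::step_zv(recipes::all_predictors()) %>%
    # Center and scale every numeric column, including the new dummy columns
    recipes::step_normalize(recipes::all_numeric())
  
  # Return ----
  return(rec_obj)
  
}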
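
The issue title mentions character columns, while the examples in this comment only use factor columns. The following is a minimal sketch, not part of the original comment, that runs the same recipe steps on a small made-up data frame containing one character column. The data frame `toy_tbl` and its column names are hypothetical, and the sketch assumes the `recipes` package is installed and that `prep()` converts strings to factors by default.

```r
library(recipes)  # provides recipe(), the step_*() functions, prep(), bake() and %>%

# Hypothetical toy data: a character column, a factor column and a numeric column
toy_tbl <- data.frame(
  outcome = factor(c("a", "b", "a", "b", "a", "b")),
  colour  = c("red", "blue", "green", "red", "blue", "green"),
  size    = factor(c("s", "m", "l", "s", "m", "l")),
  weight  = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  stringsAsFactors = FALSE
)

# Same sequence of steps as in the function above
rec_obj <- recipe(outcome ~ ., data = toy_tbl) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric())

# Estimate the steps on the training data and return the processed training set
baked_tbl <- rec_obj %>% prep() %>% bake(new_data = NULL)

# The original colour and size columns are replaced by numeric dummy columns
names(baked_tbl)
```

Under these assumptions, both `colour` (character) and `size` (factor) should be replaced by numeric indicator columns, while `outcome` is left as a factor.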

Examples:

> hai_knn_data_prepper(iris, Species ~ .)
Recipe

Inputs:

      role #variables
   outcome          1
 predictor          4

Operations:

Dummy variables from recipes::all_nominal_predictors()
Centering and scaling for recipes::all_numeric()

> hai_knn_data_prepper(iris, Species ~ .) %>% prep() %>% bake(iris)
# A tibble: 150 x 5
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
 1       -0.898      1.02          -1.34       -1.31 setosa 
 2       -1.14      -0.132         -1.34       -1.31 setosa 
 3       -1.38       0.327         -1.39       -1.31 setosa 
 4       -1.50       0.0979        -1.28       -1.31 setosa 
 5       -1.02       1.25          -1.34       -1.31 setosa 
 6       -0.535      1.93          -1.17       -1.05 setosa 
 7       -1.50       0.786         -1.34       -1.18 setosa 
 8       -1.02       0.786         -1.28       -1.31 setosa 
 9       -1.74      -0.361         -1.34       -1.31 setosa 
10       -1.14       0.0979        -1.28       -1.44 setosa 
# ... with 140 more rows

> hai_knn_data_prepper(Titanic, Survived ~ .) %>% prep() %>% bake(Titanic)
# A tibble: 32 x 10
        n Survived Class_X1st Class_X2nd Class_X3rd Class_Crew Sex_Female Sex_Male
    <dbl> <fct>         <dbl>      <dbl>      <dbl>      <dbl>      <dbl>    <dbl>
 1 -0.506 No            1.70      -0.568     -0.568     -0.568     -0.984    0.984
 2 -0.506 No           -0.568      1.70      -0.568     -0.568     -0.984    0.984
 3 -0.248 No           -0.568     -0.568      1.70      -0.568     -0.984    0.984
 4 -0.506 No           -0.568     -0.568     -0.568      1.70      -0.984    0.984
 5 -0.506 No            1.70      -0.568     -0.568     -0.568      0.984   -0.984
 6 -0.506 No           -0.568      1.70      -0.568     -0.568      0.984   -0.984
 7 -0.381 No           -0.568     -0.568      1.70      -0.568      0.984   -0.984
 8 -0.506 No           -0.568     -0.568     -0.568      1.70       0.984   -0.984
 9  0.362 No            1.70      -0.568     -0.568     -0.568     -0.984    0.984
10  0.627 No           -0.568      1.70      -0.568     -0.568     -0.984    0.984
# ... with 22 more rows, and 2 more variables: Age_Adult <dbl>, Age_Child <dbl>
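
The dummy columns in the Titanic output hold values such as 1.70 and -0.568 rather than 0 and 1. This appears to follow from `step_normalize(all_numeric())` running after `step_dummy()`, so the 0/1 indicators are centered and scaled as well. The base R sketch below, which is an illustration and not part of the original comment, reproduces those two values by hand for the `Class` level `1st`. The variable names are hypothetical, and it assumes normalization uses the sample standard deviation from `sd()`.

```r
# Base R only; Titanic ships with the datasets package
titanic_df <- as.data.frame(Titanic)  # 32 rows, one per Class/Sex/Age/Survived combination

# Hand-built 0/1 indicator for one level, as one-hot encoding would produce
class_1st <- as.numeric(titanic_df$Class == "1st")

# Center and scale the indicator: (x - mean) / sd
class_1st_z <- (class_1st - mean(class_1st)) / sd(class_1st)

# Two distinct values remain: one for "not 1st", one for "1st"
round(sort(unique(class_1st_z)), 3)
# [1] -0.568  1.705
```

These match the `Class_X1st` column above up to display rounding. If plain 0/1 indicators are wanted, the normalization step would need to exclude the dummy columns.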

Repository owner moved this from Todo to Done in @spsanderson's Repository Issue Overview Apr 28, 2022