sternclean seeks to simplify cleaning data frames.
It performs multiple cleaning steps in a single function call.
For example, in a few lines of clean code you can change column types, impute NAs in one set of columns with a fixed value, impute NAs in another set with a group mean, and replace infinite values in a third set with another fixed value.
Here is the order of operations under the hood:
- Change the types
- Remove columns
- Impute NAs
- Impute infinites
Because the steps run in this fixed order, several cleaning operations can be combined in a single call.
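The order above can be sketched in base R. This is only an illustration of the sequence, not sternclean's actual implementation; the data frame `df`, its columns, and the fill values 0 and 100 are hypothetical.

```r
# Hypothetical stand-in data, not the package's example object.
df <- data.frame(
  id    = c("a", "b", "c"),
  score = c("1", "Inf", NA),   # stored as character on purpose
  junk  = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# 1. Change the types
df$score <- as.numeric(df$score)

# 2. Remove columns
df$junk <- NULL

# 3. Impute NAs (fill value 0 is an arbitrary choice)
df$score[is.na(df$score)] <- 0

# 4. Impute infinites (fill value 100 is an arbitrary choice)
df$score[is.infinite(df$score)] <- 100

print(df)
```

Converting types first matters in this sketch: `is.infinite()` would not flag the string "Inf" while `score` is still a character column.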
We will start with simple one-step cleaning examples. Later we will take on more complex situations.
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | Inf | NA |
class(rickle_and_mortan$people)
#> [1] "factor"
sternclean("rickle_and_mortan",
           class_to_strng = "people")
class(rickle_and_mortan$people)
#> [1] "character"
class(rickle_and_mortan$intelligence)
#> [1] "character"
sternclean("rickle_and_mortan",
           class_to_numer = "intelligence")
class(rickle_and_mortan$intelligence)
#> [1] "numeric"
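For comparison, the two class changes above correspond roughly to the following base R conversions. This is a sketch on standalone vectors whose values are copied from the example table, and it makes no claim about how sternclean performs the conversion internally.

```r
# Standalone vectors mirroring the example columns (an assumption about
# how the data is stored: a factor and a character vector).
people       <- factor(c("Rickle", "Mortan", "Jerry", "Pickle Rickle"))
intelligence <- c("Inf", "9", "0.1", "Inf")

# Factor to character
people <- as.character(people)

# Character to numeric; the string "Inf" parses to the numeric value Inf
intelligence <- as.numeric(intelligence)

print(class(people))
print(class(intelligence))
```

One base R pitfall to keep in mind: calling `as.numeric()` directly on a factor returns its integer level codes, so a factor of numbers should go through `as.character()` first.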
sternclean("rickle_and_mortan",
           remove_columns = "intelligence")
people | original_person | evil_rank |
---|---|---|
Rickle | Rickle | 5 |
Mortan | Mortan | 2.75 |
Jerry | Jerry | 2 |
Pickle Rickle | Rickle | NA |
sternclean("rickle_and_mortan",
           remove_na_rows = "evil_rank")
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
sternclean("rickle_and_mortan",
           removeby_regex = "pe")
intelligence | evil_rank |
---|---|
Inf | 5 |
9 | 2.75 |
0.1 | 2 |
Inf | NA |
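A base R sketch of the same idea, dropping every column whose name matches a regular expression. The data frame is rebuilt here from the table values (how the real object is constructed is an assumption), and `matches` and `trimmed` are illustrative names.

```r
# Example data copied from the table; construction is an assumption.
rickle_and_mortan <- data.frame(
  people          = c("Rickle", "Mortan", "Jerry", "Pickle Rickle"),
  original_person = c("Rickle", "Mortan", "Jerry", "Rickle"),
  intelligence    = c(Inf, 9, 0.1, Inf),
  evil_rank       = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# TRUE for column names containing "pe": people and original_person
matches <- grepl("pe", names(rickle_and_mortan))

# Keep only the columns that do not match
trimmed <- rickle_and_mortan[, !matches, drop = FALSE]
print(names(trimmed))
```

`drop = FALSE` keeps the result a data frame even if only one column survives.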
sternclean("rickle_and_mortan",
           remove_all_nas = TRUE)
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
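In base R, dropping every row that contains an NA is commonly done with `complete.cases()`. The sketch below rebuilds the example data from the table values (an assumption) and is not taken from sternclean's source.

```r
# Example data copied from the table; construction is an assumption.
rickle_and_mortan <- data.frame(
  people          = c("Rickle", "Mortan", "Jerry", "Pickle Rickle"),
  original_person = c("Rickle", "Mortan", "Jerry", "Rickle"),
  intelligence    = c(Inf, 9, 0.1, Inf),
  evil_rank       = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# complete.cases() is TRUE for rows with no NA in any column
no_na <- rickle_and_mortan[complete.cases(rickle_and_mortan), ]
print(no_na)
```

Note that `Inf` is not an NA, so the first row is kept even though its intelligence is infinite.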
sternclean("rickle_and_mortan",
           remove_non_num = TRUE)
intelligence | evil_rank |
---|---|
Inf | 5 |
9 | 2.75 |
0.1 | 2 |
Inf | NA |
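Keeping only the numeric columns can be sketched in base R with `vapply()` and `is.numeric()`. As before, the data frame is rebuilt from the table values as an assumption, and `is_num` and `numeric_only` are illustrative names.

```r
# Example data copied from the table; construction is an assumption.
rickle_and_mortan <- data.frame(
  people          = c("Rickle", "Mortan", "Jerry", "Pickle Rickle"),
  original_person = c("Rickle", "Mortan", "Jerry", "Rickle"),
  intelligence    = c(Inf, 9, 0.1, Inf),
  evil_rank       = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# One logical per column: is the column numeric?
is_num <- vapply(rickle_and_mortan, is.numeric, logical(1))

numeric_only <- rickle_and_mortan[, is_num, drop = FALSE]
print(names(numeric_only))
```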
sternclean("rickle_and_mortan",
           remove_all_exc = c("people", "evil_rank"))
people | evil_rank |
---|---|
Rickle | 5 |
Mortan | 2.75 |
Jerry | 2 |
Pickle Rickle | NA |
sternclean("rickle_and_mortan",
           impute_na2mean = "evil_rank")
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | Inf | 3.25 |
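The imputed value 3.25 is the mean of the three observed ranks, (5 + 2.75 + 2) / 3. A minimal base R sketch of the same calculation on a standalone vector, not the package's internal code:

```r
evil_rank <- c(5, 2.75, 2, NA)

# Mean of the observed values only
col_mean <- mean(evil_rank, na.rm = TRUE)

# Replace each NA with that mean
evil_rank[is.na(evil_rank)] <- col_mean
print(evil_rank)
```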
sternclean("rickle_and_mortan",
           impute_na_cols = "evil_rank",
           impute_na_with = 1738)
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | Inf | 1738 |
sternclean("rickle_and_mortan",
           impute_grpmean = "evil_rank",
           impute_grpwith = "original_person")
original_person | people | intelligence | evil_rank |
---|---|---|---|
Jerry | Jerry | 0.1 | 2 |
Mortan | Mortan | 9 | 2.75 |
Rickle | Rickle | Inf | 5 |
Rickle | Pickle Rickle | Inf | 5 |
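Here Pickle Rickle receives 5 because the only other row in the Rickle group has an evil_rank of 5. A base R sketch of group-mean imputation using `ave()` follows; it is an illustration under the assumption that the data looks like the table, and unlike the output above it leaves the rows in their original order.

```r
# Example data copied from the table; construction is an assumption.
df <- data.frame(
  people          = c("Rickle", "Mortan", "Jerry", "Pickle Rickle"),
  original_person = c("Rickle", "Mortan", "Jerry", "Rickle"),
  evil_rank       = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)

# Per-row group mean of evil_rank within original_person, ignoring NAs
group_mean <- ave(df$evil_rank, df$original_person,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the missing entries with their group's mean
missing <- is.na(df$evil_rank)
df$evil_rank[missing] <- group_mean[missing]
print(df)
```

If a group has no observed values at all, `mean(x, na.rm = TRUE)` returns `NaN`, so such rows would still need separate handling.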
sternclean("rickle_and_mortan",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738)
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | 1738 | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | 1738 | NA |
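The equivalent replacement in base R relies on `is.infinite()`. This sketch works on a standalone vector and does not reflect how sternclean is implemented.

```r
intelligence <- c(Inf, 9, 0.1, Inf)

# is.infinite() is TRUE for both Inf and -Inf, and FALSE for NA
intelligence[is.infinite(intelligence)] <- 1738
print(intelligence)
```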
sternclean("rickle_and_mortan",
           impute_cust_cl = "evil_rank",
           impute_cust_fn = quantile,
           probs = 0.25,
           na.rm = TRUE
)
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | Inf | 2.375 |
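The value 2.375 is the first quartile of the observed ranks 2, 2.75, and 5 under R's default quantile method (type 7), which interpolates halfway between 2 and 2.75. A base R sketch of that calculation on a standalone vector, offered as an illustration only:

```r
evil_rank <- c(5, 2.75, 2, NA)

# 25th percentile of the observed values; quantile() returns a named vector
fill <- quantile(evil_rank, probs = 0.25, na.rm = TRUE)

# Drop the "25%" name before filling the NA
evil_rank[is.na(evil_rank)] <- unname(fill)
print(evil_rank)
```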
Here we:
- change the people column's class to string
- change the intelligence column's class to numeric
- remove the original_person column
- impute the NAs in the evil_rank column with the column's mean
- impute the infinite values in the intelligence column with 1738
sternclean("rickle_and_mortan",
           class_to_strng = "people",
           class_to_numer = "intelligence",
           remove_columns = "original_person",
           impute_na2mean = "evil_rank",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738
)
people | intelligence | evil_rank |
---|---|---|
Rickle | 1738 | 5 |
Mortan | 9 | 2.75 |
Jerry | 0.1 | 2 |
Pickle Rickle | 1738 | 3.25 |
people | original_person | intelligence | evil_rank |
---|---|---|---|
Rickle | Rickle | Inf | 5 |
Mortan | Mortan | 9 | 2.75 |
Jerry | Jerry | 0.1 | 2 |
Pickle Rickle | Rickle | Inf | NA |