
elliotastern/sternclean


sternclean seeks to simplify cleaning dataframes.

Multiple cleaning steps are accomplished in just one function.

For example, in a few lines of code you can change column types, impute NAs in one set of columns with a fixed value, impute NAs in another set with a group mean, and replace infinite values in a third set with another fixed value.

Here is the order of operations under the hood:

  • Change the types
  • Remove columns
  • Impute NAs
  • Impute infinite values

Because the steps always run in this order, a single call can combine several of them.
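The four stages above can be sketched in base R. This is only an illustration of the ordering, not sternclean's source code; the data frame `df` and the fill values 0 and 100 are made up for the example.

```r
# Base R sketch of the four stages, in order (hypothetical data, not sternclean internals).
df <- data.frame(
  name  = factor(c("a", "b", "c")),
  extra = c("x", "y", "z"),
  score = c("Inf", "2", NA),
  stringsAsFactors = FALSE
)

# 1. Change the types
df$name  <- as.character(df$name)
df$score <- as.numeric(df$score)

# 2. Remove columns
df$extra <- NULL

# 3. Impute NAs with a set value
df$score[is.na(df$score)] <- 0

# 4. Impute infinite values with a set value
df$score[is.infinite(df$score)] <- 100
```

Converting types first matters here: `is.infinite()` only detects `Inf` once the column is numeric.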

Simple Examples

We will start with simple one-step cleaning examples. Later we will take on more complex situations.

Rickle and Mortan Dataset

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | NA |
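To follow along, the dataset can be rebuilt roughly as below. The column classes are assumptions inferred from the `class()` output in the examples that follow (`people` as a factor, `intelligence` as character); the README does not show how the original data frame was created.

```r
# Hypothetical reconstruction of the example data.
# Column classes are assumptions based on the class() output shown in this README.
rickle_and_mortan <- data.frame(
  people          = factor(c("Rickle", "Mortan", "Jerry", "Pickle Rickle")),
  original_person = c("Rickle", "Mortan", "Jerry", "Rickle"),
  intelligence    = c("Inf", "9", "0.1", "Inf"),
  evil_rank       = c(5, 2.75, 2, NA),
  stringsAsFactors = FALSE
)
```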

Class Change Parameters

class(rickle_and_mortan$people)
#> [1] "factor"

sternclean("rickle_and_mortan",
           class_to_strng = "people")

class(rickle_and_mortan$people)
#> [1] "character"

class(rickle_and_mortan$intelligence)
#> [1] "character"

sternclean("rickle_and_mortan",
           class_to_numer = "intelligence")

class(rickle_and_mortan$intelligence)
#> [1] "numeric"

Column/Row Removal Parameters

sternclean("rickle_and_mortan",
           remove_columns = "intelligence")

| people | original_person | evil_rank |
|---|---|---|
| Rickle | Rickle | 5 |
| Mortan | Mortan | 2.75 |
| Jerry | Jerry | 2 |
| Pickle Rickle | Rickle | NA |

sternclean("rickle_and_mortan",
           remove_na_rows =  "evil_rank")

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |

sternclean("rickle_and_mortan",
           removeby_regex = "pe")

| intelligence | evil_rank |
|---|---|
| Inf | 5 |
| 9 | 2.75 |
| 0.1 | 2 |
| Inf | NA |

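The result above drops both `people` and `original_person`, which is what an unanchored pattern match on column names would produce, since "pe" also occurs inside "person". A base R sketch of that matching, assuming (not confirmed by the source) that `removeby_regex` behaves like `grepl()` on the column names:

```r
# Which column names survive an unanchored match on "pe"?
cols <- c("people", "original_person", "intelligence", "evil_rank")
kept <- cols[!grepl("pe", cols)]
kept
#> [1] "intelligence" "evil_rank"
```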
sternclean("rickle_and_mortan",
           remove_all_nas = TRUE)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |

sternclean("rickle_and_mortan",
           remove_non_num = TRUE)

| intelligence | evil_rank |
|---|---|
| Inf | 5 |
| 9 | 2.75 |
| 0.1 | 2 |
| Inf | NA |

sternclean("rickle_and_mortan",
           remove_all_exc = c("people", "evil_rank"))

| people | evil_rank |
|---|---|
| Rickle | 5 |
| Mortan | 2.75 |
| Jerry | 2 |
| Pickle Rickle | NA |

Impute Parameters

sternclean("rickle_and_mortan",
           impute_na2mean = "evil_rank")

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 3.25 |

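The imputed 3.25 is the mean of the three observed values. As a check, here is the same arithmetic in base R (a sketch of the calculation, not of how sternclean implements it):

```r
# Mean imputation by hand: (5 + 2.75 + 2) / 3 = 3.25
evil_rank <- c(5, 2.75, 2, NA)
col_mean  <- mean(evil_rank, na.rm = TRUE)
evil_rank[is.na(evil_rank)] <- col_mean
evil_rank
#> [1] 5.00 2.75 2.00 3.25
```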
sternclean("rickle_and_mortan",
           impute_na_cols = "evil_rank",
           impute_na_with = 1738)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 1738 |

sternclean("rickle_and_mortan",
           impute_grpmean = "evil_rank",
           impute_grpwith = "original_person")

| original_person | people | intelligence | evil_rank |
|---|---|---|---|
| Jerry | Jerry | 0.1 | 2 |
| Mortan | Mortan | 9 | 2.75 |
| Rickle | Rickle | Inf | 5 |
| Rickle | Pickle Rickle | Inf | 5 |

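Pickle Rickle receives 5 because the only other row in the "Rickle" group has an evil_rank of 5. A base R sketch of group-mean imputation using `ave()` follows; this is one possible way to get the same numbers and is not taken from sternclean's source.

```r
# Group-mean imputation with base R's ave(); NAs are ignored when averaging.
evil_rank       <- c(5, 2.75, 2, NA)
original_person <- c("Rickle", "Mortan", "Jerry", "Rickle")

group_mean <- ave(evil_rank, original_person,
                  FUN = function(x) mean(x, na.rm = TRUE))

# Fill only the missing entries with their group's mean.
missing <- is.na(evil_rank)
evil_rank[missing] <- group_mean[missing]
```

Unlike the output above, this sketch keeps the original row and column order.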
sternclean("rickle_and_mortan",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738)

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | 1738 | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | 1738 | NA |

sternclean("rickle_and_mortan",
           impute_cust_cl = "evil_rank",
           impute_cust_fn = quantile,
           probs = .25,
           na.rm = TRUE
           )

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | 2.375 |
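The value 2.375 is what `quantile()` returns for the observed evil_rank values with `probs = .25` and `na.rm = TRUE`. The sketch below reproduces it directly; it assumes, without confirmation from the source, that the extra arguments in the call above are passed through to the custom function.

```r
# 25th percentile of the observed values, using quantile()'s default method.
evil_rank <- c(5, 2.75, 2, NA)
fill <- unname(quantile(evil_rank, probs = 0.25, na.rm = TRUE))
evil_rank[is.na(evil_rank)] <- fill
```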

More Complex Example

Here we:

  • change the people column's class to string
  • change the intelligence column's class to numeric
  • remove the original_person column
  • impute the NAs in the evil_rank column with the column's mean
  • replace the infinite values in the intelligence column with 1738

sternclean("rickle_and_mortan",
           class_to_strng = "people",
           class_to_numer = "intelligence",
           remove_columns = "original_person",
           impute_na2mean = "evil_rank",
           impute_inf_col = "intelligence",
           impute_inf_wit = 1738
           )

| people | intelligence | evil_rank |
|---|---|---|
| Rickle | 1738 | 5 |
| Mortan | 9 | 2.75 |
| Jerry | 0.1 | 2 |
| Pickle Rickle | 1738 | 3.25 |

Compared to Original Data Frame

| people | original_person | intelligence | evil_rank |
|---|---|---|---|
| Rickle | Rickle | Inf | 5 |
| Mortan | Mortan | 9 | 2.75 |
| Jerry | Jerry | 0.1 | 2 |
| Pickle Rickle | Rickle | Inf | NA |
