TidierCats.jl

What is TidierCats.jl?

TidierCats.jl is a 100% Julia implementation of the R forcats package.

TidierCats.jl has one main goal: to bring forcats's straightforward syntax and ease of use for working with categorical variables to Julia users. While this package was developed to work seamlessly with Tidier.jl functions and macros, it can also be used independently as a standalone package. This package is powered by CategoricalArrays.jl.

What functions does TidierCats.jl support?

cat_rev()
cat_relevel()
cat_infreq()
cat_reorder()
cat_collapse()
cat_lump()
cat_lump_min()
cat_lump_prop()
cat_recode()
cat_other()
cat_replace_missing()
as_categorical()

Installation

For the development version:

using Pkg
Pkg.add(url = "https://github.com/TidierOrg/TidierCats.jl.git")

Examples

using TidierData
using TidierCats
using Random

Random.seed!(10)

categories = ["High", "Medium", "Low", "Zilch"]

random_indices = rand(1:length(categories), 57)


df = DataFrame(
    ID = 1:57,
    CatVar = categorical([categories[i] for i in random_indices], levels = categories)
)

`cat_relevel()`

This function changes the order of levels in a categorical variable. It accepts two arguments - a column name and an array of levels in the desired order.

custom_order = @chain df begin
    @mutate(CatVar = cat_relevel(CatVar, ["Zilch", "Medium", "High", "Low"]))
end

println(levels(df[!, :CatVar]))
println(levels(custom_order[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["Zilch", "Medium", "High", "Low"]

`cat_rev()`

This function reverses the order of levels in a categorical variable. It only requires one argument - the column name whose levels are to be reversed.

reversed_order = @chain df begin
    @mutate(CatVar = cat_rev(CatVar))
end

println(levels(df[!, :CatVar]))
println(levels(reversed_order[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["Zilch", "Low", "Medium", "High"]
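
To make clear what "reversing the levels" means, here is a minimal sketch that does the same thing by hand with CategoricalArrays.jl. It is an illustration under the assumption that only the level order changes, not a copy of how TidierCats.jl implements cat_rev(); the names lvls, x and x_rev are made up for this example.

```julia
using CategoricalArrays

# Illustrative data with an explicitly chosen level order.
lvls = ["High", "Medium", "Low"]
x = categorical(["High", "Low", "Medium", "High"], levels = lvls)

# Work on a copy so that x keeps its original level order,
# then assign the same levels in reverse.
x_rev = copy(x)
levels!(x_rev, reverse(lvls))

println(levels(x))
println(levels(x_rev))
```

Only the level order differs between x and x_rev; the stored values are untouched.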

`cat_infreq()`

This function reorders the levels of a categorical variable by frequency, with the most frequent level first. Its single argument is the column name.

@chain df begin
    @count(CatVar)
end

 Row │ CatVar  n     
     │ Cat…    Int64 
─────┼───────────────
   1 │ High       19
   2 │ Medium     11
   3 │ Low        14
   4 │ Zilch      13

orderedbyfrequency = @chain df begin
    @mutate(CatVar = cat_infreq(CatVar))
end

println(levels(df[!, :CatVar]))
println(levels(orderedbyfrequency[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["High", "Low", "Zilch", "Medium"]
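
The ordering above can be checked by hand from the @count output. The following sketch uses only base Julia and the counts copied from that table; it is a way to verify the result, not the package's implementation, and the variable names are hypothetical.

```julia
# Counts copied from the @count output, in the original level order.
level_names = ["High", "Medium", "Low", "Zilch"]
level_counts = [19, 11, 14, 13]

# sortperm with rev = true returns the positions from the largest
# count to the smallest.
order = sortperm(level_counts, rev = true)
by_frequency = level_names[order]

println(by_frequency)
# ["High", "Low", "Zilch", "Medium"]
```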

`cat_lump()`

This function lumps the least frequent levels into a new "Other" level. It accepts two arguments - a column name and an integer specifying the number of levels to keep.

lumped_cats = @chain df begin
    @mutate(CatVar = cat_lump(CatVar, 2))
end

println(levels(df[!, :CatVar]))
println(levels(lumped_cats[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["High", "Low", "Other"]

`cat_reorder()`

This function reorders the levels of a categorical variable based on a summary statistic of a second variable. It takes three arguments: a categorical column, a numerical column by which to reorder, and the name of the function used to calculate the summary statistic (currently only "mean" and "median" are supported). A fourth, optional argument defaults to true, which orders the categories in descending order; if it is set to false, the categories are ordered in ascending order.

df3 = DataFrame(
    cat_var = repeat(["Low", "Medium", "High"], outer = 10),
    order_var = rand(30)
)

df4 = @chain df3 begin
    @mutate(cat_var = cat_reorder(cat_var, order_var, "median"))
end

@chain df3 begin
    @mutate(catty = as_categorical(cat_var))
    @group_by(cat_var)
    @summarise(median = median(order_var))
end

println(levels(df3[!, :cat_var]))
println(levels(df4[!, :cat_var]))

 Row │ cat_var  median   
     │ String   Float64  
─────┼───────────────────
   1 │ High     0.385143
   2 │ Low      0.510809
   3 │ Medium   0.65539

["High", "Low", "Medium"]
["Medium", "Low", "High"]

`cat_collapse()`

This function collapses levels in a categorical variable according to a specified mapping. It requires two arguments - a categorical column and a dictionary that maps original levels to new ones.

df5 = @chain df begin
    @mutate(CatVar = cat_collapse(CatVar, Dict("Low" => "bad", "Zilch" => "bad")))
end

@chain df begin
    @count(CatVar)
end

@chain df5 begin 
    @count(CatVar)
end

 Row │ CatVar  n     
     │ Cat…    Int64 
─────┼───────────────
   1 │ High       19
   2 │ Medium     11
   3 │ Low        14
   4 │ Zilch      13

 Row │ CatVar  n     
     │ Cat…    Int64 
─────┼───────────────
   1 │ High       19
   2 │ Medium     11
   3 │ bad        27

`as_categorical()`

This function converts a standard Julia array to a categorical array. Its only argument is the name of the column to be converted.

test = DataFrame( w = ["A", "B", "C", "D"])

@chain test begin 
    @mutate(w = as_categorical(w))
end

 Row │ w    
     │ Cat… 
─────┼──────
   1 │ A
   2 │ B
   3 │ C
   4 │ D

`cat_lump_min()`

This function lumps every category with fewer than the minimum number of entries into a single category, named "Other" by default or given a name chosen by the user.

lumpedbymin = @chain df begin
    @mutate(CatVar = cat_lump_min(CatVar, 14))
end

println(levels(df[!, :CatVar]))
println(levels(lumpedbymin[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["High", "Low", "Other"]

`cat_lump_prop()`

This function lumps every category whose share of the entries is below the minimum proportion into a single category, named "Other" by default or given a name chosen by the user.

lumpedbyprop = @chain df begin
    @mutate(CatVar = cat_lump_prop(CatVar, .25, "new name"))
end

println(levels(df[!, :CatVar]))
println(levels(lumpedbyprop[!, :CatVar]))

["High", "Medium", "Low", "Zilch"]
["High", "new name"]
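
The two thresholds used in the cat_lump_min() and cat_lump_prop() examples can be worked through from the counts in the earlier @count output. The sketch below uses only base Julia; the comparisons (keep a level when its count or proportion is at least the threshold) are an assumption chosen to match the outputs shown, not a statement about the package internals.

```julia
# Counts copied from the @count output, in the original level order.
level_names = ["High", "Medium", "Low", "Zilch"]
level_counts = [19, 11, 14, 13]

# Minimum-count rule with a threshold of 14:
# "Low" has exactly 14 entries and is kept.
kept_by_min = level_names[level_counts .>= 14]

# Minimum-proportion rule with a threshold of 0.25:
# 19/57 is about 0.33, while 14/57 is about 0.246 and falls below it.
proportions = level_counts ./ sum(level_counts)
kept_by_prop = level_names[proportions .>= 0.25]

println(kept_by_min)
println(kept_by_prop)
```

Every level that is not kept ends up in the lumped category, which explains why "Low" survives the count threshold of 14 but not the proportion threshold of 0.25.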
