Note: I gave a lightning talk on A Comparison of Packages to Generate Codebooks in R to R-Ladies NYC on Tuesday Sept. 20th, 2022. You can view slides and materials for that talk here: https://github.com/Cghlewis/rladies-nyc-codebook-comparison
I started this table as a way to compare existing r packages that assist in codebook creation. The criteria I am looking for include the following variable level metrics (specifically for working with haven::labelled()
data):
- Name
- Label
- Type
- Values (if categorical)
- Value labels (if categorical)
- NA values (Missing values: for example -99 and -98)
- NA labels (Missing value labels: for example -99 = No response, -98 = Unclear response)
- Total valid N
- Total missing N (must recognize user-define missing values)
- N per value (if categorical)
- % per value (if categorical)
- N per NA value (User-defined labelled missing value)
- % per NA value (User-defined labelled missing value)
- Range (if continuous)
- Mean (if continuous)
A table of all packages I reviewed can be found here: https://cghlewis.github.io/codebook-pkg-comparison/
There were other packages::functions() that I reviewed but I did not include them in the table if they give errors when working with haven::labelled()
data (for example both Hmisc::describe()
and dataxray::make_xray()
give errors when data include value labels). If you see that I have mistakenly marked any category for any package, please let me know and I will update!
Ultimately I have narrowed the table down to these 4 packages. These 4 packages work well with haven::labelled()
data and they met an acceptable number of the above criteria.
Other helpful resources: