Bonus Slides: Making pretty regression tables with pipes

autosize: true author: Tristan Mahr, @tjmahr date: March 18, 2015 css: assets/custom.css

Madison R Users Group

Repository for this talk: https://github.com/tjmahr/MadR_Pipelines

Plan

Now, I will demonstrate off how I use pipelines to make pretty-printed regression tables.

library("magrittr")
library("dplyr")
library("broom")
library("stringr")
library("knitr")

linear regression

title: false

I want to print a regression summary table.

model <- lm(Sepal.Length ~ Sepal.Width, iris)
summary(model)


Call:
lm(formula = Sepal.Length ~ Sepal.Width, data = iris)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5561 -0.6333 -0.1120  0.5579  2.2226 

Coefficients:
            Estimate Std. Error t value
(Intercept)   6.5262     0.4789   13.63
Sepal.Width  -0.2234     0.1551   -1.44
            Pr(>|t|)    
(Intercept)   <2e-16 ***
Sepal.Width    0.152    
---
Signif. codes:  
  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1  ' ' 1

Residual standard error: 0.8251 on 148 degrees of freedom
Multiple R-squared:  0.01382,	Adjusted R-squared:  0.007159 
F-statistic: 2.074 on 1 and 148 DF,  p-value: 0.1519

broom::tidy

broom's tidy function extracts summary values into a data-frame.

m_table <- tidy(model)
m_table %>% print(digits = 3)

         term estimate std.error statistic
1 (Intercept)    6.526     0.479     13.63
2 Sepal.Width   -0.223     0.155     -1.44
   p.value
1 6.47e-28
2 1.52e-01

knitr::kable

knitr's kable function prints a data-frame as a Markdown table, which rmarkdown/pandoc can convert to HTML or Word.

kable(m_table, digits = 3,
      col.names = c("Param", "B", "SE", "t", "p"))

Raw Markdown output



|Param       |      B|    SE|      t|     p|
|:-----------|------:|-----:|------:|-----:|
|(Intercept) |  6.526| 0.479| 13.628| 0.000|
|Sepal.Width | -0.223| 0.155| -1.440| 0.152|

======================================================== title: false

Which renders as:

kable(m_table, digits = 3,
      col.names = c("Param", "B", "SE", "t", "p"))

Param	B	SE	t	p
(Intercept)	6.526	0.479	13.628	0.000
Sepal.Width	-0.223	0.155	-1.440	0.152

Almost there

incremental: true

Two more things that I want.

APA style for printing numbers
- Two digits of precision for estimates, errors and statistics.
- Three digits for p-values.
- Bounded values (correlations, p-values) don't need leading zero.
- Tiny p-values should not be 0.000. Use < .001 instead.
Flexibility for many related models. (I don't know the final in-text model beforehand.)

Approach

incremental: true

Build a set of generic, reusable functions for formatting numbers.
Make a custom pipeline to format the term column.
Assemble a pipeline to convert a model into a pretty table.

Digit Formatters

The formatC function is big and confusing, but after I figured out something that works well enough, I extracted a function. Now I don't have to worry about solving that problem again.

# Print with n digits of precision
fixed_digits <- function(xs, n = 2) {
  formatC(xs, digits = n, format = "f")
}

# Want .100 to print with two digits
c(1000.012, 0.0001, 0.100) %>% fixed_digits(2)

[1] "1000.01" "0.00"    "0.10"

Remove leading zeros

Here I do a sanity check to see if this operation is necessary.

# Don't print leading zero on bounded numbers.
remove_leading_zero <- function(xs) {
  # Problem if any value is greater than 1.0
  digit_matters <- xs %>% as.numeric %>%
    abs %>% is_greater_than(1)
  if (any(digit_matters)) {
    warning("Non-zero leading digit")
  }
  str_replace(xs, "^(-?)0", "\\1")
}
c(1.00, -0.12,  0.87, 0.82) %>% remove_leading_zero

[1] "1"    "-.12" ".87"  ".82"

p-value formatter

I format p-values with the previous two functions and overwrite the really small ones with the < .001 template.

# Print three digits of a p-value, but use
# the "< .001" notation on tiny values.
format_pval <- function(ps, html = FALSE) {
  tiny <- ifelse(html, "&lt;&nbsp;.001", "< .001")
  ps_chr <- ps %>% fixed_digits(3) %>%
    remove_leading_zero
  ps_chr[ps < 0.001] <- tiny
  ps_chr
}
m_table$p.value %>% format_pval

[1] "< .001" ".152"

Renaming variables

A few packages already solve the pretty-table problem--stargazer, texreg, xtable--and they do amazing things. But their functions usually rely on an argument where you can specify custom labels for the regression predictors. This is fine for one-off tables.

When I have a bunch of related models, I prefer to use a string-manipulation pipeline to convert R names into long-hand names.

Renaming variables

fix_names <- . %>%
  str_replace(".Intercept.", "Intercept") %>%
  str_replace("Species", "") %>%
  # Capitalize species names
  str_replace("setosa", "Setosa") %>%
  str_replace("versicolor", "Versicolor") %>%
  str_replace("virginica", "Virginica") %>%
  # Clean up special characters
  str_replace_all(".Width", " Width") %>%
  str_replace_all(".Length", " Length") %>%
  str_replace_all(":", " x ")

Formatting pipeline

Format the numbers
Rename the terms
Rename the column headings

two_digits <- . %>% fixed_digits(2)
table_names <- c("Parameter", "Estimate", "SE",
                 "_t_", "_p_")

format_model_table <- . %>%
  mutate_each(funs(two_digits),
              -term, -p.value) %>%
  mutate(term = fix_names(term),
         p.value = format_pval(p.value)) %>%
  set_colnames(table_names)

Overall pipeline

Fit a model
Extract values for summary table
Format table
Print as markdown

lm(..., iris) %>%
  tidy %>%
  format_model_table %>%
  kable

======================================================== title: false

lm(Sepal.Length ~ Sepal.Width, iris) %>%
  tidy %>%
  format_model_table %>%
  kable(align = "r")

Parameter	Estimate	SE	t	p
Intercept	6.53	0.48	13.63	< .001
Sepal Width	-0.22	0.16	-1.44	.152

======================================================== title: false

lm(Sepal.Length ~ Species * Sepal.Width, iris) %>%
  tidy %>%
  format_model_table %>%
  kable(align = "r")

Parameter	Estimate	SE	t	p
Intercept	2.64	0.57	4.62	< .001
Versicolor	0.90	0.80	1.13	.261
Virginica	1.27	0.82	1.55	.123
Sepal Width	0.69	0.17	4.17	< .001
Versicolor x Sepal Width	0.17	0.26	0.67	.503
Virginica x Sepal Width	0.21	0.26	0.83	.411

======================================================== title: false

lm(Sepal.Length ~ Petal.Length * Species, iris) %>%
  tidy %>%
  format_model_table %>%
  kable(align = "r")

Parameter	Estimate	SE	t	p
Intercept	4.21	0.41	10.34	< .001
Petal Length	0.54	0.28	1.96	.052
Versicolor	-1.81	0.60	-3.02	.003
Virginica	-3.15	0.63	-4.97	< .001
Petal Length x Versicolor	0.29	0.30	0.97	.334
Petal Length x Virginica	0.45	0.29	1.56	.120

======================================================== title: false

lm(Sepal.Length ~ Species * Sepal.Width + Petal.Length, iris) %>%
  tidy %>%
  format_model_table %>%
  kable(align = "r")

Parameter	Estimate	SE	t	p
Intercept	1.67	0.41	4.11	< .001
Versicolor	0.28	0.56	0.51	.614
Virginica	-0.65	0.59	-1.10	.274
Sepal Width	0.62	0.12	5.40	< .001
Petal Length	0.82	0.07	12.42	< .001
Versicolor x Sepal Width	-0.45	0.19	-2.39	.018
Virginica x Sepal Width	-0.29	0.18	-1.57	.119

======================================================== title: false

lm(Sepal.Length ~ Species * Sepal.Width * Petal.Length, iris) %>%
  tidy %>%
  format_model_table %>%
  kable(align = "r")

Parameter	Estimate	SE	t	p
Intercept	-1.37	3.91	-0.35	.727
Versicolor	3.17	5.50	0.58	.565
Virginica	-0.99	5.27	-0.19	.851
Sepal Width	1.72	1.12	1.54	.126
Petal Length	2.88	2.75	1.05	.297
Versicolor x Sepal Width	-1.36	1.86	-0.73	.466
Virginica x Sepal Width	-0.43	1.66	-0.26	.797
Versicolor x Petal Length	-2.07	2.90	-0.71	.476
Virginica x Petal Length	-1.42	2.82	-0.51	.614
Sepal Width x Petal Length	-0.74	0.79	-0.95	.345
Versicolor x Sepal Width x Petal Length	0.72	0.86	0.84	.405
Virginica x Sepal Width x Petal Length	0.56	0.81	0.69	.488

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

02_tables.md

02_tables.md

Bonus Slides: Making pretty regression tables with pipes

Plan

linear regression

broom::tidy

knitr::kable

Almost there

Approach

Digit Formatters

Remove leading zeros

p-value formatter

Renaming variables

Renaming variables

Formatting pipeline

Overall pipeline

Files

02_tables.md

Latest commit

History

02_tables.md

File metadata and controls

Bonus Slides: Making pretty regression tables with pipes

Plan

linear regression

broom::tidy

knitr::kable

Almost there

Approach

Digit Formatters

Remove leading zeros

p-value formatter

Renaming variables

Renaming variables

Formatting pipeline

Overall pipeline