Custom table #973

larmarange · 2021-08-30T17:06:19Z

Do we have an easy way to produce a ratio / incidence table, i.e. a table where we report the ratio between two variables (i.e. a numerator and a denominator)?

It could be part of a more global discussion about producing a table (one-way or two-way) where the content of the cells depends on a custom function applied to other variables (e.g. the sum of a third variable, the ratio of two variables, etc.), applied to subgroups.

Essentially, such a table would depend on (i) a list of categorical variables for rows; (ii) an optional categorical variable displayed in columns; (iii) a custom function applied to each subsample.

Would it be something worth considering?

larmarange · 2021-08-30T17:24:37Z

Something like tbl_custom(data, rows, by, stat), with stat being a function that takes a data.frame as input and returns a character vector of statistics.

rows could be a list of variables, and by could be a single categorical variable.

tbl_custom() would generate a subsample of data (based on rows and by), containing all variables of data, and pass it to the stat function. The current values of rows and by could optionally be added to the data.frame as additional .row and .by columns.

If a continuous variable is passed to rows, the entire dataset is passed through (but .row will be populated accordingly).
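The mechanics proposed above can be sketched in base R. This is only an illustration under stated assumptions: tbl_custom_sketch() is a hypothetical name (not a gtsummary function), it handles a single categorical row variable, it returns a plain character matrix rather than a gtsummary object, and the toy data are made up.

```r
# Hypothetical sketch, not part of gtsummary: apply `stat` to every
# subsample defined by one row variable and one `by` variable.
tbl_custom_sketch <- function(data, row, by, stat) {
  row_values <- as.character(data[[row]])
  by_values <- as.character(data[[by]])
  row_levels <- sort(unique(row_values))
  by_levels <- sort(unique(by_values))
  cells <- matrix(
    NA_character_,
    nrow = length(row_levels), ncol = length(by_levels),
    dimnames = list(row_levels, by_levels)
  )
  for (r in row_levels) {
    for (b in by_levels) {
      keep <- which(row_values == r & by_values == b)
      # The subsample keeps all variables of `data`
      sub_data <- data[keep, , drop = FALSE]
      # Expose the current cell to `stat` through .row and .by columns
      sub_data$.row <- rep(r, length(keep))
      sub_data$.by <- rep(b, length(keep))
      cells[r, b] <- stat(sub_data)
    }
  }
  cells
}

# Made-up toy data: cases and observation time per record
toy <- data.frame(
  group = c("a", "a", "b", "b", "a", "b"),
  sex = c("f", "m", "f", "m", "f", "m"),
  cases = c(1, 2, 3, 4, 5, 6),
  time = c(10, 10, 10, 20, 30, 20)
)

# `stat` sees the whole subsample, so it can combine several variables
ratio_stat <- function(d) {
  sprintf("%.2f (%.0f/%.0f)", sum(d$cases) / sum(d$time), sum(d$cases), sum(d$time))
}

res <- tbl_custom_sketch(toy, row = "group", by = "sex", stat = ratio_stat)
print(res)
```

Because the function receives a data.frame rather than a single vector, the ratio of two columns is as easy to write as a sum of one.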

Such a tbl_custom() function would allow users to generate very specific and customized tables.

@ddsjoberg, what do you think?

ddsjoberg · 2021-08-30T18:32:10Z

> Do we have an easy way to produce a ratio / incidence table, i.e. a table where we report the ratio between two variables (i.e. a numerator and a denominator)?

Not super easy, for sure! Users could easily merge results from a univariate Poisson regression, for example, to get those rates in a table. They could also use a custom function with add_difference() to produce RR, IRR, OR, whatever. But I do think the custom functions are not easy to implement for a passive user. We could make a push to support more of the intro epidemiology metrics in add_difference()?
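For readers unfamiliar with the Poisson route, here is a minimal base R sketch of one way to read that suggestion. The toy data are made up, and the model shown (no intercept, log person-time as offset) is an assumption about what a univariate Poisson regression for rates would look like, not code from this thread.

```r
# Made-up toy data: events and person-time per record
toy <- data.frame(
  group = factor(c("I", "I", "II", "II", "III", "III")),
  deaths = c(1, 0, 1, 1, 2, 1),
  time = c(20, 30, 25, 15, 10, 20)
)

# Poisson model without intercept: one log-rate per group,
# with log(person-time) entering as an offset
fit <- glm(deaths ~ 0 + group + offset(log(time)), family = poisson, data = toy)

# Exponentiated coefficients are the estimated incidence rates per group
rates <- exp(coef(fit))

# Cross-check: total events divided by total person-time in each group
direct <- tapply(toy$deaths, toy$group, sum) / tapply(toy$time, toy$group, sum)

print(rates)
print(direct)
```

With a single categorical predictor, the fitted rates match the direct numerator/denominator computation, which is why the regression output could be merged into a table of rates.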

> It could be part of a more global discussion about producing a table (one-way or two-way) where the content of the cells depends on a custom function applied to other variables (e.g. the sum of a third variable, the ratio of two variables, etc.), applied to subgroups.

I think I need to see an example of how tbl_custom() would be used? FYI, the entire data set is available in add_stat() to add custom columns based on any number of variables in the data frame. Have you seen the new function coming that will be well suited for 2-way ANOVA (#953)?

larmarange · 2021-08-31T07:07:36Z

Just thinking and trying to mature the idea.

add_stat() is great but really limited to advanced users, as you have to provide a function returning several values (so the function has to manage subsetting for categories) as well as a location. add_stat() is also designed to add columns to an existing table rather than to create the table, and it receives the full data.

I have seen tbl_continuous() but haven't played with it extensively yet. We have a similar concept here: a table with columns and rows defined by certain variables, and content determined by a third variable. However, there can be only one such third variable, and it must be continuous. You can pass a custom function, and in that case the function receives a vector that is the subset of that third variable corresponding to the current cell.
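The per-cell vector idea can be mimicked in base R with tapply(). This is only an analogy on made-up data, not how tbl_continuous() is implemented.

```r
# Made-up toy data
toy <- data.frame(
  group = c("a", "a", "b", "b", "a", "b"),
  sex = c("f", "m", "f", "m", "f", "m"),
  age = c(30, 40, 50, 60, 70, 80)
)

# Each cell's function call receives only the values of `age`
# for that group x sex combination, as a plain numeric vector
cell_median <- tapply(toy$age, list(toy$group, toy$sex), median)
print(cell_median)
```

Since the function only ever sees one vector, a statistic that needs two variables at once (such as a numerator over a denominator) cannot be expressed this way.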

Conceptually, tbl_custom_stat() could be an extension of tbl_continuous(), where you provide a custom function that receives a subset of the overall dataset and calculates only one cell at a time. The custom function would therefore be much easier to write.

Maybe, to be more consistent with other gtsummary functions, that custom function would return a list of stats, the user would provide a pattern, and the styling would be managed as usual.

So a concept example could be:

my_fun <- function(data) {
  num <- sum(data$cases)
  denom <- sum(data$observation_time)
  list(num = num, denom = denom, ratio = num / denom)
}

data %>%
  tbl_custom(
    include = group,
    by = sex,
    stat_fun = my_fun,
    pattern = "{ratio} ({num}/{denom})",
    digits = c(2, 1, 1)
  )
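To make the pattern and digits idea concrete, here is a hedged base R sketch of how a list of statistics could be turned into one cell. fill_pattern() is a hypothetical helper, not a gtsummary function; it takes digits as a named vector (an assumption made here to avoid relying on positional order), and the input numbers are taken from the Grade I cell of the first column in the add_stat() output shown later in this thread.

```r
# Hypothetical helper: format each statistic, then substitute it
# for the matching "{name}" placeholder in the pattern.
fill_pattern <- function(stats, pattern, digits) {
  out <- pattern
  for (name in names(stats)) {
    x <- stats[[name]]
    # Character statistics are inserted as-is; numeric ones are rounded
    value <- if (is.character(x)) x else formatC(x, digits = digits[[name]], format = "f")
    out <- gsub(paste0("{", name, "}"), value, out, fixed = TRUE)
  }
  out
}

stats <- list(num = 16, denom = 734, ratio = 16 / 734)
cell <- fill_pattern(
  stats,
  pattern = "{ratio} ({num}/{denom})",
  digits = c(num = 1, denom = 1, ratio = 2)
)
print(cell)
```

Keeping the computation (the list of stats) separate from the display (pattern and digits) is what would let the usual styling machinery be reused.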

larmarange · 2021-08-31T07:27:00Z

To allow as much customization as possible, it should of course be possible to return character values from stat_fun.

ddsjoberg · 2021-08-31T13:03:26Z

@larmarange nice, it's taking me some time to digest the suggestion. We're in agreement that we should have support/helper functions to construct custom tables.

Whatever we decide to implement will need to follow the existing framework of a gtsummary table outlined here: https://www.danieldsjoberg.com/gtsummary/dev/articles/gtsummary_definition.html So this custom function will also need enough information to construct the internals.

Here's how I would construct the table with add_stat() and it's complicated...

library(gtsummary)

ratio_fun <- function(data, variable, by, tbl = NULL, ...) {
  data %>%
    # one row per combination of the row variable and the by variable
    dplyr::group_by_at(c(variable, by)) %>%
    dplyr::summarize(
      num = sum(death, na.rm = TRUE),
      denom = sum(ttdeath, na.rm = TRUE),
      ratio = num / denom,
      .groups = "drop"
    ) %>%
    # format the numbers and build the displayed cell content
    dplyr::mutate(
      dplyr::across(all_of(c("num", "denom", "ratio")),
                    style_sigfig),
      stat = glue::glue("{ratio} ({num}/{denom})")
    ) %>%
    dplyr::rename(variable = all_of(variable),
                  by = all_of(by)) %>%
    # look up the column name associated with each by level in the table
    dplyr::left_join(
      tbl$df_by %>% select(by, by_col) %>% mutate(by_col = paste0("add_", by_col)),
      by = "by"
    ) %>%
    select(all_of(c("variable", "by_col", "stat"))) %>%
    # one column per by level, one row per level of the row variable
    tidyr::pivot_wider(
      id_cols = all_of("variable"),
      values_from = all_of("stat"),
      names_from = all_of("by_col")
    ) %>%
    # rows must be in level order; only the new columns are returned
    dplyr::arrange(variable) %>%
    dplyr::select(-variable)
}

trial %>%
  tbl_summary(
    by = trt,
    include = grade
  ) %>%
  modify_column_hide(all_stat_cols()) %>%
  add_stat(fns = everything() ~ ratio_fun,
           location = everything() ~ "level") %>%
  as_kable()

| Characteristic | add_stat_1    | add_stat_2    |
|:---------------|:--------------|:--------------|
| Grade          |               |               |
| I              | 0.02 (16/734) | 0.02 (17/690) |
| II             | 0.02 (16/651) | 0.03 (20/645) |
| III            | 0.03 (20/598) | 0.04 (23/607) |

Created on 2021-08-31 by the reprex package (v2.0.1)

ddsjoberg · 2021-09-01T14:47:08Z

Perhaps what we need is a tbl_base() function that sets the structure of a gtsummary table, and then make an interface to easily update it?

For example,

trial %>%
  select(age, grade) %>%
  tbl_base()

Then we can use functions like modify_table_body() to merge in new results, and add other helpers as needed?

This will ensure the proper internal structure is maintained.

larmarange · 2021-09-01T18:57:27Z

Thanks @ddsjoberg for your feedback. Sorry I'm currently traveling and do not always have access to Wifi.

> Whatever we decide to implement will need follow the existing framework of a gtsummary table

Totally agree

> Here's how I would construct the table with add_stat() and it's complicated...

That's the thing. add_stat() is great but not suited to many users. I would say that modify_table_body() has a similar issue, as it requires a good understanding of the structure of a tbl object. I'm sure there is room for intermediate functions.

There is no rush here, but I have the feeling that it would be worth thinking about both very generic functions, which would be limited to very advanced users, and some more limited functions that are easier to use.

In any case, a FAQ centralising the many tricks around gtsummary would be great.

larmarange · 2021-09-02T16:53:02Z

tbl_base could be a good way to share code between different functions.

It could even be used for tbl_summary

larmarange · 2021-09-02T17:48:50Z

It took me some time but I guess that I finally get your point. Using tbl_base and modify_table_body would make the writing of new functions easier and will maintain consistency.

Great idea

ddsjoberg · 2021-09-02T17:58:21Z

> It took me some time but I guess that I finally get your point. Using tbl_base and modify_table_body would make the writing of new functions easier and will maintain consistency.
@larmarange

I also take your point that modify_table_body() and add_stat() are not easy to use, and that some kind of easier-to-use but still flexible helper would be valuable.

Maybe something to bind a tibble you've created to an existing gtsummary table (this would be like a modify_table_body(~bind_cols()) wrapper), and perhaps a left_join() sister function that can merge on the label column only (or maybe label and variable).
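A rough base R sketch of the "left join on the label column" idea follows. The table_body data frame here is a simplified stand-in (an assumption: a real gtsummary table body carries more columns), and the ratios are copied from the first statistic column of the add_stat() output earlier in this thread.

```r
# Simplified stand-in for the body of an existing table
table_body <- data.frame(
  variable = c("grade", "grade", "grade", "grade"),
  label = c("Grade", "I", "II", "III"),
  stringsAsFactors = FALSE
)

# User-built results, deliberately in a different row order
new_stats <- data.frame(
  label = c("III", "I", "II"),
  ratio = c("0.03", "0.02", "0.02"),
  stringsAsFactors = FALSE
)

# Left join on label that preserves the row order of the table body;
# rows without a match (the "Grade" header row) get NA
idx <- match(table_body$label, new_stats$label)
extra_cols <- setdiff(names(new_stats), "label")
merged <- cbind(table_body, new_stats[idx, extra_cols, drop = FALSE])
rownames(merged) <- NULL
print(merged)
```

Matching on label alone breaks down as soon as two variables share a level label, which is presumably why joining on both label and variable is mentioned as an alternative.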

I am still not sure what the best move is. But it's good to have these conversations and let the ideas mature.

larmarange · 2021-09-02T20:55:25Z

Yes. We clearly need time to let it mature. I will try to do some tests as a proof of concept as soon as I find some time.

larmarange · 2021-09-09T15:23:29Z

Dear @ddsjoberg

Just to experiment a little, I have drafted a tbl_custom_summary() function. It is visible here: #976

You will also find concrete examples.

In fact, instead of creating an empty table and then using modify_table_body(), it seems easier to create a custom df_stats and then reuse all the existing procedures implemented in tbl_summary().

* draft of tbl_custom_summary fix #973
* document
* filter NA values
* pass stat_display as well
* using dplyr::group_modify instead of summarise
* document
* cleaning
* doc update
* experimental
* Option for adding an overall row
* documentation improvements
* better example
* documentation updates
* Create test-tbl_custom_summary.R
* doc updates
* Doc update Caution section and not mentionning internal variable
* first helper to tbl_custom_summary
* Update test-tbl_custom_summary.R
* Update test-tbl_custom_summary.R
* fix for continuous_summary()
* .drop = FALSE when grouping
* avoiding .by and .variable
* theme elements for tbl_custom_summary()
* ratio_summary
* proportion_summary()
* improved examples
* doc update
* additional tests for tbl_custom_summary
* spell_check
* simpler syntax
* too long example
* misc updates
* doc updates

Co-authored-by: Daniel Sjoberg <danield.sjoberg@gmail.com>

larmarange changed the title from "Ratio / Incidence table" to "Custom table" Sep 2, 2021

larmarange mentioned this issue Sep 9, 2021

tbl_custom_summary() #976

Merged

ddsjoberg added this to the v1.5.0 milestone Sep 21, 2021

ddsjoberg closed this as completed in #976 Oct 1, 2021
