Custom table #973

Closed
larmarange opened this issue Aug 30, 2021 · 12 comments · Fixed by #976

@larmarange
Collaborator

Do we have an easy way to produce a ratio / incidence table, i.e. a table where we report the ratio between two variables (i.e. a numerator and a denominator)?

It could be part of a more global discussion about producing a table (one-way or two-way) where the content of the cells depends on a custom function applied to other variables (e.g. the sum of a third variable, the ratio of two variables, etc.), applied to subgroups.

In short, such a table would depend on (i) a list of categorical variables for rows; (ii) an optional categorical variable displayed in columns; (iii) a custom function applied to each subsample.

Would it be something worth considering?

@larmarange
Collaborator Author

Something like tbl_custom(data, rows, by, stat), with stat being a function that takes a data.frame as input and returns a character vector of statistics.

rows could be a list of variables, and by could be a single categorical variable.

tbl_custom() would generate a subsample of data (based on rows and by), keeping all variables of data, and pass it to the stat function. The values of rows and by could optionally be added to the data.frame as additional .row and .by columns.

If a continuous variable is passed to rows, the entire dataset is passed through (but .row will be populated accordingly).

Such a tbl_custom() function would allow users to generate very specific and customized tables.
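
To make the idea concrete, here is a minimal base R sketch of the subsetting logic described above. It is only an illustration: tbl_custom_sketch(), ratio_stat() and the toy data are hypothetical names, nothing here is part of gtsummary, and the optional .row / .by columns are left out.

```r
# Hypothetical sketch of the proposed tbl_custom(); not a gtsummary function.
# `stat` receives the subsample of `data` for one cell (all columns kept)
# and returns a single character string.
tbl_custom_sketch <- function(data, rows, by, stat) {
  by_levels <- sort(unique(as.character(data[[by]])))
  out <- list()
  for (row_var in rows) {
    row_levels <- sort(unique(as.character(data[[row_var]])))
    for (row_level in row_levels) {
      # One cell per level of `by`, computed on the matching subsample
      cells <- vapply(by_levels, function(by_level) {
        keep <- as.character(data[[row_var]]) == row_level &
          as.character(data[[by]]) == by_level
        keep[is.na(keep)] <- FALSE
        stat(data[keep, , drop = FALSE])
      }, character(1))
      out[[length(out) + 1]] <- c(variable = row_var, level = row_level, cells)
    }
  }
  as.data.frame(do.call(rbind, out), stringsAsFactors = FALSE)
}

# Toy data (made up for the illustration)
toy <- data.frame(
  group = c("a", "a", "b", "b", "a", "b"),
  sex = c("F", "M", "F", "M", "F", "M"),
  cases = c(1, 4, 9, 4, 5, 16),
  observation_time = c(10, 20, 30, 40, 50, 60)
)

# Custom statistic: a ratio of two sums, already formatted as text
ratio_stat <- function(d) {
  sprintf("%.2f", sum(d$cases) / sum(d$observation_time))
}

res <- tbl_custom_sketch(toy, rows = "group", by = "sex", stat = ratio_stat)
print(res)
```

The custom function only has to handle one cell at a time; all the looping over rows and columns stays inside the table function.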

@ddsjoberg, what do you think?

@ddsjoberg
Owner

Do we have an easy way to produce a ratio / incidence table, i.e. a table where we report the ratio between two variables (i.e. a numerator and a denominator)?

Not super easy, for sure! Users could easily merge results from a univariate Poisson regression, for example, to get those rates in a table. They could also use a custom function with add_difference() to produce RR, IRR, OR, whatever. But I do think the custom functions are not easy to implement for a passive user. We could make a push to support more of the intro epidemiology metrics in add_difference()?
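
As a rough illustration of the Poisson route, the sketch below fits a Poisson model with a log person-time offset in base R and checks that the exponentiated coefficients match the crude rates. The toy data and column names are assumptions made up for the example, and merging the result into a gtsummary table is not shown.

```r
# Toy data (hypothetical): event counts and observation time per record
toy <- data.frame(
  group = factor(c("a", "a", "b", "b")),
  cases = c(2, 4, 9, 3),
  observation_time = c(20, 40, 30, 30)
)

# No intercept, so each coefficient is the log incidence rate of one group.
# The offset puts the model on the "events per unit of time" scale.
fit <- glm(
  cases ~ 0 + group + offset(log(observation_time)),
  family = poisson(),
  data = toy
)
rates <- exp(coef(fit))

# Crude rates for comparison: total cases / total observation time
crude <- tapply(toy$cases, toy$group, sum) /
  tapply(toy$observation_time, toy$group, sum)

print(rates)
print(crude)
```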

It could be part of a more global discussion about producing a table (one-way or two-way) where the content of the cells depends on a custom function applied to other variables (e.g. the sum of a third variable, the ratio of two variables, etc.), applied to subgroups.

I think I need to see an example of how tbl_custom() would be used? FYI, the entire data set is available in add_stat() to add custom columns based on any number of variables in the data frame. Have you seen the new function coming that will be well suited for 2-way ANOVA?
#953

@larmarange
Collaborator Author

Just thinking and trying to mature the idea.

add_stat() is great but really limited to advanced users: you have to provide a function returning several values (so the function has to manage the subsetting for each category) as well as a location. add_stat() is also designed to add columns to an existing table rather than to create the table, and it receives the full data.

I have seen tbl_continuous() but haven't played with it extensively yet. We have a similar concept here: a table with columns and rows defined by certain variables, and content determined by a third variable. However, there can be only one such third variable, and it must be continuous. You can pass a custom function, in which case this function receives a vector that is the subset of that third variable corresponding to the current cell.

In terms of concept, tbl_custom_stat() could be an extension of tbl_continuous(), where you provide a custom function that receives a subset of the overall dataset and calculates only one cell at a time. So the custom function could be much easier to write.

Maybe, to be more consistent with other gtsummary functions, that custom function would return a list of stats, the user would provide a pattern, and the styling would be managed as usual.

So a concept example could be:

my_fun <- function(data) {
  num <- sum(data$cases)
  denom <- sum(data$observation_time)
  list(num = num, denom = denom, ratio = num / denom)
}

data %>%
  tbl_custom(
    include = group,
    by = sex,
    stat_fun = my_fun,
    pattern = "{ratio} ({num}/{denom})",
    digits = c(2, 1, 1)
  )
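
A minimal sketch of how the pattern and digits arguments could be applied to the list returned by the custom function, in base R only. fill_pattern() is a hypothetical helper written for this illustration (gtsummary has its own internal machinery for this), the toy data are made up, and only numeric statistics are handled here.

```r
# Hypothetical helper: replace each "{name}" placeholder in `pattern` with the
# corresponding statistic, rounded to the requested number of decimals.
# `digits` is a named vector, one entry per placeholder.
fill_pattern <- function(stats, pattern, digits) {
  out <- pattern
  for (stat_name in names(digits)) {
    value <- formatC(stats[[stat_name]], digits = digits[[stat_name]], format = "f")
    out <- gsub(paste0("{", stat_name, "}"), value, out, fixed = TRUE)
  }
  out
}

# Same custom function as in the concept example
my_fun <- function(data) {
  num <- sum(data$cases)
  denom <- sum(data$observation_time)
  list(num = num, denom = denom, ratio = num / denom)
}

# Toy subsample standing in for the data of a single cell
toy <- data.frame(cases = c(1, 5), observation_time = c(10, 50))

cell <- fill_pattern(
  my_fun(toy),
  pattern = "{ratio} ({num}/{denom})",
  digits = c(ratio = 2, num = 1, denom = 1)
)
print(cell)
```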

@larmarange
Collaborator Author

To allow most of the customization, it should of course be possible to return character values in stat_fun

@ddsjoberg
Owner

@larmarange nice, it's taking me some time to digest the suggestion. We're in agreement that we should have support/helper functions to construct custom tables.

Whatever we decide to implement will need to follow the existing framework of a gtsummary table outlined here: https://www.danieldsjoberg.com/gtsummary/dev/articles/gtsummary_definition.html. So this custom function will also need to have enough information to construct the internals.

Here's how I would construct the table with add_stat() and it's complicated...

library(gtsummary)

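# Custom add_stat() function: returns one formatted "ratio (num/denom)" cell
# per level of `variable`, with one column per level of `by`.
ratio_fun <- function(data, variable, by, tbl = NULL, ...) {
  data %>%
    # one row per combination of row level and by level
    dplyr::group_by_at(c(variable, by)) %>%
    dplyr::summarize(
      num = sum(death, na.rm = TRUE), 
      denom = sum(ttdeath, na.rm = TRUE),
      ratio = num / denom,
      .groups = "drop"
    ) %>%
    # format the numbers and build the displayed string
    dplyr::mutate(
      dplyr::across(all_of(c("num", "denom", "ratio")), 
                    style_sigfig),
      stat = glue::glue("{ratio} ({num}/{denom})")
    ) %>%
    dplyr::rename(variable = all_of(variable),
                  by = all_of(by)) %>%
    # map each by level to its output column name (add_stat_1, add_stat_2, ...)
    dplyr::left_join(
      tbl$df_by %>% select(by, by_col) %>% mutate(by_col = paste0("add_", by_col)),
      by = "by"
    ) %>%
    select(all_of(c("variable", "by_col", "stat"))) %>%
    # reshape to one row per level and one column per by level
    tidyr::pivot_wider(
      id_cols = all_of("variable"),
      values_from = all_of("stat"),
      names_from = all_of("by_col")
    ) %>%
    # keep the rows in level order and drop the identifying column
    dplyr::arrange(variable) %>%
    dplyr::select(-variable)
}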

trial %>%
  tbl_summary(
    by = trt,
    include = grade
  ) %>%
  modify_column_hide(all_stat_cols()) %>%
  add_stat(fns = everything() ~ ratio_fun,
           location = everything() ~ "level") %>%
  as_kable()
|Characteristic |add_stat_1    |add_stat_2    |
|:--------------|:-------------|:-------------|
|Grade          |              |              |
|I              |0.02 (16/734) |0.02 (17/690) |
|II             |0.02 (16/651) |0.03 (20/645) |
|III            |0.03 (20/598) |0.04 (23/607) |

Created on 2021-08-31 by the reprex package (v2.0.1)

@ddsjoberg
Owner

ddsjoberg commented Sep 1, 2021

Perhaps what we need is a tbl_base() function that sets the structure of a gtsummary table, and then make an interface to easily update it?

For example,

trial %>%
  select(age, grade) %>%
  tbl_base()

Then we can use functions like modify_table_body() to merge in new results, and add other helpers as needed?

This will ensure the proper internal structure is maintained.

@larmarange
Collaborator Author

Thanks @ddsjoberg for your feedback. Sorry I'm currently traveling and do not always have access to Wifi.

Whatever we decide to implement will need to follow the existing framework of a gtsummary table

Totally agree

Here's how I would construct the table with add_stat() and it's complicated...

That's the thing. add_stat() is great but not suited to many users. I would say that modify_table_body() has a similar issue, as it requires a good understanding of the structure of a tbl object. I'm sure there is room for intermediate functions.

There is no rush here, but I have the feeling that it would be great to think about both very generic functions aimed at very advanced users and some more limited functions that are easier to use.

In any case, a FAQ centralising the many tricks around gtsummary would be great.

@larmarange
Collaborator Author

tbl_base could be a good way to share code between different functions.

It could even be used for tbl_summary

@larmarange
Collaborator Author

It took me some time but I guess that I finally get your point. Using tbl_base and modify_table_body would make the writing of new functions easier and will maintain consistency.

Great idea

@larmarange larmarange changed the title Ratio / Incidence table Custom table Sep 2, 2021
@ddsjoberg
Owner

It took me some time but I guess that I finally get your point. Using tbl_base and modify_table_body would make the writing of new functions easier and will maintain consistency.
@larmarange

I also take your point that modify_table_body() and add_stat() are not easy to use, and that some kind of easier-to-use but still flexible helper would be valuable.

Maybe something to bind a tibble you've created to an existing gtsummary table (this would be like a modify_table_body(~bind_cols()) wrapper), and perhaps a left_join() sister function that can merge on the label column only (or maybe label and variable).
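
For illustration, the left_join() variant can already be written by hand with existing functions; the wrapper discussed here would only hide these steps. This is a sketch under assumptions: the ratios tibble and the header text are made up for the example, and it relies on table_body having variable and label columns as described in the gtsummary definition article.

```r
library(gtsummary)

# Statistics computed outside gtsummary: one row per level of grade,
# keyed by the `variable` and `label` columns of the table body.
ratios <- trial %>%
  dplyr::group_by(grade) %>%
  dplyr::summarise(ratio = sum(death) / sum(ttdeath), .groups = "drop") %>%
  dplyr::transmute(
    variable = "grade",
    label = as.character(grade),
    ratio = style_sigfig(ratio)
  )

tbl <- trial %>%
  tbl_summary(by = trt, include = grade) %>%
  # merge the external results into the internal table body
  modify_table_body(
    function(body) dplyr::left_join(body, ratios, by = c("variable", "label"))
  ) %>%
  # new columns are hidden until they are given a header
  modify_header(ratio = "**Deaths / follow-up**")

tbl
```

The header row of the variable ("Grade") has no match in ratios, so its cell stays empty.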

I am still not sure what the best move is. But it's good to have these conversations and let the ideas mature.

@larmarange
Collaborator Author

Yes. We clearly need time to mature it. I will try to do some tests as a proof of concept as soon as I find some time

@larmarange
Collaborator Author

Dear @ddsjoberg

Just to experiment a little, I have drafted a tbl_custom_summary() function. It is visible here: #976

You will also find concrete examples.

In fact, instead of creating an empty table and then use modify_table_body, it seems easier to create a custom df_stats and then to use all the already existing procedures implemented in tbl_summary().
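
As a rough idea of what this looks like for the ratio table discussed earlier in this thread, here is a short sketch using tbl_custom_summary() as released in gtsummary 1.5.0. The death_rate() function name is made up for the example, and the arguments may differ from the first draft in the pull request.

```r
library(gtsummary)

# Hypothetical per-cell statistic function (the name is illustrative).
# It receives the subset of the data for one cell and returns one row of stats.
death_rate <- function(data, ...) {
  num <- sum(data$death, na.rm = TRUE)
  denom <- sum(data$ttdeath, na.rm = TRUE)
  dplyr::tibble(num = num, denom = denom, ratio = num / denom)
}

tbl <- trial %>%
  tbl_custom_summary(
    include = "grade",
    by = "trt",
    stat_fns = everything() ~ death_rate,
    # pattern built from the names returned by death_rate()
    statistic = everything() ~ "{ratio} ({num}/{denom})",
    # decimals, in the order the statistics appear in the pattern
    digits = everything() ~ c(2, 0, 0)
  )

tbl
```

Compared with the add_stat() version above, the custom function no longer deals with grouping, pivoting or column naming.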

@ddsjoberg ddsjoberg added this to the v1.5.0 milestone Sep 21, 2021
ddsjoberg added a commit that referenced this issue Oct 1, 2021
* draft of tbl_custom_summary

fix #973

* document

* filter NA values

* pass stat_display as well

* using dplyr::group_modify instead of summarise

* document

* cleaning

* doc update

* experimental

* Option for adding an overall row

* documentation improvements

* better example

* documentation updates

* Create test-tbl_custom_summary.R

* doc updates

* Doc update

Caution section and not mentionning internal variable

* first helper to tbl_custom_summary

* Update test-tbl_custom_summary.R

* Update test-tbl_custom_summary.R

* fix for continuous_summary()

* .drop = FALSE when grouping

* avoiding .by and .variable

* theme elements for tbl_custom_summary()

* ratio_summary

* proportion_summary()

* improved examples

* doc update

* additional tests for tbl_custom_summary

* spell_check

* simpler syntax

* too long example

* misc updates

* doc updates

Co-authored-by: Daniel Sjoberg <danield.sjoberg@gmail.com>