Bug Report: tbl_summary "is variable class supported?" error for numeric variables #1403

Closed
emilyvertosick opened this issue Dec 14, 2022 · 3 comments · Fixed by #1407

@emilyvertosick
Contributor

When running tbl_summary() with type = "continuous" for all variables, I get an error saying there was an issue calculating the summary statistics and asking whether the variable's class is supported by `median` and `quantile`. However, this is simply a numeric variable. The code works correctly when the variables are summarized as categorical, or after converting them with as.numeric().

library(tidyverse)
library(gtsummary)
#> Warning: package 'gtsummary' was built under R version 4.2.2

df <-
  structure(list(swallowing = structure(c(NA, 53, 100, 0, 100),
                                        names = c("", "", "", "", "")),
                 salivation = structure(c(NA, 100, 46, 62, 100),
                                        names = c("", "", "", "", ""))),
            row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

# this doesn't work
tbl_summary(df, type = list(everything() ~ "continuous"))
#> ✖ There was an error calculating the summary statistics for "swallowing". Is this variable's class supported by `median` and `quantile`?
#> Error in `mutate()`:
#> ! Problem while computing `df_stats = pmap(...)`.
#> Caused by error in `abort()`:
#> ! `message` must be a character vector, not a
#>   <rlang_error/error/condition> object.

# works as categorical
tbl_summary(df, type = list(everything() ~ "categorical")) %>% pluck("table_body")
#> # A tibble: 10 × 6
#>    variable   var_type    var_label  row_type label      stat_0 
#>    <chr>      <chr>       <chr>      <chr>    <chr>      <chr>  
#>  1 swallowing categorical swallowing label    swallowing <NA>   
#>  2 swallowing categorical swallowing level    0          1 (25%)
#>  3 swallowing categorical swallowing level    53         1 (25%)
#>  4 swallowing categorical swallowing level    100        2 (50%)
#>  5 swallowing categorical swallowing missing  Unknown    1      
#>  6 salivation categorical salivation label    salivation <NA>   
#>  7 salivation categorical salivation level    46         1 (25%)
#>  8 salivation categorical salivation level    62         1 (25%)
#>  9 salivation categorical salivation level    100        2 (50%)
#> 10 salivation categorical salivation missing  Unknown    1

# works if converted using as.numeric()
tbl_summary(df %>% mutate(across(everything(), as.numeric)),
            type = list(everything() ~ "continuous")) %>%
  pluck("table_body")
#> # A tibble: 4 × 6
#>   variable   var_type   var_label  row_type label      stat_0      
#>   <chr>      <chr>      <chr>      <chr>    <chr>      <chr>       
#> 1 swallowing continuous swallowing label    swallowing 76 (40, 100)
#> 2 swallowing continuous swallowing missing  Unknown    1           
#> 3 salivation continuous salivation label    salivation 81 (58, 100)
#> 4 salivation continuous salivation missing  Unknown    1

Created on 2022-12-14 by the reprex package (v2.0.1)
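
The as.numeric() workaround probably succeeds because as.numeric() drops every attribute of its input, including names. A quick base-R check on a vector built the same way as the swallowing column (illustrative only, not part of the original reprex):

```r
# Same construction as `swallowing` above: numeric values plus a names
# attribute made of five empty strings
x <- structure(c(NA, 53, 100, 0, 100), names = c("", "", "", "", ""))

# The original vector carries a names attribute
names(attributes(x))

# as.numeric() strips all attributes, leaving a bare double vector
attributes(as.numeric(x))
```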

@ddsjoberg
Owner

Hmmm, can you show the results of calling median() and quantile() directly on that column?

ddsjoberg added this to the v1.6.4 milestone Dec 14, 2022
@emilyvertosick
Contributor Author

Both of those appear to work fine...

library(tidyverse)
library(gtsummary)

df <-
  structure(list(swallowing = structure(c(NA, 53, 100, 0, 100),
                                        names = c("", "", "", "", "")),
                 salivation = structure(c(NA, 100, 46, 62, 100),
                                        names = c("", "", "", "", ""))),
            row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

# median and quantile both appear to work fine on their own
quantile(df$swallowing, c(0.25, 0.75), na.rm = TRUE)
#>    25%    75% 
#>  39.75 100.00
median(df$swallowing, na.rm = TRUE)
#> [1] 76.5

Created on 2022-12-15 by the reprex package (v2.0.1)
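
Calling the functions directly does not trigger the problem, since neither median() nor quantile() copies the input's attributes onto its result. For comparison, a base-R look at the lengths involved, using a vector built like the swallowing column:

```r
x <- structure(c(NA, 53, 100, 0, 100), names = c("", "", "", "", ""))

length(attr(x, "names"))                            # names attribute: 5 elements
length(median(x, na.rm = TRUE))                     # median(): a single value
length(quantile(x, c(0.25, 0.75), na.rm = TRUE))    # two values, "25%" and "75%"
```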

@ddsjoberg
Owner

The issue here has to do with some nonsense I had written to keep the attributes of the original vector.

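After the summary statistic is calculated, I add the original vector's attributes back to the result. In this case, though, the only attribute is a names attribute of five empty strings, and its length (5) does not match the length of the summary statistic (1), so the assignment errors. Only median() is affected, because `.keep_attr()` is applied only when the function is `stats::median`.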
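
The failing step can be reproduced without gtsummary. The sketch below copies the logic of the `.keep_attr()` helper quoted further down in this comment into a hypothetical keep_attr_sketch() and applies it to a vector built like swallowing:

```r
x <- structure(c(NA, 53, 100, 0, 100), names = c("", "", "", "", ""))

# Hypothetical stand-in for `.keep_attr()`: compute the statistic, then
# copy every attribute of the input (including names) onto the result
keep_attr_sketch <- function(x, .f) {
  x_att <- attributes(x)
  res <- .f(x)
  attributes(res) <- x_att
  res
}

# Assigning a length-5 names attribute to a length-1 median fails;
# capture the error instead of stopping
err <- tryCatch(
  keep_attr_sketch(x, function(v) median(v, na.rm = TRUE)),
  error = function(e) e
)
conditionMessage(err)
```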

I wonder if it would be simpler to just unclass() everything before passing it to the summary function. I think that would break date summaries...probably something else too. Ugh, there is always something!
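
To illustrate the concern about dates: median() on a Date vector returns a Date, while median() on the unclassed vector returns the underlying number of days since 1970-01-01. A minimal base-R sketch:

```r
d <- as.Date(c("2022-01-01", "2022-01-02", "2022-01-03"))

# With the class intact, the result is still a Date
median(d)

# After unclass(), the result is a plain number of days since 1970-01-01
median(unclass(d))
```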

safe_summarise_at <- function(data, variable, fns) {
  tryCatch({
    # ref for all this `.keep_attr()` nonsense stackoverflow.com/questions/67291199
    dplyr::summarise_at(data,
                        vars("variable"),
                        map(
                          fns,
                          function(.x) {
                            if (identical(.x, stats::median))
                              return(rlang::inject(function(x) .keep_attr(x, .f = !!.x)))
                            else return(.x)
                          }
                        ))
    },
    error = function(e) {
      # replace p[0:100] stats with `quantile`
      fns_names <- stringr::str_replace(names(fns), "^p\\d+$", "quantile") %>% unique()
      paste(
        "There was an error calculating the summary statistics",
        "for {.val {variable}}. Is this variable's class",
        "supported by {.code {fns_names}}?"
      ) %>%
        cli::cli_alert_danger()

      abort(e)
    }
  )
}

.keep_attr <- function(x, .f) {
  x_att <- attributes(x)
  res <- .f(x)
  attributes(res) <- x_att
  res
}
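
One possible direction, sketched in base R under the assumption that only length-dependent attributes cause trouble (this is not necessarily what the fix in #1407 does): drop names, dim, and dimnames, then copy the remaining attributes onto the result. The helper name keep_attr_safe() is hypothetical.

```r
# Hypothetical variant of `.keep_attr()`: copy the input's attributes onto
# the result, except those whose validity depends on the input's length
keep_attr_safe <- function(x, .f) {
  x_att <- attributes(x)
  x_att <- x_att[setdiff(names(x_att), c("names", "dim", "dimnames"))]
  res <- .f(x)
  attributes(res) <- if (length(x_att)) x_att else NULL
  res
}

med <- function(v) median(v, na.rm = TRUE)

# Vector from the issue: the empty names no longer cause an error
x <- structure(c(NA, 53, 100, 0, 100), names = c("", "", "", "", ""))
keep_attr_safe(x, med)

# The class attribute is still carried over, e.g. for dates
d <- as.Date(c("2022-01-01", "2022-01-02", "2022-01-03"))
keep_attr_safe(d, med)
```

Dropping only these three attributes keeps class-based summaries such as dates intact while avoiding the length mismatch.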

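Separately, the end of the original reprex output (`message` must be a character vector, not a <rlang_error/error/condition> object, caused by an error in `abort()`) suggests that the `abort(e)` call in the error handler above rejects the condition object as a message, which hides the underlying names-length error. One base-R option is to re-signal the caught condition with stop(e), which accepts a condition object. The wrapper safe_median_sketch() is hypothetical and only illustrates the pattern:

```r
# Hypothetical wrapper: report which variable failed, then re-raise the
# original error so its own message reaches the user
safe_median_sketch <- function(x, variable) {
  tryCatch(
    median(x, na.rm = TRUE),
    error = function(e) {
      message("There was an error calculating the summary statistics for ",
              variable, ".")
      # stop() accepts a condition object and signals it unchanged
      stop(e)
    }
  )
}

# median() refuses factors, so the original error propagates to the caller
err <- tryCatch(safe_median_sketch(factor(c("a", "b")), "swallowing"),
                error = function(e) e)
conditionMessage(err)
```
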
ddsjoberg added a commit that referenced this issue Dec 20, 2022
ddsjoberg mentioned this issue Dec 20, 2022
ddsjoberg added a commit that referenced this issue Dec 20, 2022
* #1403 fix

* Update test-tbl_summary.R