Bug Report: tbl_summary throws error when grouping and data contains a factor filled with NAs #977

Closed
erikvona opened this issue Sep 17, 2021 · 4 comments · Fixed by #978

@erikvona

I'm using tbl_summary programmatically.
If I have a table with a factor variable and all values happen to be NA, an error occurs:

Reprex:

library(gtsummary)
trial2 <- trial %>% select(trt, age, grade)
trial2$has_banana <- factor(NA) # We don't know which patients have a banana
trial2 %>% tbl_summary(by = trt) # Error :(

Error: Problem with mutate() column tbl_stats.
i tbl_stats = pmap(...).
x Can't subset columns that don't exist.
x Columns stat_1 and stat_2 don't exist.

If the factor has levels, tbl_summary works, but add_p fails:

library(gtsummary)
trial3 <- trial %>% select(trt, age, grade)
trial3$has_banana <- factor(NA, levels = c("Yes", "No")) # We don't know which patients have a banana
trial3 %>% tbl_summary(by = trt) %>% add_p() # Error :(

Error: Problem with mutate() column test.
i test = map2(...).
x missing value where TRUE/FALSE needed

(I tend to run fct_drop across all factors, so mine end up without levels, but it would be nice if both cases were fixed so that tbl_summary and add_p produce valid output for any factor.)
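Until both cases are handled by the package, one possible workaround is to detect all-NA factor columns before building the table. The following is a minimal base R sketch, not part of gtsummary: the helper `all_na_factors()` and the `toy` data frame (standing in for `trial`) are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not a gtsummary function): returns the names of
# factor columns in which every value is NA.
all_na_factors <- function(data) {
  is_all_na_factor <- vapply(
    data,
    function(x) is.factor(x) && all(is.na(x)),
    logical(1)
  )
  names(data)[is_all_na_factor]
}

# Toy data standing in for `trial`
toy <- data.frame(
  trt = c("Drug A", "Drug B", "Drug A"),
  age = c(46, 52, 61),
  has_banana = factor(c(NA, NA, NA))
)

# Identify the problematic columns, then drop them before summarising
empty_cols <- all_na_factors(toy)
toy_clean <- toy[, setdiff(names(toy), empty_cols), drop = FALSE]
```

Whether dropping such columns is acceptable depends on the report; an alternative would be to keep them and convert them to a non-factor type.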

@ddsjoberg ddsjoberg added this to the v1.5.0 milestone Sep 17, 2021
@ddsjoberg
Owner

Thank you @erikvona for reporting the bug! I am surprised this hasn't been addressed previously. We'll ensure this fix makes it into the next release.

@ddsjoberg
Owner

@erikvona

FYI, the issue here is that the factor variable has no levels. If the factor has at least one level, tbl_summary() doesn't return an error.

trial %>% 
  dplyr::mutate(
    has_banana = factor(NA, levels = "level") # We don't know which patients have a banana
  ) %>% 
  tbl_summary(by = trt, include = has_banana)
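The difference between the two cases can be seen in base R without gtsummary. This is a small illustrative sketch; the variable names are made up, and `droplevels()` is used here as a base R stand-in for the fct_drop step mentioned earlier in the thread.

```r
# A factor built from NA alone has no levels at all
no_levels <- factor(NA)

# Declaring levels keeps them even though every value is NA
with_levels <- factor(NA, levels = c("Yes", "No"))

n_without <- nlevels(no_levels)     # 0
n_with    <- nlevels(with_levels)   # 2

# Dropping unused levels turns the second case back into the first
n_dropped <- nlevels(droplevels(with_levels))  # 0
```

This is why running a level-dropping step over every factor leads to the first error rather than the second.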


@erikvona
Author

@ddsjoberg That's true, but if the factor has levels, add_p() still fails; see the second example. If desired, I could split this into two reports, since they could be considered separate issues.

@ddsjoberg
Owner

ddsjoberg commented Sep 17, 2021

Thanks for adding the second example. I am surprised to see that fisher.test() returns a p-value when all observations are missing! 😨 😨 😨

> with(df, table(has_banana, trt))
          trt
has_banana Drug A Drug B
       Yes      0      0
       No       0      0

> with(df, table(has_banana, trt)) %>%
+   fisher.test()

	Fisher's Exact Test for Count Data

data:  .
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
   0 Inf
sample estimates:
odds ratio 
         0 
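One way to avoid reporting a p-value from an empty table is to check the cell counts before calling the test. The sketch below is an assumption about how such a guard could look, not the fix that went into the package; `safe_fisher_p()` is a hypothetical helper, and the small vectors stand in for the `has_banana` and `trt` columns.

```r
# Stand-ins for the columns above: every has_banana value is missing
has_banana <- factor(c(NA, NA, NA, NA), levels = c("Yes", "No"))
trt <- factor(c("Drug A", "Drug B", "Drug A", "Drug B"))

# table() drops NA by default, so every cell count is zero
tab <- table(has_banana, trt)

# Hypothetical guard (not a gtsummary function): return NA instead of a
# p-value when the contingency table contains no observations.
safe_fisher_p <- function(tab) {
  if (sum(tab) == 0) {
    return(NA_real_)
  }
  fisher.test(tab)$p.value
}

p_empty <- safe_fisher_p(tab)  # NA rather than 1
```

Returning `NA_real_` keeps the result numeric, so it can still sit in a column of p-values alongside results from other variables.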

ddsjoberg added a commit that referenced this issue Sep 17, 2021