Bug Report: add_ci.tbl_svysummary() incorrect levels order #2036

Closed
BLalloue opened this issue Oct 9, 2024 · 4 comments · Fixed by #2037

Comments

@BLalloue

BLalloue commented Oct 9, 2024

Hello,

I think add_ci.tbl_svysummary() mishandles the level order of factors. The confidence intervals are not aligned with the rows of their point estimates: the CIs seem to always be displayed in the same order (alphabetical?), even when the factor's levels are ordered differently.

Here's a reprex showing first the expected behavior on a tibble and then the incorrect behavior on a survey object.

library(dplyr)
library(forcats)
library(survey)
library(gtsummary)

set.seed(123)

# generate sample data
testdata <- tibble(
    strata = sample(1:3, 100, replace = TRUE),
    factor = sample(c("A", "B", "C", "D", "E", NA), 100, replace = TRUE)
) %>% 
    mutate(
        factor = as.factor(factor),
        factor2 = fct_relevel(factor, "E", "A", "D"),
        fpc = case_when(
            strata == 1 ~ 100,
            strata == 2 ~ 200,
            strata == 3 ~ 300
        )
    )

# add_ci() for tbl_summary respects the level order specific to each factor -> WORKS
testdata %>% 
    tbl_summary(include = c(factor, factor2)) %>% 
    add_ci()
Characteristic    N = 100¹     95% CI²
factor
    A             12 (15%)     8.5%, 26%
    B             18 (23%)     15%, 34%
    C             13 (17%)     9.5%, 27%
    D             17 (22%)     14%, 33%
    E             18 (23%)     15%, 34%
    Unknown       22
factor2
    E             18 (23%)     15%, 34%
    A             12 (15%)     8.5%, 26%
    D             17 (22%)     14%, 33%
    B             18 (23%)     15%, 34%
    C             13 (17%)     9.5%, 27%
    Unknown       22
¹ n (%)
² CI = Confidence Interval
# add_ci() for tbl_svysummary wrongly keeps the same level order for both factors (alphabetical order?) -> BUG ?
testsurvey <- svydesign(ids = ~ 1, strata = ~ strata, data = testdata)
#> Warning in svydesign.default(ids = ~1, strata = ~strata, data = testdata): No
#> weights or probabilities supplied, assuming equal probability
testsurvey %>% 
    tbl_svysummary(include = c(factor, factor2)) %>% 
    add_ci()
Characteristic    N = 100¹     95% CI²
factor
    A             12 (15%)     8.8%, 25%
    B             18 (23%)     15%, 34%
    C             13 (17%)     9.8%, 27%
    D             17 (22%)     14%, 33%
    E             18 (23%)     15%, 34%
    Unknown       22
factor2
    E             18 (23%)     8.8%, 25%
    A             12 (15%)     15%, 34%
    D             17 (22%)     9.8%, 27%
    B             18 (23%)     14%, 33%
    C             13 (17%)     15%, 34%
    Unknown       22
¹ n (%)
² CI = Confidence Interval
# Same thing when using fpc (note that for level E of `factor2` the CI does not contain the estimate) -> BUG ?
testsurvey2 <- svydesign(ids = ~ 1, strata = ~ strata, fpc = ~fpc, data = testdata)
testsurvey2 %>% 
    tbl_svysummary(include = c(factor, factor2)) %>% 
    add_ci()
Characteristic    N = 600¹     95% CI²
factor
    A             69 (15%)     8.9%, 25%
    B             99 (22%)     14%, 32%
    C             78 (17%)     10%, 28%
    D             90 (20%)     13%, 30%
    E             115 (26%)    17%, 37%
    Unknown       149
factor2
    E             115 (26%)    8.9%, 25%
    A             69 (15%)     14%, 32%
    D             90 (20%)     10%, 28%
    B             99 (22%)     13%, 30%
    C             78 (17%)     17%, 37%
    Unknown       149
¹ n (%)
² CI = Confidence Interval

Created on 2024-10-09 with reprex v2.1.1

Session info
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 10 x64 (build 19044)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=French_France.utf8  LC_CTYPE=French_France.utf8   
#> [3] LC_MONETARY=French_France.utf8 LC_NUMERIC=C                  
#> [5] LC_TIME=French_France.utf8    
#> 
#> time zone: Europe/Paris
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] grid      stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] gtsummary_2.0.3 survey_4.4-2    survival_3.7-0  Matrix_1.7-0   
#> [5] forcats_1.0.0   dplyr_1.1.4    
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.4.1    tidyselect_1.2.1  reprex_2.1.1      Rcpp_1.0.13      
#>  [5] xml2_1.3.6        tidyr_1.3.1       splines_4.4.1     yaml_2.3.10      
#>  [9] fastmap_1.2.0     lattice_0.22-6    R6_2.5.1          commonmark_1.9.2 
#> [13] cards_0.3.0       generics_0.1.3    knitr_1.48        backports_1.5.0  
#> [17] cardx_0.2.1       tibble_3.2.1      DBI_1.2.3         pillar_1.9.0     
#> [21] rlang_1.1.4       utf8_1.2.4        broom_1.0.7       xfun_0.48        
#> [25] sass_0.4.9        fs_1.6.4          cli_3.6.3         withr_3.0.1      
#> [29] magrittr_2.0.3    digest_0.6.37     rstudioapi_0.16.0 markdown_1.13    
#> [33] lifecycle_1.0.4   vctrs_0.6.5       evaluate_1.0.0    glue_1.8.0       
#> [37] mitools_2.4       gt_0.11.1.9000    fansi_1.0.6       rmarkdown_2.28   
#> [41] purrr_1.0.2       tools_4.4.1       pkgconfig_2.0.3   htmltools_0.5.8.1

Thank you for all your work and for your help!

@BLalloue changed the title from "Bug Report: add_ci.tbl_svysummary incorrect levels order" to "Bug Report: add_ci.tbl_svysummary() incorrect levels order" on Oct 9, 2024
@ayogasekaram
Collaborator

Hey @ddsjoberg! I investigated this issue and I believe the reordering happens in gtsummary. I ran ard_categorical_ci() from cardx on factor and factor2: the level orders are correct and the CI values match the levels.
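
A check along these lines might look like the sketch below. It rebuilds the report's data with base R and calls ard_categorical_ci() directly on the survey design, bypassing gtsummary. The object names dat, dsn, and ard are illustrative and not from the thread, and the exact columns of the returned ARD may differ across cardx versions.

```r
library(survey)
library(cardx)

set.seed(123)

# Same shape as the report's data, built with base R (names are illustrative).
dat <- data.frame(
  strata = sample(1:3, 100, replace = TRUE),
  factor = factor(sample(c("A", "B", "C", "D", "E", NA), 100, replace = TRUE))
)
# Equivalent of fct_relevel(factor, "E", "A", "D"): move E, A, D to the front.
dat$factor2 <- factor(dat$factor, levels = c("E", "A", "D", "B", "C"))

# Stratified design without weights, as in the report (svydesign() warns about this).
dsn <- svydesign(ids = ~1, strata = ~strata, data = dat)

# CIs computed directly by cardx, without going through gtsummary.
ard <- ard_categorical_ci(dsn, variables = c("factor", "factor2"))

# List the levels in the order cardx returns them for each variable.
lapply(split(ard$variable_level, ard$variable), function(x) unique(unlist(x)))
```

If the levels printed for factor2 follow E, A, D, B, C, the reordering is happening downstream of cardx.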

@ddsjoberg
Owner

Thank you @BLalloue for reporting!! Also, thank you @ayogasekaram for looking into the source.

I took a look, and the group_map() call in brdg_add_ci() is re-ordering the survey results alphabetically. This happens with survey data and not with typical data frames because the survey package does not retain variable types (like factors) when the CIs are calculated, so what was formerly a factor gets sorted like a character value. There are two ways to approach a solution:

  1. In {cardx}, assess whether the original class/type of a variable can be forced back onto the variable.
  2. Update the group_map() section of code to ensure the original level order is restored after calculation.

I think the first proposal would be best, but also the most difficult because it would also involve updates to the ard_categorical() survey S3 method. So we'll go with the second.
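
The mechanism can be illustrated with base R alone. This is a minimal sketch of the idea behind the second option, not the actual change made in gtsummary: the data frame ci is a hypothetical stand-in for results keyed by level as plain character strings, with CI values copied from the first survey table in the report.

```r
# A factor whose levels are deliberately not alphabetical (as with factor2).
x <- factor(c("A", "B", "C", "D", "E"), levels = c("E", "A", "D", "B", "C"))

# Hypothetical per-level results after the factor class has been dropped:
# the level is now just a character string.
ci <- data.frame(
  level = as.character(x),
  ci = c("8.8%, 25%", "15%, 34%", "9.8%, 27%", "14%, 33%", "15%, 34%"),
  stringsAsFactors = FALSE
)

# Sorting the character values gives alphabetical order ...
sorted_levels <- sort(ci$level)

# ... whereas sorting the factor itself follows its level order.
factor_order <- as.character(sort(x))

# Restore the intended order by matching rows against the original levels.
ci_reordered <- ci[match(levels(x), ci$level), ]

print(sorted_levels)
print(factor_order)
print(ci_reordered)
```

After the match() step, each CI sits on the row of its own level again (E first, then A, D, B, C), which is what the tbl_summary() output in the report shows.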

@ddsjoberg
Owner

Thank you @BLalloue for the lovely reproducible example! If you could please install the dev version of gtsummary from GitHub and check your use case against the update to confirm the bug is fixed, it would be greatly appreciated.

@BLalloue
Author

Thank you very much for your reply and quick update!
I no longer have any problems with either my reprex or my use case when I use the dev version.
