Bug Report: using chi² whereas Fisher is expected #1513
Comments
I think you calculated your expected counts incorrectly.
library(gtsummary)
ttt <- tibble::tibble(type = character(), answer = character()) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = "C1"), 5)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = "C1"), 10)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = "C2"), 100)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = "C2"), 305)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = NA), 400)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = NA), 300))
ttt |>
tbl_cross(statistic = "{p}%") |>
as_kable()
# expected count
0.013 * 0.45 * nrow(ttt)
#> [1] 6.552
Created on 2023-05-22 with reprex v2.0.2 |
Ah, you consider the expected counts by including the NA (Unknown) entries, contrary to me. This is as if NA was another category, treated on par with the C1 and C2 categories. Considering NA as yet another category certainly makes sense in some situations, but then shouldn’t the chi² test itself also consider NA as a category? Currently, gtsummary seems to apply chi² as documented, thus, “cases with missing values are removed”. I find it therefore surprising that the expected counts consider NA as a category. Note that a supplementary side effect of this treatment is that the To put it otherwise, it seems reasonable for the user to expect that the test should behave as if the NA were removed before applying the test, but this expectation is currently not matched. |
I am somewhat confused. There are no NAs in the table.... |
I am confused as well, then. I meant to refer to the r value |
you're 100% correct, there are missing values, apologies. When the NAs are removed, the expected counts are still above 5 |
library(gtsummary)
#> #Uighur
ttt <- tibble::tibble(type = character(), answer = character()) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = "C1"), 5)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = "C1"), 10)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = "C2"), 100)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = "C2"), 305)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "A", answer = NA), 400)) %>%
tibble::add_row(tidyr::uncount(tibble::tibble(type = "B", answer = NA), 300))
ttt |>
tbl_cross(statistic = "{p}%", missing = "no") |>
as_kable()
#> FALSE observations with missing data have been removed.
0.036 * 0.25 * nrow(ttt)
#> [1] 10.08
Created on 2023-05-23 with reprex v2.0.2 |
(and the NAs are removed before making the expected count assessment) |
That’s with |
oh goodness, i have been giving half attention to the details here, apologies....i'll investigate! |
@oliviercailloux thank you so much for reporting this! The bug occurred when one variable had a large number of missing values relative to the other, because the rates were being calculated separately for each variable rather than by a complete-case estimation across both variables. Again, thank you! |
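To illustrate why computing the rates separately can hide a small expected count, here is a minimal base R sketch using the counts from this thread. It is only one possible reading of the bug described above, not gtsummary's actual code; the variable names (`separate`, `complete`, `cc`) are made up for the example.

```r
# Same counts as the reprex, built with base R only.
n <- c(5, 10, 100, 305, 400, 300)
ttt <- data.frame(
  type   = rep(c("A", "B", "A", "B", "A", "B"), times = n),
  answer = rep(c("C1", "C1", "C2", "C2", NA, NA), times = n)
)

# Assumed (hypothetical) buggy variant: each rate is taken over that
# variable's own non-missing rows, so `type` still counts the 700 rows
# where `answer` is NA.
p_type_a   <- mean(ttt$type == "A")                 # 505 / 1120
p_c1       <- mean(ttt$answer == "C1", na.rm = TRUE) # 15 / 420
n_complete <- sum(complete.cases(ttt))               # 420
separate   <- p_type_a * p_c1 * n_complete

# Complete-case variant: drop rows with any NA first, then compute both rates.
cc       <- ttt[complete.cases(ttt), ]
complete <- mean(cc$type == "A") * mean(cc$answer == "C1") * nrow(cc)

separate  # above 5, which would point to the chi-squared test
complete  # 3.75, below 5, which would point to Fisher's exact test
```

Under these assumptions the two approaches fall on opposite sides of the threshold of 5, which matches the behaviour reported in this issue.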
I apologize in advance if I missed something again, but the following code seems to behave differently than documented.
The doc states that "Tests default to (...) "chisq.test.no.correct" for categorical variables with all expected cell counts >=5, and "fisher.test" for categorical variables with any expected cell count <5." Here, cell A/C1 has expected count 15 * 105 / 420 = 15 / 4 < 4 but gtsummary uses a chi squared test (as it indicates in the footnote, and as appears in the warning sent by chi squared test which is unhappy about the approximation).
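The expected counts quoted above can be checked directly from the 2x2 table of complete cases. This is a small sketch in base R, assuming the NA rows are dropped before the test; the object names are illustrative, and `chisq.test()` is used only to cross-check the hand computation.

```r
# Observed complete-case counts from the report (rows: type, columns: answer).
observed <- matrix(
  c(5, 10, 100, 305),
  nrow = 2,
  dimnames = list(type = c("A", "B"), answer = c("C1", "C2"))
)

# Expected count per cell = row total * column total / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected["A", "C1"]  # 105 * 15 / 420

# chisq.test() exposes the same matrix; the warning about the approximation
# is suppressed here because it is exactly the situation being discussed.
expected_chisq <- suppressWarnings(chisq.test(observed, correct = FALSE))$expected

# TRUE means at least one cell falls below the documented threshold of 5.
any_below_5 <- any(expected < 5)
any_below_5
```

With these counts only the A/C1 cell is below 5, so by the documented rule the default should be `"fisher.test"`.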