Feature request: deff for tbl_svy_summary() #1486

Closed
aspina7 opened this issue Apr 11, 2023 · 3 comments · Fixed by #1487

aspina7 commented Apr 11, 2023

Is your feature request related to a problem? Please describe.
Hello! It would be really useful for surveys if we could report the design effect using tbl_svysummary().
The design effect can then be used to calculate the intra-class correlation coefficient, which in turn can be used to calculate the sample size required for future surveys (details under Additional context). Thanks so much!

Describe the solution you'd like
An add_deff() function would be nice - though I feel it might also fit within add_ci() given that this already calls the appropriate {survey} functions.

Describe alternatives you've considered
It would be possible to use tbl_svysummary_custom(), once it is available, to wrap some of the code discussed here. Alternatively, it might be possible to use add_stat(), pulling the SE from the existing table body. Both options seem really messy for end users, though, and reporting deff is pretty common practice.

Additional context
To account for the additional variability at the different stages of
complex designs, the sample size and sample estimates can be adjusted by
a factor known as the design effect ($deff$). This compares the variance
(i.e. the square of the Standard Error (SE)) of estimates from the more
complex design used, to the variance that would come from the same
sample size if simple random sampling had been used.

For cluster sampling, the variance can be calculated with the following
formula:

$$ SE^2 = \frac {\sum(p_i-p)^2} {m \cdot (m-1)} \cdot (1-m/M) $$

where:

  • $SE^2$ is the variance (square of the Standard Error)

  • $p_i$ is the proportion (e.g. vaccination coverage) in each cluster

  • $p$ is the estimated proportion for the whole population (e.g. 85%
    vaccination coverage)

  • $m$ is the number of clusters selected in the sample (e.g. 342
    school classes in this study)

  • $M$ is the total number of clusters in the population (e.g. XX
    school classes in the whole country)
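
The cluster-sampling formula above can be sketched in base R. This is only an illustration: the function name `cluster_se2()`, the five cluster proportions, and the value `M = 100` are hypothetical, and using the unweighted mean of `p_i` as `p` assumes clusters of equal size.

```r
# Variance (SE^2) of a proportion under cluster sampling,
# following SE^2 = sum((p_i - p)^2) / (m * (m - 1)) * (1 - m / M).
# cluster_se2() is a hypothetical helper, not part of any package.
cluster_se2 <- function(p_i, M, p = mean(p_i)) {
  m <- length(p_i)                       # number of clusters sampled
  sum((p_i - p)^2) / (m * (m - 1)) *     # between-cluster variability
    (1 - m / M)                          # finite population correction
}

# Hypothetical per-cluster proportions (illustrative values only)
p_i <- c(0.80, 0.90, 0.85, 0.75, 0.95)

se2 <- cluster_se2(p_i, M = 100)  # M = 100 is an assumed population total
se  <- sqrt(se2)                  # standard error of the estimate
```

Dividing `se2` by the variance expected under simple random sampling for the same sample size would then give the design effect.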

The design effect can then be calculated by:

$$ deff = \frac {SE^2 \text { from complex design}} {SE^2 \text { from simple random sampling}} $$

The required sample size is multiplied by the design effect. For
example, if the design effect is estimated as 1.5, then 50% more
individuals must be studied with the complex design than with simple
random sampling in order to obtain the same precision.
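
As a worked illustration, suppose simple random sampling would require 400 individuals (a hypothetical figure, not taken from any study). With a design effect of 1.5, the complex design would need:

$$ n_{complex} = deff \cdot n_{SRS} = 1.5 \cdot 400 = 600 $$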

The design effect can also be calculated
with the intra-cluster correlation coefficient, or $rho$:

$$ deff = 1 + (n - 1) \cdot rho $$

where:

  • $n$ is the average number of subjects per cluster and

  • $rho$ is the intra-class correlation coefficient or rate of
    homogeneity for the outcome of interest.
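
Rearranging this relationship gives $rho$ from a reported design effect, which is how a reported $deff$ could feed into planning a future survey. The values $deff = 1.5$ and $n = 21$ below are hypothetical and chosen only to make the arithmetic easy to follow:

$$ rho = \frac {deff - 1} {n - 1} = \frac {1.5 - 1} {21 - 1} = 0.025 $$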

larmarange (Collaborator) commented Apr 11, 2023

Could you have a look at #1487? It adds "{deff}" to the statistics that can be called in tbl_svysummary().

```r
library(gtsummary)
#> #Uighur
data(api, package = "survey")
d <- survey::svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
d %>%
  tbl_svysummary(
    by = both,
    include = full,
    statistic = all_continuous() ~ "{mean} ({deff})"
  ) %>%
  add_overall() %>%
  as_kable()
```

| Characteristic | Overall, N = 6,194 | No, N = 1,692 | Yes, N = 4,502 |
|:---------------|:------------------:|:-------------:|:--------------:|
| full           | 88 (8)             | 90 (4)        | 87 (5)         |

```r
d %>%
  tbl_svysummary(
    by = both,
    include = stype,
    statistic = all_categorical() ~ "{p}% ({deff})"
  ) %>%
  add_overall() %>%
  as_kable()
```

| Characteristic | Overall, N = 6,194 | No, N = 1,692 | Yes, N = 4,502 |
|:---------------|:------------------:|:-------------:|:--------------:|
| stype          |                    |               |                |
| E              | 79% (2.4)          | 64% (1.9)     | 84% (1.7)      |
| H              | 7.7% (1.9)         | 14% (0.91)    | 5.3% (1.4)     |
| M              | 14% (1.4)          | 22% (2.2)     | 11% (1.3)      |

```r
d %>%
  tbl_svysummary(
    by = both,
    include = stype,
    percent = "row",
    statistic = all_categorical() ~ "{p}% ({deff})"
  ) %>%
  add_overall() %>%
  as_kable()
```

| Characteristic | Overall, N = 6,194 | No, N = 1,692 | Yes, N = 4,502 |
|:---------------|:------------------:|:-------------:|:--------------:|
| stype          |                    |               |                |
| E              | 100% (NA)          | 22% (0.74)    | 78% (0.74)     |
| H              | 100% (NA)          | 50% (0.72)    | 50% (0.72)     |
| M              | 100% (NA)          | 44% (1.8)     | 56% (1.8)      |

```r
d %>%
  tbl_svysummary(
    by = both,
    include = stype,
    percent = "cell",
    statistic = all_categorical() ~ "{p}% ({deff})"
  ) %>%
  add_overall() %>%
  as_kable()
```

| Characteristic | Overall, N = 6,194 | No, N = 1,692 | Yes, N = 4,502 |
|:---------------|:------------------:|:-------------:|:--------------:|
| stype          |                    |               |                |
| E              | 79% (2.4)          | 17% (0.88)    | 61% (1.4)      |
| H              | 7.7% (1.9)         | 3.8% (1.3)    | 3.8% (1.3)     |
| M              | 14% (1.4)          | 6.0% (2.0)    | 7.7% (1.2)     |

Created on 2023-04-11 with reprex v2.0.2

larmarange (Collaborator) commented

Note: design effects are computed using svymean(deff = TRUE)
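
For reference, the same quantity can be requested directly from {survey}. The sketch below reuses the design from the reprex above and assumes the {survey} package is installed; it shows the general pattern of `svymean(deff = TRUE)` plus the `deff()` extractor, not necessarily the exact internal code added in #1487.

```r
library(survey)

# Same clustered design as in the reprex above
data(api, package = "survey")
d <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Ask svymean() to compute the design effect alongside the weighted mean
est <- svymean(~full, d, deff = TRUE)

# Extract the design effect from the result
deff_full <- deff(est)
deff_full
```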

ddsjoberg added a commit that referenced this issue Apr 15, 2023
* `tbl_svysummary()` can now report design effects

fix #1486

* updates to testing file

* snap update

* increment version number

* Update DESCRIPTION

* snapshot update

---------

Co-authored-by: Daniel Sjoberg <danield.sjoberg@gmail.com>

aspina7 (Author) commented Apr 23, 2023

Sorry for the delay. This looks perfect, thanks so much for the speedy implementation!
