-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replicate of the expected proportion of all transmission due to a given proportion of infectious cases using {epiparameter}
and {superspreading}
#93
Comments
Thanks for posting @avallecam, I've been giving this a bit of thought and to be honest I'm not 100% on the answer. Initially I thought figure 1b from Lloyd-Smith et al. (2005) could be replicated with Two main points:
If this reasoning is correct it raises the question why Fig 1b in Lloyd-Smith et al. (2005) asymptotes, and whether they are calculating the same thing. (The x-axis is confusing to me, not exactly sure what is meant by "Proportion of infected cases (ranked)".)
library(epiparameter)
library(superspreading)
library(tidyverse)
# list of diseases with offspring distribution
epidist_string <- epidist_db(
epi_dist = "offspring distribution"
) %>%
list_distributions() %>%
dplyr::select(disease) %>%
distinct() %>%
as_tibble()
#> Returning 10 results that match the criteria (10 are parameterised).
#> Use subset to filter by entry variables or single_epidist to return a single entry.
#> To retrieve the short citation for each use the 'get_citation' function
# get percent of cases that cause percent of transmission
across_offspring <- epidist_string %>%
# add column list of epidist objects
mutate(
epidist_out =
map(
.x = disease,
.f = epiparameter::epidist_db,
epi_dist = "offspring distribution",
single_epidist = TRUE
)
) %>%
# get parameters
mutate(
epidist_params =
map(
.x = epidist_out,
.f = epiparameter::get_parameters
)
) %>%
# unnest parameters
unnest_wider(col = epidist_params) %>%
# to each disease, add sequence from 0.01 to 1 (proportion of transmission)
expand_grid(percent_transmission = seq(from = 0.01, to = 1, by = 0.01)) %>%
# estimate proportion of cases responsible of proportion of transmission (row)
mutate(
transmission_output =
pmap(
.l = select(., R = mean, k = dispersion, percent_transmission),
.f = superspreading::proportion_transmission,
format_prop = FALSE,
simulate = TRUE
)
) %>%
# unnest proportion of cases results
unnest_wider(col = transmission_output) %>%
# move each result to one column
rowwise() %>%
mutate(
percent_cases =
sum(
c_across(cols = starts_with("prop_")),
na.rm = TRUE
)
) %>%
select(-starts_with("prop_"))
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the short citation use the 'get_citation' function
# get a position to the ggplot text annotation
across_offspring_tip <- across_offspring %>%
group_by(disease) %>%
filter(percent_transmission < 0.98, percent_transmission > 0.85) %>%
slice_max(percent_transmission) %>%
ungroup() %>%
mutate(disease = case_when(
str_detect(disease, stringr::fixed("Hantavirus")) ~ "Hantavirus",
str_detect(disease, stringr::fixed("Ebola")) ~ "Ebola",
TRUE ~ disease
))
# plot x: proportion of cases, y: proportion of transmission
across_offspring %>%
ggplot() +
geom_line(
aes(
x = percent_cases,
y = percent_transmission,
color = dispersion,
group = disease
)
) +
geom_text(
data = across_offspring_tip,
aes(
x = percent_cases,
y = percent_transmission,
label = disease
),
hjust = 0.0,
vjust = 1.0,
angle = 25,
size = 3
) +
scale_y_continuous(breaks = scales::breaks_pretty(n = 5)) +
colorspace::scale_color_continuous_diverging(trans = "log10") +
labs(
x = "Proportion of infectious cases (ranked)",
y = "Expected proportion of transmission",
color = "Dispersion\nparameter (k)"
) +
# geom_hline(aes(yintercept = 0.8),lty = 3) +
geom_vline(aes(xintercept = 0.2), lty = 2) +
coord_fixed(ratio = 1) Created on 2024-04-30 with reprex v2.1.0 |
Tagging @adamkucharski & @sbfnk to get some more input on this. |
From another look, my interpretation of Fig 1b in Lloyd-Smith et al is that it's the CDF of expected secondary cases. So what we'd obtain if we, say, generated 1000 samples from the distribution of the mean of R0 (I.e. the gamma distribution for nu, rather than the negative binomial which also accounts for poisson randomness), sorted these expected secondary cases, and plotted the normalised cumulative distribution, we'd get that curve? So the above functions are closed to Fig 1C in interpretation? |
My intention with this plot is to use it to facilitate the interpretation of the dispersion parameter using the "20/80 rule" and, if possible, compare different scenarios given k and R values within this plot.
Possibly showing Figure 1C would be more appropriate to the goal, then? library(epiparameter)
#> Warning: package 'epiparameter' was built under R version 4.3.3
library(superspreading)
#> Warning: package 'superspreading' was built under R version 4.3.3
library(tidyverse)
library(tictoc)
# list of diseases with offspring distribution
epidist_string <- epidist_db(
epi_dist = "offspring distribution"
) %>%
list_distributions() %>%
dplyr::select(disease) %>%
distinct() %>%
as_tibble()
#> Returning 10 results that match the criteria (10 are parameterised).
#> Use subset to filter by entry variables or single_epidist to return a single entry.
#> To retrieve the citation for each use the 'get_citation' function
# get percent of cases that cause percent of transmission
across_offspring <- epidist_string %>%
# add column list of epidist objects
mutate(
epidist_out =
map(
.x = disease,
.f = epiparameter::epidist_db,
epi_dist = "offspring distribution",
single_epidist = TRUE
)
) %>%
# get parameters
mutate(
epidist_params =
map(
.x = epidist_out,
.f = epiparameter::get_parameters
)
) %>%
# unnest parameters
unnest_wider(col = epidist_params) %>%
# to each disease, add sequence from 0.01 to 1 (proportion of transmission)
expand_grid(percent_transmission = seq(from = 0.01, to = 1, by = 0.01)) %>%
# estimate proportion of cases responsible of proportion of transmission (row)
mutate(
transmission_output =
pmap(
.l = dplyr::select(., R = mean, k = dispersion, percent_transmission),
.f = superspreading::proportion_transmission,
format_prop = FALSE,
simulate = TRUE # use a numerical simulation
)
) %>%
# unnest proportion of cases results
unnest_wider(col = transmission_output) %>%
# move each result to one column
rowwise() %>%
mutate(
percent_cases =
sum(
c_across(cols = starts_with("prop_")),
na.rm = TRUE
)
) %>%
dplyr::select(-starts_with("prop_")) %>%
ungroup()
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>..
#> To retrieve the citation use the 'get_citation' function
# get a position to the ggplot text annotation
across_offspring_tip <- across_offspring %>%
group_by(disease) %>%
filter(percent_transmission < 0.98, percent_transmission > 0.85) %>%
slice_max(percent_transmission) %>%
ungroup() %>%
mutate(disease = case_when(
str_detect(disease, stringr::fixed("Hantavirus")) ~ "Hantavirus",
str_detect(disease, stringr::fixed("Ebola")) ~ "Ebola",
TRUE ~ disease
))
# figure 1c ---------------------------------------------------------------
fixed_percent_transmission <- across_offspring %>%
group_by(disease) %>%
filter(percent_cases < 0.21, percent_cases > 0.19) %>%
slice_max(percent_cases) %>%
ungroup()
tic()
output <- expand_grid(
mean = seq(0.7,1.7,0.1),
dispersion = seq(0.01,10,0.1)
) %>%
expand_grid(percent_transmission = seq(from = 0.55, to = 1, by = 0.01)) %>%
# estimate proportion of cases responsible of proportion of transmission (row)
mutate(
transmission_output =
pmap(
.l = dplyr::select(., R = mean, k = dispersion, percent_transmission),
.f = superspreading::proportion_transmission,
format_prop = FALSE,
simulate = FALSE # use a numerical simulation
)
) %>%
# unnest proportion of cases results
unnest_wider(col = transmission_output) %>%
# move each result to one column
rowwise() %>%
mutate(
percent_cases =
sum(
c_across(cols = starts_with("prop_")),
na.rm = TRUE
)
) %>%
dplyr::select(-starts_with("prop_")) %>%
ungroup()
toc()
#> 235.39 sec elapsed
fixed_reproduction <- output %>%
group_by(mean,dispersion) %>%
filter(percent_cases < 0.21, percent_cases > 0.19) %>%
slice_max(percent_cases) %>%
ungroup()
# fixed_reproduction %>%
# print(n=Inf)
ggplot() +
geom_line(
data = fixed_reproduction,
mapping = aes(x = k, y = percent_transmission,
group = mean, color = mean)
) +
geom_point(
data = fixed_percent_transmission,
mapping = aes(x = k, y = percent_transmission)
) +
geom_hline(aes(yintercept = 0.8), linetype = 2) +
geom_hline(aes(yintercept = 0.2), linetype = 2) +
scale_x_log10() +
geom_text(
data = across_offspring_tip,
aes(
x = k,
y = 0.25,
label = disease
),
hjust = 0.0,
vjust = 1.0,
angle = 90,
size = 3
) +
ylim(0,1) +
colorspace::scale_color_continuous_diverging(trans = "log10", rev = TRUE) +
coord_fixed(ratio = 1) +
labs(
x = "Dispersion parameter (k)",
y = "Proportion of transmission due to\nmost infectious ~20% of cases",
color = "Reproduction\nnumber (R)"
) Created on 2024-05-01 with reprex v2.1.0 |
This makes more sense. I'm unable to replicate exactly the plot of Fig 1b using the parameters extracted from the paper, but the pattern is broadly similar.
The plot of Fig 1c looks different to that pasted above. I'm trying to work out where the values of k come from, as the values plotted in Fig 1c in the paper do not match the estimates in Supplementary table 1 from the paper. As an example Ebola, SARS and smallpox all have an estimated value of > 0.1, but the plotted values < 0.1. This might explain why the reproduction of the plot differs. |
Hi all, so this issue sounds like what I had recently discussed with @joshwlambert when I came to visit: that I think this issue is complicated but essentially comes down to what each function is trying to do. For I have a working function See the following code here:
From the parameters in the comments before, we can calculate and compare p80 and t20.
We can also reproduce figures like from Lloyd-Smith et al Fig.1b and 1c.
Overall, I think both are interesting and useful ways of looking at relative transmission heterogeneity, but it is important to be clear that they are distinct and not interchangeable, meaning t20 = 80% is not the same as p80 = 20% given the same R and k. For In contrast, for The code still needs some work to make it suitable for deployment (error handling, etc.). Let's discuss this here, and let me know if I should submit a pull request. |
I didn't include the code to extract the params from |
@dcadam thanks so much for this explanation and for sharing the code to show the differences in the calculations. If you're happy to, please could you open a PR from your fork in order to add this new functionality to package? It would be a great addition to the package. We can discuss minor implementation details and documentation on the PR. |
In epiverse-trace/tutorials-middle#29 I'm adding a replacement of Figure 1b from Lloyd-Smith et al 2005 using
epiparameter
andsuperspreading
In the reprex below, I was expecting to have the asymptotic approximation in the upper right corner, as in Figure 1b, (expected proportion of transmission = 1.0, and proportion of infectious cases = 1.0). Should this be an expected behaviour? or Is there any other issue you may find in this intent of replication?
Created on 2024-04-30 with reprex v2.1.0
The text was updated successfully, but these errors were encountered: