Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replicate of the expected proportion of all transmission due to a given proportion of infectious cases using {epiparameter} and {superspreading} #93

Open
avallecam opened this issue Apr 30, 2024 · 8 comments
Assignees
Milestone

Comments

@avallecam
Copy link
Member

In epiverse-trace/tutorials-middle#29 I'm adding a replacement of Figure 1b from Lloyd-Smith et al 2005 using epiparameter and superspreading

In the reprex below, I was expecting to have the asymptotic approximation in the upper right corner, as in Figure 1b, (expected proportion of transmission = 1.0, and proportion of infectious cases = 1.0). Should this be an expected behaviour? or Is there any other issue you may find in this intent of replication?

library(epiparameter)
#> Warning: package 'epiparameter' was built under R version 4.3.3
library(superspreading)
#> Warning: package 'superspreading' was built under R version 4.3.3
library(tidyverse)

# list of diseases with offspring distribution
epidist_string <- epidist_db(
  epi_dist = "offspring distribution"
) %>%
  list_distributions() %>%
  dplyr::select(disease) %>%
  distinct() %>% 
  as_tibble()
#> Returning 10 results that match the criteria (10 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function

# get percent of cases that cause percent of transmission
across_offspring <- epidist_string %>%
  # add column list of epidist objects
  mutate(
    epidist_out =
      map(
        .x = disease,
        .f = epiparameter::epidist_db,
        epi_dist = "offspring distribution",
        single_epidist = TRUE
      )
  ) %>%
  # get parameters
  mutate(
    epidist_params =
      map(
        .x = epidist_out,
        .f = epiparameter::get_parameters
      )
  ) %>%
  # unnest parameters
  unnest_wider(col = epidist_params) %>%
  # to each disease, add sequence from 0.01 to 1 (proportion of transmission)
  expand_grid(percent_transmission = seq(from = 0.01, to = 1, by = 0.01)) %>%
  # estimate proportion of cases responsible of proportion of transmission (row)
  mutate(
    transmission_output =
      pmap(
        .l = select(., R = mean, k = dispersion, percent_transmission),
        .f = superspreading::proportion_transmission,
        format_prop = FALSE
      )
  ) %>%
  # unnest proportion of cases results
  unnest_wider(col = transmission_output) %>%
  # move each result to one column
  rowwise() %>%
  mutate(
    percent_cases =
      sum(
        c_across(cols = starts_with("prop_")),
        na.rm = TRUE
      )
  ) %>%
  select(-starts_with("prop_"))
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function

# get a position to the ggplot text annotation
across_offspring_tip <- across_offspring %>%
  group_by(disease) %>%
  filter(percent_transmission < 0.98, percent_transmission > 0.85) %>%
  slice_max(percent_transmission) %>%
  ungroup() %>%
  mutate(disease = case_when(
    str_detect(disease, stringr::fixed("Hantavirus")) ~ "Hantavirus",
    str_detect(disease, stringr::fixed("Ebola")) ~ "Ebola",
    TRUE ~ disease
  ))

# plot x: proportion of cases, y: proportion of transmission
across_offspring %>%
  ggplot() +
  geom_line(
    aes(
      x = percent_cases,
      y = percent_transmission,
      color = dispersion,
      group = disease
    )
  ) +
  geom_text(
    data = across_offspring_tip,
    aes(
      x = percent_cases,
      y = percent_transmission,
      label = disease
    ),
    hjust = 0.0,
    vjust = 1.0,
    angle = 25,
    size = 3
  ) +
  scale_y_continuous(breaks = scales::breaks_pretty(n = 5)) +
  colorspace::scale_color_continuous_diverging(trans = "log10") +
  labs(
    x = "Proportion of infectious cases (ranked)",
    y = "Expected proportion of transmission",
    color = "Dispersion\nparameter (k)"
  ) +
  # geom_hline(aes(yintercept = 0.8),lty = 3) +
  geom_vline(aes(xintercept = 0.2), lty = 2) +
  coord_fixed(ratio = 1)

Created on 2024-04-30 with reprex v2.1.0

@joshwlambert joshwlambert self-assigned this Apr 30, 2024
@joshwlambert
Copy link
Member

Thanks for posting @avallecam, I've been giving this a bit of thought and to be honest I'm not 100% on the answer.

Initially I thought figure 1b from Lloyd-Smith et al. (2005) could be replicated with proportion_transmission(). However, now I'm not sure what is being plotted is identical to the calculation in {superspreading}.

Two main points:

  1. Theoretically, as the value of "Expected proportion of transmission" approaches 1, should the "Proportion of infectious cases (ranked)" approach 1? The above plots makes sense if we presume that most of the cases of an outbreak are caused by a subset of infected cases (k < 1), and many cases do not infect anyone, then asking the question what proportion of cases caused 100% of the outbreak (i.e. expected proportion transmission = 1), to me the answer could be ~30% (~0.3), as is shown for SARS. This assumes that many cases do not infect anyone.

If this reasoning is correct it raises the question why Fig 1b in Lloyd-Smith et al. (2005) asymptotes, and whether they are calculating the same thing. (The x-axis is confusing to me, not exactly sure what is meant by "Proportion of infected cases (ranked)".)

  1. The plot produced by proportion_transmission() becomes more like Lloyd-Smith et al. (2005) when simulate = TRUE.
library(epiparameter)
library(superspreading)
library(tidyverse)

# list of diseases with offspring distribution
epidist_string <- epidist_db(
  epi_dist = "offspring distribution"
) %>%
  list_distributions() %>%
  dplyr::select(disease) %>%
  distinct() %>% 
  as_tibble()
#> Returning 10 results that match the criteria (10 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the short citation for each use the 'get_citation' function

# get percent of cases that cause percent of transmission
across_offspring <- epidist_string %>%
  # add column list of epidist objects
  mutate(
    epidist_out =
      map(
        .x = disease,
        .f = epiparameter::epidist_db,
        epi_dist = "offspring distribution",
        single_epidist = TRUE
      )
  ) %>%
  # get parameters
  mutate(
    epidist_params =
      map(
        .x = epidist_out,
        .f = epiparameter::get_parameters
      )
  ) %>%
  # unnest parameters
  unnest_wider(col = epidist_params) %>%
  # to each disease, add sequence from 0.01 to 1 (proportion of transmission)
  expand_grid(percent_transmission = seq(from = 0.01, to = 1, by = 0.01)) %>%
  # estimate proportion of cases responsible of proportion of transmission (row)
  mutate(
    transmission_output =
      pmap(
        .l = select(., R = mean, k = dispersion, percent_transmission),
        .f = superspreading::proportion_transmission,
        format_prop = FALSE, 
        simulate = TRUE
      )
  ) %>%
  # unnest proportion of cases results
  unnest_wider(col = transmission_output) %>%
  # move each result to one column
  rowwise() %>%
  mutate(
    percent_cases =
      sum(
        c_across(cols = starts_with("prop_")),
        na.rm = TRUE
      )
  ) %>%
  select(-starts_with("prop_"))
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the short citation use the 'get_citation' function

# get a position to the ggplot text annotation
across_offspring_tip <- across_offspring %>%
  group_by(disease) %>%
  filter(percent_transmission < 0.98, percent_transmission > 0.85) %>%
  slice_max(percent_transmission) %>%
  ungroup() %>%
  mutate(disease = case_when(
    str_detect(disease, stringr::fixed("Hantavirus")) ~ "Hantavirus",
    str_detect(disease, stringr::fixed("Ebola")) ~ "Ebola",
    TRUE ~ disease
  ))

# plot x: proportion of cases, y: proportion of transmission
across_offspring %>%
  ggplot() +
  geom_line(
    aes(
      x = percent_cases,
      y = percent_transmission,
      color = dispersion,
      group = disease
    )
  ) +
  geom_text(
    data = across_offspring_tip,
    aes(
      x = percent_cases,
      y = percent_transmission,
      label = disease
    ),
    hjust = 0.0,
    vjust = 1.0,
    angle = 25,
    size = 3
  ) +
  scale_y_continuous(breaks = scales::breaks_pretty(n = 5)) +
  colorspace::scale_color_continuous_diverging(trans = "log10") +
  labs(
    x = "Proportion of infectious cases (ranked)",
    y = "Expected proportion of transmission",
    color = "Dispersion\nparameter (k)"
  ) +
  # geom_hline(aes(yintercept = 0.8),lty = 3) +
  geom_vline(aes(xintercept = 0.2), lty = 2) +
  coord_fixed(ratio = 1)

Created on 2024-04-30 with reprex v2.1.0

@joshwlambert
Copy link
Member

Tagging @adamkucharski & @sbfnk to get some more input on this.

@adamkucharski
Copy link
Member

From another look, my interpretation of Fig 1b in Lloyd-Smith et al is that it's the CDF of expected secondary cases. So what we'd obtain if we, say, generated 1000 samples from the distribution of the mean of R0 (I.e. the gamma distribution for nu, rather than the negative binomial which also accounts for poisson randomness), sorted these expected secondary cases, and plotted the normalised cumulative distribution, we'd get that curve?

So the above functions are closed to Fig 1C in interpretation?

@avallecam
Copy link
Member Author

avallecam commented May 1, 2024

My intention with this plot is to use it to facilitate the interpretation of the dispersion parameter using the "20/80 rule" and, if possible, compare different scenarios given k and R values within this plot.

So the above functions are closed to Fig 1C in interpretation?

Possibly showing Figure 1C would be more appropriate to the goal, then?

library(epiparameter)
#> Warning: package 'epiparameter' was built under R version 4.3.3
library(superspreading)
#> Warning: package 'superspreading' was built under R version 4.3.3
library(tidyverse)
library(tictoc)

# list of diseases with offspring distribution
epidist_string <- epidist_db(
  epi_dist = "offspring distribution"
) %>%
  list_distributions() %>%
  dplyr::select(disease) %>%
  distinct() %>% 
  as_tibble()
#> Returning 10 results that match the criteria (10 are parameterised). 
#> Use subset to filter by entry variables or single_epidist to return a single entry. 
#> To retrieve the citation for each use the 'get_citation' function

# get percent of cases that cause percent of transmission
across_offspring <- epidist_string %>%
  # add column list of epidist objects
  mutate(
    epidist_out =
      map(
        .x = disease,
        .f = epiparameter::epidist_db,
        epi_dist = "offspring distribution",
        single_epidist = TRUE
      )
  ) %>%
  # get parameters
  mutate(
    epidist_params =
      map(
        .x = epidist_out,
        .f = epiparameter::get_parameters
      )
  ) %>%
  # unnest parameters
  unnest_wider(col = epidist_params) %>%
  # to each disease, add sequence from 0.01 to 1 (proportion of transmission)
  expand_grid(percent_transmission = seq(from = 0.01, to = 1, by = 0.01)) %>%
  # estimate proportion of cases responsible of proportion of transmission (row)
  mutate(
    transmission_output =
      pmap(
        .l = dplyr::select(., R = mean, k = dispersion, percent_transmission),
        .f = superspreading::proportion_transmission,
        format_prop = FALSE, 
        simulate = TRUE # use a numerical simulation
      )
  ) %>%
  # unnest proportion of cases results
  unnest_wider(col = transmission_output) %>%
  # move each result to one column
  rowwise() %>%
  mutate(
    percent_cases =
      sum(
        c_across(cols = starts_with("prop_")),
        na.rm = TRUE
      )
  ) %>%
  dplyr::select(-starts_with("prop_")) %>% 
  ungroup()
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function
#> Using Lloyd-Smith J, Schreiber S, Kopp P, Getz W (2005). "Superspreading and
#> the effect of individual variation on disease emergence." _Nature_.
#> doi:10.1038/nature04153 <https://doi.org/10.1038/nature04153>.. 
#> To retrieve the citation use the 'get_citation' function

# get a position to the ggplot text annotation
across_offspring_tip <- across_offspring %>%
  group_by(disease) %>%
  filter(percent_transmission < 0.98, percent_transmission > 0.85) %>%
  slice_max(percent_transmission) %>%
  ungroup() %>%
  mutate(disease = case_when(
    str_detect(disease, stringr::fixed("Hantavirus")) ~ "Hantavirus",
    str_detect(disease, stringr::fixed("Ebola")) ~ "Ebola",
    TRUE ~ disease
  ))


# figure 1c ---------------------------------------------------------------

fixed_percent_transmission <- across_offspring %>% 
  group_by(disease) %>% 
  filter(percent_cases < 0.21, percent_cases > 0.19) %>% 
  slice_max(percent_cases) %>% 
  ungroup() 

tic()
output <- expand_grid(
  mean = seq(0.7,1.7,0.1),
  dispersion = seq(0.01,10,0.1)
) %>% 
  expand_grid(percent_transmission = seq(from = 0.55, to = 1, by = 0.01)) %>%
  # estimate proportion of cases responsible of proportion of transmission (row)
  mutate(
    transmission_output =
      pmap(
        .l = dplyr::select(., R = mean, k = dispersion, percent_transmission),
        .f = superspreading::proportion_transmission,
        format_prop = FALSE, 
        simulate = FALSE # use a numerical simulation
      )
  ) %>%
  # unnest proportion of cases results
  unnest_wider(col = transmission_output) %>%
  # move each result to one column
  rowwise() %>%
  mutate(
    percent_cases =
      sum(
        c_across(cols = starts_with("prop_")),
        na.rm = TRUE
      )
  ) %>%
  dplyr::select(-starts_with("prop_")) %>% 
  ungroup()
toc()
#> 235.39 sec elapsed

fixed_reproduction <- output %>% 
  group_by(mean,dispersion) %>% 
  filter(percent_cases < 0.21, percent_cases > 0.19) %>% 
  slice_max(percent_cases) %>% 
  ungroup()

# fixed_reproduction %>%
#   print(n=Inf)

ggplot() +
  geom_line(
    data = fixed_reproduction,
    mapping = aes(x = k, y = percent_transmission, 
                  group = mean, color = mean)
  ) +
  geom_point(
    data = fixed_percent_transmission,
    mapping = aes(x = k, y = percent_transmission)
  ) +
  geom_hline(aes(yintercept = 0.8), linetype = 2) +
  geom_hline(aes(yintercept = 0.2), linetype = 2) +
  scale_x_log10() +
  geom_text(
    data = across_offspring_tip,
    aes(
      x = k,
      y = 0.25,
      label = disease
    ),
    hjust = 0.0,
    vjust = 1.0,
    angle = 90,
    size = 3
  ) +
  ylim(0,1) +
  colorspace::scale_color_continuous_diverging(trans = "log10", rev = TRUE) +
  coord_fixed(ratio = 1) +
  labs(
    x = "Dispersion parameter (k)",
    y = "Proportion of transmission due to\nmost infectious ~20% of cases",
    color = "Reproduction\nnumber (R)"
  )

Created on 2024-05-01 with reprex v2.1.0

@joshwlambert
Copy link
Member

From another look, my interpretation of Fig 1b in Lloyd-Smith et al is that it's the CDF of expected secondary cases.

This makes more sense. I'm unable to replicate exactly the plot of Fig 1b using the parameters extracted from the paper, but the pattern is broadly similar.

Possibly showing Figure 1C would be more appropriate to the goal, then?

The plot of Fig 1c looks different to that pasted above. I'm trying to work out where the values of k come from, as the values plotted in Fig 1c in the paper do not match the estimates in Supplementary table 1 from the paper. As an example Ebola, SARS and smallpox all have an estimated value of > 0.1, but the plotted values < 0.1. This might explain why the reproduction of the plot differs.

@dcadam
Copy link

dcadam commented Jul 25, 2024

Hi all, so this issue sounds like what I had recently discussed with @joshwlambert when I came to visit: that proportion_transmission() doesn't follow the expected behaviour e.g. when the expected proportion of transmission = 1.0, we expect the proportion of infectious cases = 1.0.

I think this issue is complicated but essentially comes down to what each function is trying to do. For proportion_transmission(), the calculation appears to be based on the negative binomial and distribution of discrete offspring counts, which calculates p80 when percentage_transmission = 0.8. In contrast, the Lloyd-Smith et al. calculation is for t20 and uses a continuous gamma distribution of the individual reproductive number, v, which is not the same thing as R.

I have a working function proportion_transmission2() and other helper functions on a forked branch new-formula-for-proposed transmission, which roughly translates the formula described by Lloyd-Smith et al in section 2.2.5 of the supplementary notes to calculate t20, or the expected proportion of transmission originating from the upper 20% of most infectious cases when proportion_transmission2(R, k, percentage_infectious = 0.2)

See the following code here:

### Defines the gamma density function in terms of the mean reproductive number (R) and the dispersion paramter (k)
dgammaRk <- function(x, R, k) {
  dgamma(x, shape = k, scale = R / k)
}

### Defines the gamma distribution function in terms of the mean reproductive number (R) and the dispersion paramter (k)
pgammaRk <- function(x, R, k) {
  pgamma(x, shape = k, scale = R / k)
}

### Gamma probability densitiy function (pdf) describing the individual reproductive number ν given R0 and k
fvx <- function(x, R, k) {
  dgammaRk(x = x, R = R, k = k)
}

### Unitroot solver for the solution to u, or the value of x from the gamma CDF given the desired proportion of transmission.
solve_for_u <- function(percent_infectious, R, k) {
  f <- function(x, percent_infectious) {
    res <- 1 - pgammaRk(x, R, k)
    res - percent_infectious
  }

  # Initial interval for u
  lower <- 0
  upper <- 1

  # Find the root using uniroot
  root <- uniroot(f, percent_infectious = percent_infectious, interval = c(lower, upper), extendInt = "yes")

  root$root
}

### Function: Calculates the expected proportion of transmission from the upper 'prop'% of most infectious cases
proportion_transmission2 <- function(R, k, percent_infectious) {
  u <- solve_for_u(percent_infectious = percent_infectious, R = R, k = k)

  integral_result <- integrate(function(u) u * fvx(u, R, k), lower = 0, upper = u)

  res <- integral_result$value / R

  1 - res
}


### Test: Given R0 = 3 and k = 1, the upper 20% most infectious individuals are expected to produce 52% of all transmissions
proportion_transmission2(R = 3, k = 1, percent_infectious = 0.2)

### Test: Given R0 = 2 and k = Inf, the upper 20% most infectious individuals are expected to produce 20% of all transmissions i.e. no heterogeneity.
proportion_transmission2(R = 2, k = 10e5, percent_infectious = 0.2)

From the parameters in the comments before, we can calculate and compare p80 and t20.

disease mean dispersion t20 p80
SARS 1.63 0.16 88.3% 13%
Mpox 0.32 0.58 62% 16.1%
Smallpox 1.49 0.72 57.9% 29.7%
Pneumonic Plague 1.32 1.37 47.4% 33.9%
Hantavirus Pulmonary Syndrome 0.70 1.66 44.8% 30.2%
Ebola Virus Disease 1.50 5.10 33.6% 43.2%

We can also reproduce figures like from Lloyd-Smith et al Fig.1b and 1c.

library(tidyverse)
## REQUIRES PREVIOUS FUNCTIONS

### Generates a log scaled sequence of real numbers
lseq <- function(from, to, length.out) {
  exp(seq(log(from), log(to), length.out = length.out))
}

#### Function to solve and plot Fig.1b numerically given R and k

get_infectious_curve <- function(R, k) {
  upper_u <- solve_for_u(percent_infectious = 0, R = R, k = k) ## upper limit of x when y = 0

  upper_u <- round(upper_u)

  x_seq <- lseq(from = 1e-5, to = upper_u, length.out = 500)

  res <- vector(mode = "list", length = length(x_seq))
  for (i in 1:length(x_seq)) {
    u <- x_seq

    fvx <- function(x, R, k) {
      dgammaRk(x = x, R = R, k = k)
    }

    integral_result <- integrate(function(u) u * fvx(x = u, R, k), lower = 0, upper = x_seq[[i]])
    res[[i]] <- integral_result$value / R
  }

  expected_v_more_than_x <- 1 - unlist(res)
  proportion_more_than_x <- 1 - pgammaRk(x_seq, R = R, k = k)

  data.table(
    exp_t = expected_v_more_than_x,
    prop_i = proportion_more_than_x,
    x_seq = x_seq
  )
}

### Plot Fig.1b
map(t20_table |>
  group_split(disease), function(x) {
  get_infectious_curve(R = x$mean, k = x$dispersion) |>
    mutate(
      disease = x$disease,
      R = x$mean, k = x$dispersion
    )
}) |>
  list_rbind() |>
  ggplot(aes(x = prop_i, y = exp_t, colour = disease)) +
  geom_line() +
  geom_abline(slope = 1) +
  geom_vline(xintercept = 0.2, linetype = 2, alpha = 0.2) + # the yintercept when x = 0.2 is the value of t_20
  scale_x_continuous(breaks = seq(0, 1, by = 0.2)) +
  theme_classic() +
  theme(
    aspect.ratio = 1,
    legend.position = "right"
  ) +
  annotate(
    geom = "text",
    angle = 45,
    size = 2.5,
    x = 0.5,
    y = 0.5,
    vjust = 1.5,
    label = "Homogeneous population"
  ) +
  labs(
    x = "Proportion of infectious cases (ranked)",
    y = "Expected proportion of transmission",
    colour = ""
  ) +
  coord_cartesian(expand = FALSE)

### Define range for k for Fig.1c
k_seq <- lseq(from = 0.01, to = 100, length.out = 1000)
y <- map_dbl(k_seq, function(x) proportion_transmission2(R = 2, k = x, percent_infectious = 0.2))

###  Plot Fig.1c curve and points of t20 for sample of diseases
tibble(k_seq, y) |>
  ggplot() +
  geom_line(mapping = aes(x = k_seq, y = y)) +
  geom_point(
    data = t20_table,
    mapping = aes(
      x = dispersion,
      y = t20,
      fill = disease,
    ),
    shape = 21,
    size = 3,
    alpha = 0.9
  ) +
  geom_hline(yintercept = c(0.2, 0.8), linetype = 2, alpha = 0.2) +
  theme_classic() +
  theme(
    aspect.ratio = 1,

  ) +
  scale_y_continuous(
    name = c("Proportion of transmission expected from\nthe most infectious 20% of cases"),
    limits = c(0, 1),
    breaks = seq(0, 1, by = 0.2)
  ) +
  scale_x_log10(name = "Dispersion parameter", expand = c(0, 0)) +
  labs(fill = "")

t20

Overall, I think both are interesting and useful ways of looking at relative transmission heterogeneity, but it is important to be clear that they are distinct and not interchangeable, meaning t20 = 80% is not the same as p80 = 20% given the same R and k. For proportion_transmission() changes in R can have a significant effect on the estimate of p80 even when k is constant, and this formulation DOES NOT allow for true homogeneity when k → ∞, that is, p80 ≠ 80%.

In contrast, for proportion_transmission2(), changes in R have no effect on the estimate of t20 when k is constant, but R is still required for the underlying calculation. This formulation DOES allow for true homogeneity when k → ∞, that is, t20 = 20%, or t80 = 80%.

The code still needs some work to make it suitable for deployment (error handling, etc.). Let's discuss this here, and let me know if I should submit a pull request.

@dcadam
Copy link

dcadam commented Jul 26, 2024

I didn't include the code to extract the params from {epiparamter} but I've added it here so you can replicate the figures above.

@joshwlambert
Copy link
Member

@dcadam thanks so much for this explanation and for sharing the code to show the differences in the calculations.

If you're happy to, please could you open a PR from your fork in order to add this new functionality to package? It would be a great addition to the package. We can discuss minor implementation details and documentation on the PR.

@joshwlambert joshwlambert added this to the v0.3.0 milestone Aug 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants