Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Truncation adjustment does not work as expected if truncation vector is too long #735

Closed
kgostic opened this issue Aug 6, 2024 · 5 comments · Fixed by #736
Closed

Truncation adjustment does not work as expected if truncation vector is too long #735

kgostic opened this issue Aug 6, 2024 · 5 comments · Fixed by #736
Assignees
Labels
bug Something isn't working high priority

Comments

@kgostic
Copy link

kgostic commented Aug 6, 2024

Summary:
I am passing a truncation adjustment using a vector, truncation_pmf with trunc_opts(dist = NonParametric(truncation_pmf)). If the truncation_pmf vector is longer than the data timeseries, the truncation adjustment does not work as expected.

Description:
I am working with a dataset in which it takes several months for reporting to be complete. However, in order to keep our runtimes reasonable, we don't usually input more than 8 weeks of data at a time into EpiNow2. This leaves us in a situation where the maximum reporting delay (70d, chosen to capture 99% of the delay probability mass) is greater than the length of our data timeseries (56d).

Without thinking hard enough about potential dimension mismatch issues, we had been trying to pass in the full delay pmf (which has length 71) into EpiNow2 using trunc_opts(dist = NonParametric(truncation_pmf)). The delay pmf was longer than the data timeseries, but EpiNow2 ran without any errors and or convergence issues.

However, we later realized that the right truncation adjustment wasn't working as expected. There was little if any meaningful difference between the corrected imputed_obs and uncorrected obs_reports estimates, even though we were expecting a correction of about 50% on the most recent date. At first we had no idea why this was. We assumed we had passed in the wrong truncation pmf at first, but saw something that mostly resembled our inputs in the fit$estimates$args metadata, and became even more confused.

We ultimately realized there might be a dimension mismatch issue, and were able to fix the problem by shortening the delay pmf.

This was a silent error that took us a few days to figure out. It would be really great if it's possible to add some kind of validator or error message to check the truncation pmf input if there are dimension limits. Even better would be to ensure that if too long of a pmf is passed in, the right hand side is trimmed but the left hand side (where most of the delay mass resides) is left intact.

Reproducible Steps:

library(EpiNow2)
library(dplyr)
library(ggplot2)
#packageVersion("EpiNow2")
#>[1] ‘1.5.2’


## Define inputs ---------------------------------------------------------------
data <- tibble(
  date = as.Date(10000:10056),
  confirm = c(893, 814, 799, 764, 703, 812, 786, 746, 684, 712, 619, 642, 660,
            749, 748, 650, 653, 664, 623, 672, 694, 584, 582, 606, 590, 594,
            638, 581, 624, 574, 568, 544, 518, 593, 565, 581, 556, 495, 530,
            525, 558, 540, 557, 520, 488, 493, 431, 463, 527, 492, 489, 447, 
            452, 447, 464, 440, 319)
)

gi_vector <- c(0.161984261893956, 0.320610531574905, 0.242546840009363, 
               0.135245732244931, 0.0689085003071691, 0.0344192689678171, 
               0.0175329332238398, 0.00900841141261684, 0.00483264577325465, 
               0.00262731891490177, 0.00146094335487509, 0.000822612322369711)

delay_vector <- c(0, 0.00469379166470247, 0.0145198502692467, 
                  0.0278625015194289, 0.0423652784523184, 0.0558067146246651, 
                  0.0665708755644758, 0.0737921726148393, 0.0772851235270756, 
                  0.0773663904240917, 0.0746513957192722, 0.0698760916571051, 
                  0.0637664163850331, 0.0569583001075686, 0.0499601808319443, 
                  0.0431459480420878, 0.0367665037768682, 0.0309704865244699, 
                  0.0258276109386138, 0.0213506887406452, 0.0175143764945861, 
                  0.0142700142809239, 0.0115566902357672, 0.00930904466406021, 
                  0.00746242789736848, 0.00595617458762258, 0.00473530107497383, 
                  0.0037512633718438, 0.00296206198487496, 0.00233193959974804, 
                  0.00183084949354928, 0.00143381664127347, 0.00107079529581378, 
                  0.000773031105948476, 0.000539591129136582, 
                  0.00036418961146112, 0.000237735602896314, 
                  0.000150162822157894, 9.18314924898318e-05, 
                  5.44098861901558e-05, 3.12559778153059e-05, 
                  1.74208773837321e-05, 9.4273157056547e-06, 
                  4.95631877678932e-06, 2.53284783325032e-06, 
                  1.25859368979788e-06, 6.08138668553004e-07, 
                  2.85583283517172e-07, 1.30134180290484e-07, 
                  5.73301770106708e-08, 2.4220277655412e-08, 
                  9.63204783102959e-09, 3.43818003663801e-09, 
                  9.34842356069374e-10, 0)

truncation_vector <- c(0.530815489705111, 0.183074841902905, 0.0649904582370194, 
                       0.033758006190391, 0.0246345566930968, 
                       0.0188246472033798, 0.0151744026737474, 
                       0.0115998220816554, 0.00884565475518122, 
                       0.00785175488991707, 0.00686163822152995, 
                       0.00543213025871121, 0.0045906391819215, 
                       0.00407936714682641, 0.00346702970944507, 
                       0.00299467055651366, 0.00252555414376256, 
                       0.00225694716549378, 0.00224343574807583, 
                       0.00209697198326529, 0.0021212925346176, 
                       0.00200401343142982, 0.0016975744843908, 
                       0.00153057336510498, 0.00146896130167914, 
                       0.00146733993158899, 0.00142788659272858, 
                       0.0014586926244415, 0.00146409719140868, 
                       0.00137276000966336, 0.00139005462395833, 
                       0.00112685221265673, 0.00125926410335261, 
                       0.00128952967836881, 0.00129709607212286, 
                       0.00134411580473731, 0.0011954902131399, 
                       0.0011560368742795, 0.00111820490550925, 
                       0.00110955759836176, 0.00122629624485282, 
                       0.00134519671813075, 0.00127601826095086, 
                       0.00103713640100157, 0.000987414384903531, 
                       0.00108739887379633, 0.00132790210383578, 
                       0.00113117586623048, 0.00119765203992677, 
                       0.00112901403944361, 0.000886889439314009, 
                       0.000850678840633912, 0.000865811628142013, 
                       0.000900400856731956, 0.000830141486158634, 
                       0.000945799219256256, 0.000897698573248366, 
                       0.000662059453479379, 0.000650169406151586, 
                       0.000687460918225119, 0.000754477548618133, 
                       0.000799875911142434, 0.000801497281232587, 
                       0.000705295989216808, 0.000570722271734061, 
                       0.000563155877980011, 0.000596664193176518, 
                       0.000585314602545443, 0.000623146571315693, 
                       0.000660438083389226, 0.000572343641824214)



## Fit model with truncation vec of length 71 and data of length 57 ------------
length(truncation_vector)
nrow(data)
fit_long_trunc_vec <- EpiNow2::epinow(
  data = data,
  generation_time = generation_time_opts(dist = NonParametric(gi_vector)),
  delays = delay_opts(dist = NonParametric(delay_vector)),
  horizon = 21,
  rt = EpiNow2::rt_opts(list(mean = 1, sd = 0.2)),
  truncation = trunc_opts(dist = NonParametric(truncation_vector)),
  gp = EpiNow2::gp_opts(alpha_sd = 0.0075),
  stan = EpiNow2::stan_opts(
    cores = 4,
    chains = 4,
    seed = 123456,
    control = list(
      adapt_delta = 0.99,
      max_treedepth = 12

    )
  ),
  verbose = interactive()
)

## Fit identical model, but with truncation vec of length 30 and data of 
## length 57 -------------------------------------------------------------------
short_truncation_vector = truncation_vector[1:30] / sum(truncation_vector[1:30])
# Print each truncation vector to verify that they are similar and should 
# produce roughly equal truncation corrections
plot(truncation_vector, xlab = "delay form event to report (d)", ylab = "prob")
points(short_truncation_vector, col = "blue", pch = 4)
fit_short_trunc_vec <- EpiNow2::epinow(
  data = data,
  generation_time = generation_time_opts(dist = NonParametric(gi_vector)),
  delays = delay_opts(dist = NonParametric(delay_vector)),
  horizon = 21,
  rt = EpiNow2::rt_opts(list(mean = 1, sd = 0.2)),
  truncation = trunc_opts(dist = NonParametric(short_truncation_vector)),
  gp = EpiNow2::gp_opts(alpha_sd = 0.0075),
  stan = EpiNow2::stan_opts(
    cores = 4,
    chains = 4,
    seed = 123456,
    control = list(
      adapt_delta = 0.99,
      max_treedepth = 12

    )
  ),
  verbose = interactive()
)


## Extract estimates and compare
## - Extract imputed_reports and obs_reports
## - imputed_reports should include the adjustment for right truncation
## - The model with the shorter truncation pmf vector includes the expected
##   correction, but the one with the long pmf vector does not
dplyr::bind_rows(
  list(
    short_trunc_vector = fit_short_trunc_vec$estimates$fit |>
      tidybayes::spread_draws(imputed_reports[t], obs_reports[t]) |>
      tidybayes::median_qi(.width = c(0.5, 0.99)),
    long_trunc_vector = fit_long_trunc_vec$estimates$fit |>
      tidybayes::spread_draws(imputed_reports[t], obs_reports[t]) |>
      tidybayes::median_qi(.width = c(0.5, 0.99))
  ), .id = "model_version"
) |>
  ggplot2::ggplot(aes(group = interaction(.width))) +
    ggplot2::geom_ribbon(aes(x = t, 
      ymin = imputed_reports.lower, ymax = imputed_reports.upper), 
      fill = "red3", alpha = .2) +
    ggplot2::geom_ribbon(aes(x = t, 
      ymin = obs_reports.lower, ymax = obs_reports.upper), 
      fill = "blue3", alpha = .2) +
    ggplot2::geom_line(aes(x = t, y = imputed_reports), 
      color = "red3", lty = 2) +
    ggplot2::geom_line(aes(x = t, y = obs_reports), 
      color = "blue3", lty = 2) +
    facet_wrap(.~model_version)

EpiNow2 Version:
1.5.2

This screenshot shows that when we pass a shorter truncation vector, the correction at the end of the time series occurs, but it is missing when we pass a vector that is too long:

image


@seabbs seabbs added bug Something isn't working high priority labels Aug 7, 2024
@seabbs
Copy link
Contributor

seabbs commented Aug 7, 2024

Thanks for the detailed report @kgostic - this sounds like a frustrating bug.

I think this issue is related to the truncation functionality itself (based on this check

int trunc_max = min(t, num_elements(trunc_rev_cmf));
)

and then this indexing

trunc_reports[first_t:t] ./= trunc_rev_cmf[1:trunc_max];

This is in effect slicing off the back of the truncation vector and not the front due to it being reversed.

I think this https://github.com/epiforecasts/EpiNow2/pull/736/files fixes the problem but will investigate in more detail + add tests etc shortly. I think we should also open another issue to do a wider dimension check to make sure this issue isn't elsewhere in the code base (i.e here

vector convolve_with_rev_pmf(vector x, vector y, int len) {
if the end index of this is ever not ylin I think this doesn't make sense (though in the way it is used for our convolutions this can't currently be the case as xlen = len).

As a more general comment I think I might set the threshold of the delay PMF to be lower as having so many entries with such low weight (i.e <1e-6) is going to lead to a lot of computation that doesn't add a great deal to the results.

t would be really great if it's possible to add some kind of validator or error message to check the truncation pmf input if there are dimension limits. Even better would be to ensure that if too long of a pmf is passed in, the right hand side is trimmed but the left hand side (where most of the delay mass resides) is left intact.

What do we think about still adding a warning message here? I'm not sure its a terribly good idea for expected practice to be to enter delays longer than the data but on the other hand in say a outbreak setting this might be hard to avoid.

@sbfnk
Copy link
Contributor

sbfnk commented Aug 7, 2024

Ah yes that makes sense, and I agree with the fix.

I think we should also open another issue to do a wider dimension check to make sure this issue isn't elsewhere in the code base

I agree, this makes sense

I think I might set the threshold of the delay PMF to be lower as having so many entries with such low weight (i.e <1e-6) is going to lead to a lot of computation that doesn't add a great deal to the results.

Note you can use the apply_tolerance() (renamed bound_dist() in the development version) function for this

What do we think about still adding a warning message here?

A warning seems appropriate to me.

@seabbs
Copy link
Contributor

seabbs commented Aug 7, 2024

So that does look like it was the problem. Urgh! Thanks again for the clear report.

This issue should closed by the following actions:

I am not sure what the current release time line is but I think we are close to another small patch release and so this should be on CRAN by September I would guess (CRAN is currently on Holiday).

@seabbs seabbs self-assigned this Aug 7, 2024
@kgostic
Copy link
Author

kgostic commented Aug 7, 2024

Thanks @seabbs and @sbfnk!

I'll keep an eye on this and maybe we'll update to the patch release in production once it's out. For now we're just trimming the input, but I definitely would feel safer using a version of EpiNow2 that guarantees we won't run into trouble if our window length changes.

@seabbs
Copy link
Contributor

seabbs commented Aug 8, 2024

Makes sense. Thanks again for the report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working high priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants