`pathfinder` fails for large case reports #728

jamesmbaazam · 2024-07-24T17:03:19Z

In my bid to get pathfinder running for the vignette in #695, I realised that it was quite likely failing because of the extremely large case reports in the simulated data (approx 40M at peak), so I decided to run a few data scenarios to see how the method performs with different case numbers. My suspicion has so far been confirmed, as the method performs well with case numbers between 100 and 10K, but starts to fail from there. Also notice that the credible intervals get narrower as the data magnitude increases. I am wondering why this is the case from the theoretical perspective, and if there is a way to improve the method to handle such large case numbers.

The log is quite rich, so I've attached it here for further discussion.
pathfinder_scenarios_log.txt

library(EpiNow2)
#> 
#> Attaching package: 'EpiNow2'
#> The following object is masked from 'package:stats':
#> 
#>     Gamma
library(data.table)
library(purrr)
#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:data.table':
#> 
#>     transpose
library(ggplot2)

# Function to simulate data scenarios
simulate_epidemic <- function(days, date_start, peak_day, max_cases, spread, seed) {
  set.seed(seed)
  # Calculate the cases for each day
  days_vec <- seq_len(days)
  cases <- max_cases * exp(-((days_vec - peak_day)^2) / (2 * spread^2))
  
  # Create a data frame for the result
  epidemic_curve <- data.frame(
    date = as.Date(date_start) + days_vec,
    confirm = ceiling(cases)
  )
  return(epidemic_curve)
}

# Scenarios for epidemi peaks
peaks <- c(100, 100E1, 100E2, 100E3, 100E4, 100E5)

# Create version of epinow that always succeeds
safe_epinow <- purrr::safely(epinow)
# Define epinow() inputs
## Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)
# Run the simulations
pf_results <- lapply(peaks, function(peak) {
  test_data <- simulate_epidemic(
    seed = 123,
    days = 365,
    date_start = "2020-02-22",
    peak_day = 125,
    max_cases = peak,
    spread = 20
  )
  
  safe_epinow(
    data = as.data.table(test_data)[confirm > 1], # only the data is changing
    generation_time = generation_time_opts(generation_time),
    stan = stan_opts(method = "pathfinder", backend = "cmdstanr"),
    verbose = FALSE,
    logs = NULL
  )
})
# Annotate the results
names(pf_results) <- paste0("peak_", peaks)

# Scenarios that failed will return TRUE
sapply(pf_results, function(x) is.null(x$result))
#>   peak_100  peak_1000 peak_10000 peak_1e+05 peak_1e+06 peak_1e+07 
#>      FALSE      FALSE      FALSE       TRUE       TRUE       TRUE

# Plot those that succeeded
sapply(pf_results, function(x) {
  if (!is.null(x$result)) {
    x$result$plots$summary
  }
})
#> $peak_100

#> 
#> $peak_1000

#> 
#> $peak_10000

#> 
#> $`peak_1e+05`
#> NULL
#> 
#> $`peak_1e+06`
#> NULL
#> 
#> $`peak_1e+07`
#> NULL

^{Created on 2024-07-24 with reprex v2.1.1}

The text was updated successfully, but these errors were encountered:

seabbs · 2024-08-21T10:08:31Z

Oh interesting. I assume this is because of numerical overruns for very large numbers (i.e because the prior distribution is multiplier case values a incidence of 40 million might allow some prior values to be 10^3 * 40 million) I would guess we could improve this by improving the prior distribution

seabbs · 2024-08-21T10:08:49Z

or usinng a different default model where you enforce some concept of population size.

seabbs · 2024-08-21T10:09:09Z

Did you look at see if NUTs has the same scaling problems? I would guess it would do but maybe not? If not that is interesting

jamesmbaazam · 2024-09-02T10:14:03Z

Sorry, I missed this message. I'll try with NUTS and see.

jamesmbaazam mentioned this issue Jul 24, 2024

Add vignette on benchmarking model options #695

Draft

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`pathfinder` fails for large case reports #728

`pathfinder` fails for large case reports #728

jamesmbaazam commented Jul 24, 2024 •

edited

Loading

seabbs commented Aug 21, 2024

seabbs commented Aug 21, 2024

seabbs commented Aug 21, 2024

jamesmbaazam commented Sep 2, 2024

pathfinder fails for large case reports #728

pathfinder fails for large case reports #728

Comments

jamesmbaazam commented Jul 24, 2024 • edited Loading

seabbs commented Aug 21, 2024

seabbs commented Aug 21, 2024

seabbs commented Aug 21, 2024

jamesmbaazam commented Sep 2, 2024

`pathfinder` fails for large case reports #728

`pathfinder` fails for large case reports #728

jamesmbaazam commented Jul 24, 2024 •

edited

Loading