Skip to content

Commit

Permalink
add function to take delay and triangle and perform nowcast
Browse files Browse the repository at this point in the history
  • Loading branch information
kaitejohnson committed Jan 31, 2025
1 parent 9c21a1b commit e650d57
Show file tree
Hide file tree
Showing 9 changed files with 257 additions and 28 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,8 @@ BugReports: https://github.com/epinowcast/baselinenowcast/issues
Depends:
R (>= 4.0.0)
Imports:
cli
cli,
checkmate
Suggests:
epinowcast,
bookdown,
Expand Down
1 change: 1 addition & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
# Generated by roxygen2: do not edit by hand

export(apply_delay)
export(get_delay_estimate)
65 changes: 65 additions & 0 deletions R/apply_delay.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,65 @@
#' Apply the delay to generate a point nowcast
#' @description
#' This function takes as an input the reporting triangle that we want to
#' complete with a nowcast and a delay pmf and generates a point estimate of a
#' completed reporting square (or rectangle)
#'
#'
#' @param triangle_to_nowcast matrix of the incomplete reporting triangle to be
#' nowcasted, with rows representing the time points of reference and columns
#' representing the delays
#' @param delay_pmf vector of delays assumed to be indexed starting at the
#' first delay column in `triangle_to_nowcast`
#'
#' @return expectation, a matrix of the same number of rows and columns as the
#' `triangle_to_nowcast` but with the missing values filled in as point
#' estimates
#' @export
#'
#' @examples
#' #' library(epinowcast)
#' nat_germany_hosp <-
#' germany_covid19_hosp[location == "DE"][age_group == "00+"]
#' nat_germany_hosp <- enw_filter_report_dates(
#' nat_germany_hosp,
#' latest_date = "2021-10-01"
#' )
#' pobs <- enw_preprocess_data(nat_germany_hosp, max_delay = 21)
#' triangle_raw <- pobs$reporting_triangle[[1]] |>
#' dplyr::select(-`.group`, -reference_date) |>
#' as.matrix() |>
#' unname()
#' delay_df <- get_delay_estimate(
#' triangle = triangle_raw,
#' max_delay = 20,
#' n_history = 30
#' )
#' reporting_square <- apply_delay(
#' triangle_to_nowcast = triangle_raw,
#' delay_pmf = delay_df$pmf
#' )
apply_delay <- function(triangle_to_nowcast,
delay_pmf) {
# Checks that the delay df and the triangle are compatible
validate_delay_and_triangle(
triangle,

Check warning on line 45 in R/apply_delay.R

View workflow job for this annotation

GitHub Actions / lint-changed-files

file=R/apply_delay.R,line=45,col=5,[object_usage_linter] no visible binding for global variable 'triangle'
delay_pmf
)
n_delays <- nrow(delay_df)

Check warning on line 48 in R/apply_delay.R

View workflow job for this annotation

GitHub Actions / lint-changed-files

file=R/apply_delay.R,line=48,col=20,[object_usage_linter] no visible binding for global variable 'delay_df'
n_dates <- nrow(triangle_to_nowcast)
expectation <- triangle_to_nowcast

for (co in 2:n_delays) {
block_bottom_left <- expectation[(n_dates - co + 2):n_dates, 1:(co - 1),
drop = FALSE
]
# Uses the observed data to find the expected total on that reference date
exp_total <- rowSums(block_bottom_left) / sum(delay_pmf[1:(co - 1)])
# * Note, we will have to do some correction if this is 0, ignore for now*
# Finds the expected value for the particular delay by scaling by the
# delay pmf for delay d
expectation[(n_dates - co + 2):n_dates, co] <- exp_total * delay_pmf[co]
}

return(expectation)
}
1 change: 0 additions & 1 deletion R/get_delay_estimate.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,6 @@
#' Modified from: https://github.com/KITmetricslab/RESPINOW-Hub/blob/7cce3ae2728116e8c8cc0e4ab29074462c24650e/code/baseline/functions.R#L55 #nolint
#' @param triangle matrix of the reporting triangle, with rows representing
#' the time points of reference and columns representing the delays
#' indexed at 0
#' @param max_delay integer indicating the maximum delay to estimate, in units
#' of the delay. The default is to use the whole reporting triangle,
#' `ncol(triangle) -1`.
Expand Down
38 changes: 38 additions & 0 deletions R/validate.R
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,13 @@
validate_triangle <- function(triangle,
max_delay,
n_history) {
# Make sure the input triangle only contains integer values
# and is of the correct class
checkmate::assert_class(triangle, "matrix")
checkmate::assert_integerish(triangle)
checkmate::assert_integerish(max_delay)
checkmate::assert_integerish(n_history)

if (nrow(triangle) <= ncol(triangle)) {
cli::cli_abort(
message = c(
Expand Down Expand Up @@ -47,3 +54,34 @@ validate_triangle <- function(triangle,
)
}
}

#' Validate triangle to nowcast and delay pmf together
#' @description
#' Various checks to make sure that the reporting triangle and the delay pmf
#' passed in to `apply_delay()` are formatted properly and compaitble
#' @param triangle a matrix of values with rows indicating the time points and
#' columns indicating the delays
#' @param delay_pmf a vector of length of the number of delays indicating the
#' probability of a case being reportined on a given delay
#'
#' @returns NULL, invisibly
validate_delay_and_triangle <- function(triangle,
delay_pmf) {
# Check that the input triangle only contains integer values
checkmate::assert_integerish(triangle)
# Check that the inputs are the correct type
checkmate::assert_class(triangle, "matrix")
checkmate::assert_class(delay_pmf, "numeric")

# Make sure the triangle has the same number of colums as the delay
if ((ncol(triangle) != length(delay_pmf))) {
cli::cli_abort(
message = c(
"Length of the delay pmf is not the same as the number of delays ",
"in the triangle to be nowcasted. Ensure that these are equivalent ",
"by generating the delay pmf using the same maximum delay as in the ",
"data you want to be nowcasted."
)
)
}
}
48 changes: 48 additions & 0 deletions man/apply_delay.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 1 addition & 2 deletions man/get_delay_estimate.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

22 changes: 22 additions & 0 deletions man/validate_delay_and_triangle.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

104 changes: 80 additions & 24 deletions vignettes/baselinenowcast.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -82,25 +82,17 @@ For this quick start, we use daily level data from the
These data represent hospitalisation counts by date of positive test and date
of test report in Germany up to October 1, 2021.

# Filtering

# Filtering and plotting the data

We will filter the data to just look at the national-level data, for all age
groups.
We will pretend that we are making a nowcast as of July 1, 2021, therefore we
will exclude all reference dates and report dates after that date.
`germany_covid19_hosp` is provided as package data from `epinowcast`
```{r load-data}
observed_long <- epinowcast::germany_covid19_hosp |>
enw_filter_report_dates(latest_date = "2021-07-01") |>
dplyr::filter(location == "DE", age_group == "00+")
```

Let's start by plotting all of the values reported on the day of admissions,
Let's start by plotting the sum of the reports at each reference date,
and then compare that to what we will eventually observe as of the final date
in the complete dataset.

# Plotting

The red line shows the cumulative number of confirmed admissions on each report
date, across all delays, using the data available as of July 1, 2021.
It demonstrates the characteristic behavior of right-truncation.
Expand All @@ -114,18 +106,23 @@ at each reference date, observed as of the "fully observed" data on October
ggplot() +
geom_line(
# Plot the data summed across reporting days
data = observed_long |>
group_by(reference_date) |>
summarise(total_confirmed = sum(confirm)),
aes(x = reference_date, y = total_confirmed), color = "darkred"
data = epinowcast::germany_covid19_hosp |>
enw_filter_report_dates(latest_date = "2021-07-01") |>
dplyr::filter(
location == "DE", age_group == "00+",
report_date == "2021-07-01"
),
aes(x = reference_date, y = confirm), color = "darkred"
) +
geom_line(
data = epinowcast::germany_covid19_hosp |>
dplyr::filter(location == "DE", age_group == "00+") |>
group_by(reference_date) |>
summarise(total_confirmed = sum(confirm)) |>
dplyr::filter(reference_date <= "2021-07-01"),
aes(x = reference_date, y = total_confirmed), color = "black"
dplyr::filter(
location == "DE", age_group == "00+",
reference_date <= "2021-07-01"
) |>
dplyr::group_by(reference_date) |>
dplyr::summarise(confirm = max(confirm)),
aes(x = reference_date, y = confirm), color = "black"
) +
theme_bw() +
xlab("Reference date") +
Expand Down Expand Up @@ -202,6 +199,16 @@ pobs <- enw_preprocess_data(
triangle_full <- pobs$reporting_triangle[[1]]
head(triangle_full)
# Check that the sum of the rows in the reporting triangle is the
# same as the sums across report dates by reference date
observed_sums <- observed_long |>
dplyr::group_by(reference_date) |>
dplyr::summarise(total_confirmed = max(confirm)) |>
dplyr::mutate(triangle_sums = rowSums(triangle_full |>
dplyr::select(-`.group`, -reference_date) |>
as.matrix() |>
unname()))
```

# Estimate delay
Expand Down Expand Up @@ -246,6 +253,7 @@ delay_df <- get_delay_estimate(
n_history = n_history
)
ggplot(delay_df) +
geom_line(aes(x = delay, y = cumsum(pmf))) +
xlab("Delay") +
Expand Down Expand Up @@ -274,12 +282,60 @@ The reporting triangle we are applying it to must have the same `max_delay`
as the delay estimate.

```{r}
# point_reporting_square <- apply_delay( #nolint
# triangle_to_nowcast = triangle,#nolint
# delay_df#nolint
# )#nolint
point_reporting_square <- apply_delay(
triangle_to_nowcast = triangle,
delay_pmf = delay_df$pmf
)
```

We'll make a quick plot to compare the nowcast to the later observed data
```{r}
# Get the confirmations through July 1, 2021, from the observations up until
# October 1, 2021.
final_data <- epinowcast::germany_covid19_hosp[location == "DE"][age_group == "00+"] |> # nolint
enw_filter_report_dates(latest_date = "2021-10-01") |>
enw_filter_reference_dates(
latest_date = "2021-07-01",
include_days = n_history - 1
) |>
dplyr::group_by(reference_date) |>
dplyr::summarise(
total_confirmed = max(confirm)
) |>
dplyr::mutate(nowcast = rowSums(point_reporting_square))
ggplot() +
geom_line(
# Plot the data summed across reporting days as of July 1,2021
data = observed_long |>
group_by(reference_date) |>
summarise(total_confirmed = max(confirm)),
aes(x = reference_date, y = total_confirmed), color = "darkred"
) +
geom_line(
data = final_data,
aes(x = reference_date, y = total_confirmed), color = "black"
) +
geom_line(
data = final_data,
aes(x = reference_date, y = nowcast), color = "darkblue"
) +
theme_bw() +
xlab("Reference date") +
ylab("Confirmed admissions") +
scale_y_continuous(trans = "log10") +
ggtitle("Comparing all admissions by reference date and those with no delay")
```
Here we can see that our point nowcast slightly overestimates what was
eventually reported (black line), but does a decent overally job of correcting
for the right-truncation observed in the red line (the data prior to the
nowcast).

What we of course really want is a probabilistic estimate of the nowcasted
case counts.


# Estimate uncertainty

So far, we've demonstrated how to generate a point estimate of a nowcast.
Expand Down

0 comments on commit e650d57

Please sign in to comment.