Bayesian Modelling in Epidemiology Case Line Lists

jim-sheldon/EpiLine

EpiLine - Estimating epi-curves and distributions from case line list data

This package contains models for simultaneously estimating epi-curves and individual infection progression distributions. The estimators require both total case counts by date and detailed ("line list") information for a subset of individuals.

Symptom-Report Model

This model describes the delay between the onset of symptoms and the presentation, testing or reporting of cases to health authorities. Early in an outbreak these delays are frequently long because of a lack of awareness of the symptoms; as public health information increases awareness, the delays decrease over time. Even if the delays are constant over time, estimating their distribution requires accounting for both censoring effects and the underlying dynamics of the infection to avoid bias. Conversely, estimating the dynamics of the infection (e.g. r(t) or R(t)) from reported cases requires knowing these distributions. Therefore, when both the infection dynamics and the reporting delays vary at the same time, the best way to account for these biases is to estimate both simultaneously.

Model Description

The aim of the model is to understand the interaction between the symptom-report time distribution and the underlying dynamics of the infection rate, so we use a very simple model for the number of people developing symptoms each day. We model the daily growth rate $r(t)$ with a Gaussian process, so the daily number of people developing symptoms, $S(t)$, is given by

$$ \begin{align} r(t) &\sim N( r( t - 1 ), \sigma^2_{r_{GP}}), \\ S(t) &= S(t-1) e^{r(t)}, \end{align} $$

where $\sigma^2_{r_{GP}}$ is the daily variance of the Gaussian process. Note that making $r(t)$, rather than $S(t)$ itself, a Gaussian process means that the prior expectation for the daily proportional change in $S(t)$ is the same as on the previous day. Next we define $f(\tau,t)$ as the probability that someone who developed symptoms on day $t$ reports their infection on day $t+\tau$. Note that $\tau$ can be negative if a case is found before symptoms develop (e.g. if contact-traced and tested positive). The expected number of cases reported on day $t$, $\mu(t)$, is given by

$$ \mu(t) = \sum_{\tau = -\tau_{\rm post}}^{\tau_{\rm pre}} f(\tau,t-\tau) S(t-\tau) $$

where $\tau_{\rm pre}$ is the maximum number of days before reporting that a case develops symptoms and $\tau_{\rm post}$ is the maximum number of days after reporting that a case develops symptoms. The number of observed reported cases $C(t)$ is modelled as a negative binomial variable

$$ C(t) \sim NB(\mu(t),\phi_{OD}), $$

where $\phi_{OD}$ is the over-dispersion parameter.
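The reported-case part of the model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's implementation: the function names (`simulate_onsets`, `expected_reports`, `nb_logpmf`, `reports_loglik`) are hypothetical, the delay distribution $f(\tau,t)$ is passed in as a plain callable, onset days outside the simulated window are treated as contributing nothing, and the negative binomial is assumed to use the common mean/over-dispersion parameterisation with variance $\mu + \mu^2/\phi_{OD}$.

```python
import math
import random


def simulate_onsets(s0, r0, sigma_r, n_days, rng):
    """S(t) = S(t-1) * exp(r(t)), with r(t) ~ N(r(t-1), sigma_r**2).

    sigma_r is the standard deviation, i.e. the square root of the GP variance.
    """
    r, s = [r0], [s0]
    for _ in range(1, n_days):
        r.append(rng.gauss(r[-1], sigma_r))  # random-walk step for the growth rate
        s.append(s[-1] * math.exp(r[-1]))    # multiplicative growth of onsets
    return r, s


def expected_reports(s, f, tau_post, tau_pre):
    """mu(t) = sum_{tau=-tau_post}^{tau_pre} f(tau, t - tau) * S(t - tau).

    Assumption: onset days outside the simulated window contribute zero.
    """
    n = len(s)
    mu = [0.0] * n
    for t in range(n):
        for tau in range(-tau_post, tau_pre + 1):
            onset = t - tau  # people with onset on this day report on day t
            if 0 <= onset < n:
                mu[t] += f(tau, onset) * s[onset]
    return mu


def nb_logpmf(c, mu, phi):
    """Negative binomial log-pmf with mean mu and variance mu + mu**2 / phi."""
    if mu <= 0.0:
        # Degenerate case: all mass at zero
        return 0.0 if c == 0 else -math.inf
    return (math.lgamma(c + phi) - math.lgamma(phi) - math.lgamma(c + 1)
            + phi * math.log(phi / (phi + mu))
            + c * math.log(mu / (phi + mu)))


def reports_loglik(counts, mu, phi):
    """Sum of log p(C(t) | mu(t), phi) over all days."""
    return sum(nb_logpmf(c, m, phi) for c, m in zip(counts, mu))


# Usage with a hypothetical, time-invariant delay: uniform over the allowed range
TAU_POST, TAU_PRE = 2, 10


def uniform_delay(tau, t):
    return 1.0 / (TAU_POST + TAU_PRE + 1)


rng = random.Random(42)
r, s = simulate_onsets(s0=20.0, r0=0.03, sigma_r=0.01, n_days=60, rng=rng)
mu = expected_reports(s, uniform_delay, TAU_POST, TAU_PRE)
print(reports_loglik([round(m) for m in mu], mu, phi=10.0))
```

Larger values of `phi` make the negative binomial approach a Poisson distribution, while smaller values allow more day-to-day noise around $\mu(t)$.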

The symptom-report time distribution must support both positive and negative values. In addition, this distribution is empirically observed to be highly skewed with heavy tails, so we model it using the Johnson SU distribution, which has four parameters $(\xi, \lambda, \gamma,\delta)$. To account for changes in the distribution over time, we model these four parameters using Gaussian processes

$$ \begin{align} \xi(t) &\sim N( \xi( t - 1 ), \sigma^2_{\xi_{GP}}), \\ \lambda(t) &\sim N( \lambda( t - 1 ), \sigma^2_{\lambda_{GP}}), \\ \gamma(t) &\sim N( \gamma( t - 1 ), \sigma^2_{\gamma_{GP}}), \\ \delta(t) &\sim N( \delta( t - 1 ), \sigma^2_{\delta_{GP}}). \end{align} $$
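The time-varying symptom-report distribution can be sketched in the same way. This sketch assumes the standard Johnson SU parameterisation, in which $Z = \gamma + \delta \sinh^{-1}((x-\xi)/\lambda)$ is standard normal, and discretises it to integer delays by giving each day $\tau$ the probability mass on $[\tau-0.5, \tau+0.5)$, renormalised over $[-\tau_{\rm post}, \tau_{\rm pre}]$. The discretisation rule, the function names and the parameter key names are assumptions for illustration, not taken from the package.

```python
import math
import random

PARAMS = ("xi", "lambda", "gamma", "delta")  # hypothetical key names


def johnson_su_cdf(x, xi, lam, gamma, delta):
    """Johnson SU CDF: Phi(gamma + delta * asinh((x - xi) / lam))."""
    z = gamma + delta * math.asinh((x - xi) / lam)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def delay_pmf(xi, lam, gamma, delta, tau_post, tau_pre):
    """Probability of each integer delay tau in [-tau_post, tau_pre].

    Assumes day tau covers [tau - 0.5, tau + 0.5) and renormalises over the
    truncated range so that the probabilities sum to one.
    """
    taus = range(-tau_post, tau_pre + 1)
    mass = [johnson_su_cdf(t + 0.5, xi, lam, gamma, delta)
            - johnson_su_cdf(t - 0.5, xi, lam, gamma, delta) for t in taus]
    total = sum(mass)
    return {t: m / total for t, m in zip(taus, mass)}


def simulate_delay_params(init, sigmas, n_days, rng):
    """Independent Gaussian random walk for each Johnson SU parameter.

    sigmas holds standard deviations (square roots of the GP variances).
    """
    path = {k: [init[k]] for k in PARAMS}
    for _ in range(1, n_days):
        for k in PARAMS:
            path[k].append(rng.gauss(path[k][-1], sigmas[k]))
    return path


# Usage: lambda and delta are held fixed here so that they stay positive
rng = random.Random(7)
init = {"xi": 0.0, "lambda": 2.0, "gamma": -1.0, "delta": 1.5}
sigmas = {"xi": 0.05, "lambda": 0.0, "gamma": 0.02, "delta": 0.0}
path = simulate_delay_params(init, sigmas, n_days=30, rng=rng)
# f(tau, t) on the last simulated day, built from that day's parameter values
f_last = delay_pmf(*(path[k][-1] for k in PARAMS), tau_post=5, tau_pre=20)
```

Because $\lambda$ and $\delta$ must be positive for the Johnson SU distribution to be valid, a random walk applied literally to these two parameters can drift into invalid values, so a practical implementation would need some constraint (for example, walking on their logarithms).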

At the end of the reporting period there may be few reported cases for each symptom-onset date (since the data are right-censored), so there is an option to make the distribution static after a particular time $t_{\rm static}$, i.e. when $t>t_{\rm static}$ then $\xi(t)=\xi(t_{\rm static})$, etc. These parameters are estimated using line-list data of individual cases for which both the symptom-onset date and the report date are known. Note that cases for which only the report date is known should be included in the daily report totals $C(t)$, but not in the symptom-report line list. From the line list, let $N_S(t)$ be the number of people who reported symptom onset on day $t$, and let $n_{SR}(t,\tau)$ be the number of people who reported symptom onset on day $t$ and reported to health authorities on day $t+\tau$. For day $t$, we model { $n_{SR}(t,-\tau_{\rm post}),..., n_{SR}(t,\tau_{\rm pre})$ } using a multinomial distribution with parameters {