In this course we will consider a range of models used in epidemiology - from hierarchical modelling and spatial statistics to disease transmission modelling - and their probabilistic (Bayesian) formulation. To perform Bayesian inference, we will use the probabilistic programming language (PPL) NumPyro.
Let's unpack each of the three key terms of the course - epidemiology, Bayesian modelling and probabilistic programming. You can think of them as the 'Why?', 'What?' and 'How?' of the course, respectively.
(epidemiology)=
Epidemiology serves as the underlying rationale in this course, explaining WHY we develop the probabilistic models we'll be examining. Essentially, it addresses the question: 'What real-world phenomena are we aiming to analyse using these models?'
Epidemiology studies human health. To be more specific, it is the study of how diseases and health-related events are distributed within populations and the factors that influence these distributions. It is a branch of public health that focuses on understanding the patterns, causes, and effects of diseases and health conditions on a large scale. Epidemiologists collect and analyse data to investigate the occurrence of health outcomes, their risk factors, and the impact of various interventions or preventive measures.
Epidemiological studies are essential for understanding the health of populations, identifying health disparities, and guiding public health efforts to improve the well-being of communities and societies.
Here are a few examples of epidemiological study types.
- Disease Surveillance: epidemiologists monitor the occurrence of diseases and health-related events over time and across different geographic areas. This involves tracking the number of cases, identifying outbreaks, and assessing trends in disease incidence and prevalence. Disease surveillance can be conducted for both infectious and non-infectious diseases. If the surveillance concerns only the temporal development of a health outcome, temporal models suffice. If space is also of interest, one needs to build spatial or spatiotemporal models.
- Identifying Risk Factors: epidemiological studies aim to identify the factors that are associated with an increased likelihood of developing a particular disease. These risk factors can include genetic predisposition, environmental exposures, lifestyle choices, and social determinants of health.
- Disease Prevention and Control: the insights gained from epidemiological research are crucial for designing and implementing public health interventions and policies aimed at preventing and controlling diseases. This may involve vaccination campaigns, health education programmes, quarantine measures, and more.
- Outbreak Investigation: epidemiologists are often involved in investigating disease outbreaks, such as foodborne illnesses, infectious disease outbreaks, or clusters of chronic diseases. They work to identify the source of the outbreak and implement measures to contain and prevent further spread.
- Public Health Planning: epidemiological data and findings play a vital role in informing public health planning and resource allocation. This includes assessing healthcare needs, identifying at-risk populations, and developing strategies to improve overall health outcomes.
- Causality Assessment: epidemiologists use various study designs, including cohort studies, case-control studies, and randomised controlled trials, to determine whether a specific factor or intervention causes a particular health outcome.

It is important to distinguish <font color='orange'>associative</font> studies from those where researchers try to uncover <font color='orange'>causal</font> relationships between risk factors and outcomes.
Mathematical and statistical models are frequently used in epidemiology to simulate disease spread and estimate disease distribution. These models help in making informed decisions and planning interventions.
Some of the models that we will build in this course are more relevant to infectious diseases, and some to chronic diseases. The scope of applicability will be clarified for each model when it is introduced.
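To make the idea of simulating disease spread concrete, here is a minimal sketch of a deterministic SIR (susceptible-infected-recovered) compartmental model, integrated with simple forward-Euler steps. The function name `simulate_sir`, the parameter values, and the population size are illustrative assumptions, not models or settings prescribed by this course.

```python
def simulate_sir(beta, gamma, s0, i0, r0, n_days, dt=0.1):
    """Forward-Euler integration of a deterministic SIR model.

    beta  : transmission rate per day (assumed value, for illustration)
    gamma : recovery rate per day (assumed value, for illustration)
    Returns a list of (S, I, R) tuples, one per day, including day 0.
    """
    n = s0 + i0 + r0  # total population, kept constant by the model
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    trajectory = [(s, i, r)]
    for _ in range(n_days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical outbreak: 1000 people, 10 initially infected.
trajectory = simulate_sir(beta=0.3, gamma=0.1, s0=990, i0=10, r0=0, n_days=160)

# Find the day with the largest number of infected individuals.
peak_day, (_, peak_infected, _) = max(enumerate(trajectory), key=lambda t: t[1][1])
print(f"Peak of about {peak_infected:.0f} infected on day {peak_day}")
```

A deterministic simulation like this has no notion of uncertainty. The probabilistic formulation treats quantities such as `beta` and `gamma` as unknown and infers them from data.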
Bayesian modelling represents the fundamental focus of this course, addressing the question of "WHAT models can describe the generative process behind the observed data?". Throughout the course, we will use the terms "Bayesian" and "probabilistic" interchangeably.
You have probably heard a lot recently about <font color='orange'>generative AI</font> and <font color='orange'>deep generative modelling (DGM)</font>. It is indeed the same 'generative' idea that we are talking about here. The difference is that DGM uses deep learning and neural networks as the generative mechanism, whereas in traditional epidemiology it is more common to use statistical and mechanistic models for this purpose. Having said that, we will touch on DGMs in this course too.
Probabilistic modelling is a mathematical and statistical framework that incorporates uncertainty and randomness into models, accounting for variability in real-world phenomena and its sources. It uses probability theory to describe and quantify the uncertainty associated with events, outcomes, or variables. Its primary goal is to make predictions, infer unobserved quantities, or support decisions when uncertainty is inherent, which is why it plays a central role in data analysis, machine learning, and decision-making.
In epidemiology, probabilistic modelling helps epidemiologists and public health officials make informed decisions by quantifying uncertainty, simulating realistic disease dynamics, and assessing the potential impact of interventions. This improves our understanding of health outcomes and guides effective public health responses.
Some key concepts and components of probabilistic modelling are as follows:
- Random variables: in probabilistic modelling, random variables are used to represent uncertain quantities or events. These variables can take on different values with associated probabilities.
- Probability distributions: a probability distribution assigns probabilities (or probability densities) to the values that a random variable can take.
- Parameters: probability distributions are often characterised by parameters that determine their shape and behaviour. For example, the mean and standard deviation of a normal distribution are parameters that describe its central tendency and spread.
- Bayesian inference: Bayesian probabilistic modelling is a framework that uses Bayes' theorem to update the probability distribution of an unknown quantity based on new evidence or data. It combines prior beliefs (the prior distribution) with observed data (through the likelihood) to form a posterior distribution, which represents the updated beliefs.
- Monte Carlo methods: Monte Carlo methods are a class of computational techniques that approximate quantities of interest in complex probabilistic models through random sampling. They generate random samples from probability distributions and use those samples to estimate means, intervals, and probabilities.
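The concepts listed above can be made concrete with a small sketch that uses only the Python standard library. It estimates disease prevalence from a hypothetical survey: the prevalence is the random variable, a Beta distribution with two parameters serves as the prior, the Beta-binomial conjugate update performs the Bayesian inference step, and random draws from the posterior give Monte Carlo estimates. The survey numbers and the uniform prior are assumptions chosen purely for illustration.

```python
import random

random.seed(0)  # fixed seed so that the sketch is reproducible

# Hypothetical survey: 12 of 80 sampled individuals test positive.
n_tested, n_positive = 80, 12

# Prior on the prevalence: Beta(1, 1), i.e. uniform on [0, 1] (an assumption).
prior_a, prior_b = 1.0, 1.0

# Bayesian inference: with a binomial likelihood the Beta prior is conjugate,
# so the posterior is Beta(prior_a + positives, prior_b + negatives).
post_a = prior_a + n_positive
post_b = prior_b + (n_tested - n_positive)
exact_mean = post_a / (post_a + post_b)

# Monte Carlo: approximate posterior summaries from random draws.
draws = sorted(random.betavariate(post_a, post_b) for _ in range(20_000))
mc_mean = sum(draws) / len(draws)
ci_low = draws[int(0.025 * len(draws))]
ci_high = draws[int(0.975 * len(draws))]
prob_above_10pct = sum(d > 0.10 for d in draws) / len(draws)

print(f"exact posterior mean:       {exact_mean:.3f}")
print(f"Monte Carlo posterior mean: {mc_mean:.3f}")
print(f"95% credible interval:      ({ci_low:.3f}, {ci_high:.3f})")
print(f"P(prevalence > 10%):        {prob_above_10pct:.3f}")
```

Here the posterior is available in closed form, so the Monte Carlo estimate can be checked against the exact mean. For most models of practical interest no closed form exists, and sampling is the only practical route.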
Probabilistic programming is a specialised approach to building and analysing probabilistic models that offers several advantages for epidemiology and the study of infectious disease dynamics:
- Flexibility: probabilistic programming languages (PPLs), such as Stan, Pyro, NumPyro, PyMC, Turing.jl and others, provide a flexible framework for defining and customising probabilistic models. This flexibility is crucial in epidemiology, where the complexity of disease transmission models can vary widely depending on the specific disease and the population under study.
- Separation of modelling from inference: probabilistic programming languages separate model formulation from inference. We can focus on the applied question and do not need to write samplers by hand. Instead, we can use the robust, battle-tested samplers provided by the PPLs.
- Uncertainty quantification: probabilistic programming allows for the explicit representation and quantification of uncertainty. Epidemiological models often involve uncertain parameters and data, and probabilistic programming makes it easier to incorporate this uncertainty into the modelling process.
- Hierarchical modelling: many epidemiological models involve hierarchical structures, where data at multiple levels (e.g., individuals, households, communities) are analysed simultaneously. Probabilistic programming makes it easier to specify and fit hierarchical models that capture such structure.
- Model validation: probabilistic programming facilitates model validation by enabling researchers to compare model predictions with observed data using techniques like posterior predictive checks.
- Model selection and comparison: epidemiologists often need to compare different model structures or assess the fit of alternative hypotheses. Probabilistic programming facilitates model selection and comparison through techniques like Bayesian model averaging and model evidence calculation.
- Data integration: probabilistic programming can enable the integration of various types of data. This integration can improve the accuracy and informativeness of epidemiological models.
- Transparent communication: probabilistic programming encourages transparency in modelling and analysis. Researchers can clearly specify their assumptions, priors, and likelihood functions, making it easier to communicate and collaborate with other experts and stakeholders.
- Extensive libraries: probabilistic programming languages often come with extensive libraries and tools for model development, inference, and visualisation, reducing the implementation and computation burden.
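The separation of modelling from inference can be illustrated without any PPL at all. In the sketch below, written with the Python standard library only, the model is a single function returning a log joint density, while the sampler is a generic random-walk Metropolis routine that knows nothing about the model. A PPL supplies the sampler part (for example, NUTS in NumPyro), typically with far more efficient algorithms, so that only the model has to be written. The names `log_joint` and `metropolis`, the prior, and the weekly case counts are hypothetical choices made for illustration.

```python
import math
import random


def log_joint(log_rate, counts):
    """Model: log_rate ~ Normal(0, 2); counts[i] ~ Poisson(exp(log_rate))."""
    log_prior = -0.5 * (log_rate / 2.0) ** 2  # up to an additive constant
    rate = math.exp(log_rate)
    log_lik = sum(y * log_rate - rate - math.lgamma(y + 1) for y in counts)
    return log_prior + log_lik


def metropolis(log_density, init, n_samples, step=0.2, seed=0):
    """Generic random-walk Metropolis sampler for a one-dimensional log density."""
    rng = random.Random(seed)
    x, logp = init, log_density(init)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x));
        # 1 - rng.random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
        samples.append(x)
    return samples


weekly_cases = [3, 5, 4, 6, 2, 5, 4, 7]  # hypothetical surveillance counts

# The sampler only sees a function of one argument, not the model internals.
chain = metropolis(lambda z: log_joint(z, weekly_cases), init=0.0, n_samples=6000)

rates = [math.exp(z) for z in chain[1000:]]  # discard warm-up draws
posterior_mean_rate = sum(rates) / len(rates)
print(f"posterior mean weekly rate: {posterior_mean_rate:.2f}")
```

Changing the model, for instance by swapping the Poisson likelihood for another distribution, only requires editing `log_joint`; the sampler stays untouched. This is the division of labour that PPLs automate.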
As an exercise, find a publication that applies Bayesian inference in the field of epidemiology, for example in spatial statistics or disease transmission modelling.
- Identify which Bayesian inference methods (such as MCMC, VI or ABC) and which models were employed in the paper.
- Determine the inference tools used in the study, such as a PPL, custom MCMC samplers, or specialised libraries.
- Do you think the modelling part of the study could be improved or extended in some way?