P1) Combining spatio-temporal model predictions through logarithmic pooling
Concentration areas: probability, theoretical statistics, model combination.
As the COVID-19 pandemic made even clearer, reliable prediction models are of utmost importance for decision-making. In this project the candidate will be expected to exploit the results of Carvalho et al. (2022) to study novel ensemble formation techniques using logarithmic pooling. Special attention will be paid to spatio-temporal epidemic models for COVID-19, influenza, dengue and Zika. The project will involve both theoretical and computational aspects.
This is joint work with Drs. Alvaro Faria and Fadlalla Elfadaly from the Open University in the UK.
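As a rough illustration of the pooling operation itself (not of the methods in Carvalho et al. (2022)), the sketch below combines discrete predictive distributions on a shared support by taking a weighted geometric mean and renormalising. The function name `log_pool`, the toy forecasts and the fixed weights are assumptions made for this example; how to choose the weights is part of what the project would study.

```python
import math


def log_pool(dists, weights):
    """Pool discrete distributions defined on a common support.

    The pooled probability of outcome k is proportional to
    prod_i dists[i][k] ** weights[i], with the weights on the simplex.
    """
    if len(dists) != len(weights):
        raise ValueError("need one weight per distribution")
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")

    n_outcomes = len(dists[0])
    log_unnorm = []
    for k in range(n_outcomes):
        total = 0.0
        for p, w in zip(dists, weights):
            if w == 0.0:
                continue  # a zero-weight model is ignored entirely
            if p[k] == 0.0:
                total = -math.inf  # any weighted model can veto an outcome
                break
            total += w * math.log(p[k])
        log_unnorm.append(total)

    peak = max(log_unnorm)
    if peak == -math.inf:
        raise ValueError("no outcome has positive probability under all models")
    # Subtract the maximum before exponentiating for numerical stability.
    unnorm = [math.exp(v - peak) for v in log_unnorm]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Toy two-outcome forecasts, made up for illustration only.
forecast_a = [0.5, 0.5]
forecast_b = [0.8, 0.2]
pooled = log_pool([forecast_a, forecast_b], [0.5, 0.5])
```

With equal weights on these two toy forecasts, the pooled forecast is [2/3, 1/3]. Working on the log scale keeps the product stable when many models or small probabilities are involved.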
References:
- Carvalho et al. (2022).
Expected skills: strong mathematical statistics and computational statistics background. Competent R or C++ or Python programming.
P2) Optimal scaling for adaptive MCMC in phylogenetics
Concentration areas: Statistical Phylogenetics, Markov Chain Monte Carlo.
In Chapter 2 of my PhD thesis I proposed a class of simple adaptive candidate-generating mechanisms for time-calibrated phylogenies in Metropolis-Hastings (MH). A major open question, however, is which target acceptance probability leads to optimal performance. In this project the candidate will be expected to use a combination of MCMC theory and computational experiments to understand whether it is possible to obtain general results for the optimal scaling of MCMC for (time-calibrated) phylogenies. In the process, the candidate will be expected to expand and strengthen previous results in this area, such as proving or disproving the existence of geometrically ergodic MH-type chains for this problem. The candidate will also familiarise themselves with the development of BEAST, a software package for Bayesian phylogenetic estimation.
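To make the role of the target acceptance probability concrete, here is a minimal sketch of a random-walk Metropolis sampler whose proposal scale is tuned by a Robbins-Monro update, in the spirit of the adaptive schemes reviewed in Andrieu & Thoms (2008). It runs on a one-dimensional standard normal rather than on trees, and it is not the mechanism from the thesis. The function name `adaptive_rwm`, the adaptation exponent 0.6 and the default target of 0.44 (a value commonly quoted for one-dimensional Gaussian targets) are assumptions chosen for the example.

```python
import math
import random


def adaptive_rwm(log_target, x0, n_iter, target_accept=0.44, seed=0):
    """Random-walk Metropolis with a Robbins-Monro update of the log scale."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    log_scale = 0.0  # proposal standard deviation starts at exp(0) = 1
    n_accept = 0
    samples = []
    for t in range(1, n_iter + 1):
        proposal = x + math.exp(log_scale) * rng.gauss(0.0, 1.0)
        lp_proposal = log_target(proposal)
        accept_prob = math.exp(min(0.0, lp_proposal - lp))
        if rng.random() < accept_prob:
            x, lp = proposal, lp_proposal
            n_accept += 1
        # Grow the scale when accepting too often, shrink it otherwise.
        # The step size t ** -0.6 diminishes so that adaptation dies out.
        log_scale += (accept_prob - target_accept) / t ** 0.6
        samples.append(x)
    return samples, math.exp(log_scale), n_accept / n_iter


def std_normal_logpdf(x):
    # Toy target: standard normal, up to an additive constant.
    return -0.5 * x * x


samples, scale, accept_rate = adaptive_rwm(std_normal_logpdf, 0.0, 20000)
```

On this toy target the realised acceptance rate settles close to the requested value. Whether an analogous optimal target exists for time-calibrated phylogenies is the question the project asks.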
References:
- Andrieu & Thoms (2008).
- Excellent presentation by Natesh Pillai.
- My MCQMC 2020 presentation.
Expected skills: strong mathematical statistics and computational statistics background. Competent Java and/or Julia programming.
P3) Balancing local and global exploration for MCMC in treespace
Concentration areas: Applied Probability, Statistical Phylogenetics, Markov Chain Monte Carlo.
Markov chain Monte Carlo (MCMC) is the main tool for Bayesian inference in phylogenetics. The main impediment to efficient exploration of the ambient space seems to be the discrete tree structure. The aim of this project is to bring recent developments in MCMC for discrete spaces to phylogenetics. In particular, we are concerned with complicated, real-world time-calibrated trees. The natural questions are how to find multiple modes and how to explore them, and finding a balance between mode-jumping and local exploration is crucial for efficiency. Two major questions present themselves: (i) how do we construct efficient mode-jumping and local proposals, and (ii) what is the role of parallel tempering-type strategies in facilitating mode-finding?
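One strand of the discrete-space literature is the locally balanced (informed) proposal in the spirit of Zanella (2017), which weights each neighbour y of the current state x by g(pi(y)/pi(x)) for a balancing function such as g(t) = sqrt(t). The sketch below applies this idea on a toy path graph, not on treespace. The names `balanced_probs`, `balanced_mh_step` and `path_neighbours`, the toy bimodal target and the assumption of a symmetric neighbourhood structure are all made up for illustration.

```python
import math
import random


def balanced_probs(log_pi, x, neighbours):
    """Proposal probabilities over the neighbours of x.

    Each neighbour y gets weight sqrt(pi(y) / pi(x)); weights are normalised.
    """
    ys = neighbours(x)
    log_w = [0.5 * (log_pi(y) - log_pi(x)) for y in ys]
    peak = max(log_w)
    w = [math.exp(v - peak) for v in log_w]
    z = sum(w)
    return ys, [v / z for v in w]


def balanced_mh_step(log_pi, x, neighbours, rng):
    """One Metropolis-Hastings step using the locally balanced proposal."""
    ys, probs = balanced_probs(log_pi, x, neighbours)
    i = rng.choices(range(len(ys)), weights=probs)[0]
    y = ys[i]
    # Reverse move: assumes x is a neighbour of y (symmetric neighbourhoods).
    back_ys, back_probs = balanced_probs(log_pi, y, neighbours)
    q_back = back_probs[back_ys.index(x)]
    log_ratio = log_pi(y) - log_pi(x) + math.log(q_back) - math.log(probs[i])
    accept = rng.random() < math.exp(min(0.0, log_ratio))
    return y if accept else x


# Toy unnormalised target on the path 0 - 1 - 2 - 3 - 4, with a mode at each end.
WEIGHTS = [4.0, 1.0, 1.0, 1.0, 4.0]


def log_pi(x):
    return math.log(WEIGHTS[x])


def path_neighbours(x):
    return [y for y in (x - 1, x + 1) if 0 <= y < len(WEIGHTS)]


rng = random.Random(1)
state = 2
visits = [0] * len(WEIGHTS)
for _ in range(5000):
    state = balanced_mh_step(log_pi, state, path_neighbours, rng)
    visits[state] += 1
```

From state 1, for example, the proposal favours the mode at state 0 over state 2 with probabilities 2/3 and 1/3. The Hastings correction uses the normalising constants of both the forward and the reverse proposal, which is what makes each step costlier than an uninformed move.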
References:
- Zanella (2017).
- Chapters 1-3 of my PhD thesis.
- Power & Goldman (2019).
- Syed et al. (2021).
Expected skills: strong probability and computational statistics background. Competent R or C++ or Python programming.