12th Summer Institute in Statistics and Modeling in Infectious Diseases (SISMID)

Module 14: Evolutionary Dynamics and Molecular Epidemiology of Viruses

Mon Jul 27 to Wed Jul 29

Module dates/times: Monday, July 27; Tuesday, July 28; and Wednesday, July 29.

Prerequisites: This module assumes knowledge of the material in Module 1: Probability and Statistical Inference, though not necessarily from taking that module.

Genetic sequence data is increasingly being used to track and characterize how diseases spread. During the current COVID-19 pandemic in particular, genetic sequence data has become an important source of information. Initially, it provided evidence for human-to-human transmission. It was then used to learn how outbreaks in different parts of the world are related and to track the speed at which COVID-19 spreads. This is possible because random errors (mutations) occur as SARS-CoV-2 (the virus that causes COVID-19) replicates in infected individuals, and those errors are then transmitted to other hosts. The further apart in the transmission history pathogens isolated from different hosts are, the more divergent their genetic sequences will tend to be. This in turn means that we can use pathogen genetic sequences to learn about transmission dynamics. The most popular way to do so is with phylogenetic and phylodynamic methods, which allow us to reconstruct how pathogens isolated from different patients are related and to reconstruct past transmission dynamics.
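To make the link between transmission history and sequence divergence concrete, here is a minimal sketch in Python that computes the proportion of differing sites (the p-distance) between aligned sequences. The host names and the short sequences are invented for illustration and are not real SARS-CoV-2 data; real analyses use full genome alignments and model-based distances rather than raw counts.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of comparable aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip gaps and ambiguous bases: they carry no usable signal here.
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Hypothetical toy alignment: B differs from A by one mutation,
# and C carries B's mutation plus one more.
samples = {
    "host_A": "ACGTACGTAC",
    "host_B": "ACGTACGTAT",
    "host_C": "ACGAACGTAT",
}

distances = {
    (x, y): p_distance(samples[x], samples[y])
    for x, y in combinations(samples, 2)
}

for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d:.2f}")
```

In this toy example, host_A and host_C are the most divergent pair, which is the pattern one would expect if the virus passed from A to B and then from B to C. Phylogenetic methods formalize this intuition with explicit evolutionary models instead of raw distances.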

In this module, we will learn how to go from genetic sequences to inferences about transmission dynamics. To do so, we will look at how to use phylogenetic and bioinformatic tools to reconstruct the spread of pathogens from genetic sequence data. In particular, we will focus on Bayesian phylogenetics.

We will first briefly cover the different components of Bayesian phylogenetic analyses, such as different evolutionary models. We will also introduce different phylodynamic models, such as coalescent and birth-death models, that allow us to extract information about past population dynamics from genetic sequence data.
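As a rough illustration of why tree shape carries information about population size, the following Python sketch simulates the time to the most recent common ancestor (TMRCA) under the simplest coalescent model. It assumes a constant-size haploid population and all samples taken at the same time, which is a deliberate simplification; the function names and parameter values are hypothetical, and the models used in practice allow population size to change through time and samples to be collected at different dates.

```python
import random


def simulate_tmrca(n_samples: int, pop_size: float, rng: random.Random) -> float:
    """Simulate the TMRCA (in generations) under a constant-size coalescent.

    With k lineages, the waiting time to the next coalescent event is
    exponential with rate k * (k - 1) / (2 * pop_size).
    """
    total_time = 0.0
    lineages = n_samples
    while lineages > 1:
        rate = lineages * (lineages - 1) / (2.0 * pop_size)
        total_time += rng.expovariate(rate)
        lineages -= 1  # two lineages merge into one
    return total_time


def expected_tmrca(n_samples: int, pop_size: float) -> float:
    """Analytical expectation of the TMRCA: 2 * N * (1 - 1/n)."""
    return 2.0 * pop_size * (1.0 - 1.0 / n_samples)


if __name__ == "__main__":
    rng = random.Random(1)
    for pop_size in (100, 1000):
        draws = [simulate_tmrca(10, pop_size, rng) for _ in range(5000)]
        mean = sum(draws) / len(draws)
        print(f"N={pop_size}: simulated mean {mean:.1f}, "
              f"expected {expected_tmrca(10, pop_size):.1f}")
```

Larger populations produce deeper trees on average, so observed branching times can be used to infer population size. Phylodynamic inference runs this logic in reverse, within a Bayesian framework.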

As the main software, we will use BEAST (Bayesian Evolutionary Analysis by Sampling Trees) and BEAST2. Through lectures and tutorials, we will cover how to set up and interpret analyses focused on estimating evolutionary rates and population dynamics through time. Additionally, we will look into evolutionary processes including recombination and reassortment.