24th Summer Institute in Statistical Genetics (SISG)

Module 18: MCMC for Genetics

Wed, July 24 to Fri, July 26

Module dates/times: Wednesday, July 24, 1:30-5 p.m.; Thursday, July 25, 8:30 a.m.-5 p.m.; and Friday, July 26, 8:30 a.m.-5 p.m.

This module examines the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood. Some knowledge of population genetics and basic familiarity with the R statistical package, or another programming language, will be helpful.

The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given, but the emphasis is on concepts and on practical issues that arise in applications.
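As a flavor of the Metropolis-Hastings algorithm mentioned above, here is a minimal sketch (not taken from the course materials) of a random-walk Metropolis sampler for an allele frequency p. It assumes a Binomial(n, p) likelihood for k observed copies of an allele and a Uniform(0, 1) prior, so the exact posterior is Beta(k + 1, n - k + 1) and the sampler's output can be checked against the known posterior mean (k + 1)/(n + 2). The function names, step size and iteration counts are illustrative choices only.

```python
import math
import random


def log_posterior(p, k, n):
    """Unnormalized log posterior for allele frequency p, assuming a
    Binomial(n, p) likelihood for k observed copies and a Uniform(0, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis(k, n, n_iter=20000, step=0.05, p0=0.5, seed=1):
    """Random-walk Metropolis sampler (illustrative sketch)."""
    rng = random.Random(seed)
    p = p0
    current = log_posterior(p, k, n)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so the Hastings ratio reduces to a
        # ratio of posterior densities.
        proposal = p + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, k, n)
        # Accept with probability min(1, pi(proposal) / pi(current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)  # on rejection, the current value is repeated
    return samples
```

For example, with k = 30 copies out of n = 100, discarding the first 2,000 iterations as burn-in should give a sample mean close to the exact posterior mean 31/102. How close depends on how well the chain mixes, which is one of the practical issues the module covers.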

Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language. With that background, two applications of MCMC are investigated in detail: inference of population structure (using the program STRUCTURE) and haplotype inference (using the program PHASE). Computer exercises using both programs are included.

Further topics include the use of MCMC in model evaluation and model checking, strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, importance sampling, and Metropolis-coupled MCMC.

Suggested pairing: Modules 8 and 13.

Access 2018 course materials through the Summer Institutes archives.

Matthew Robinson is Professor of Computational Biology at the University of Lausanne. He develops and applies statistical methodology for large human phenotype-genotype datasets to address questions in population, quantitative, and medical genetics. His current work focuses on improved testing for sex-, age-, or environment-specific genetic effects, quantifying maternal and social genetic effects, and investigating how interactions between microbial and host genotypes shape phenotypes in human populations. He recently published “Genotype-covariate interaction effects and the heritability of adult body mass index.” Nature Genetics 49:1174, 2017.

Matthew Stephens is Professor of Statistics and Human Genetics at the University of Chicago. He was a developer of STRUCTURE, a widely used computer program for inferring population structure and estimating individual admixture. He also co-developed the Li and Stephens model, an influential and computationally efficient model of linkage disequilibrium. His recent publications include “Bayesian large-scale multiple regression with summary statistics from genome-wide association studies.” Annals of Applied Statistics 11:1561-1592, 2017.