The 27th Summer Institute in Statistical Genetics

Module 10: MCMC for Genetics

Mon, July 18 to Wed, July 20

This module examines the use of Bayesian statistics and Markov chain Monte Carlo (MCMC) methods in modern analyses of genetic data. It assumes a solid foundation in basic statistics and the concept of likelihood. Some knowledge of population genetics and basic familiarity with the R statistical package, or another programming language, will be helpful.

The first day includes an introduction to Bayesian statistics, Monte Carlo, and MCMC. Mathematical concepts covered include expectation, laws of large numbers, and ergodic and time-reversible Markov chains. Algorithms include the Metropolis-Hastings algorithm and Gibbs sampling. Some mathematical detail is given; however, there is considerable emphasis on concepts and practical issues arising in applications.

Mathematical ideas are illustrated with simple examples and reinforced with a computer practical using the R statistical language. 

Further topics include strategies for assessing MCMC convergence and diagnosing MCMC mixing problems, and Metropolis-coupled MCMC.

Suggested pairing: Module 7: Application of Population Genetics and Module 14: Advanced Quantitative Genetics.

Learning Objectives: After attending this module, participants will be able to:

  1. Derive the (analytic) posterior distribution for a Binomial proportion given a conjugate (Beta) prior.
  2. Implement a Metropolis-Hastings algorithm to sample from this posterior distribution and check that it matches the analytic form.
  3. Derive the posterior distribution for cluster memberships given a prior on clusters and a likelihood for each cluster.
  4. Implement a Gibbs sampler to sample cluster memberships from their posterior distribution, given data from a mixture of product-Bernoulli distributions.
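
As a rough illustration of objectives 1 and 2, the sketch below uses Python rather than the R used in the module's practical. With k successes in n Binomial trials and a Beta(a, b) prior on the proportion theta, the posterior is proportional to theta^(a+k-1) (1-theta)^(b+n-k-1), which is a Beta(a+k, b+n-k) distribution with mean (a+k)/(a+b+n). The code runs a random-walk Metropolis-Hastings sampler on this target and compares the sampled mean with the analytic one. The data (k = 7, n = 20), the uniform Beta(1, 1) prior, the step size, the burn-in length, and all function names are assumptions chosen for illustration; they are not taken from the course materials.

```python
import math
import random


def log_posterior(theta, k, n, a, b):
    """Unnormalised log posterior of a Binomial proportion under a Beta(a, b) prior."""
    if theta <= 0.0 or theta >= 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return (a + k - 1) * math.log(theta) + (b + n - k - 1) * math.log(1.0 - theta)


def metropolis_hastings(k, n, a=1.0, b=1.0, n_iter=50_000, step=0.2, seed=1):
    """Random-walk Metropolis-Hastings sampler for theta (illustrative sketch)."""
    rng = random.Random(seed)
    theta = 0.5  # arbitrary starting value inside (0, 1)
    current = log_posterior(theta, k, n, a, b)
    samples = []
    for _ in range(n_iter):
        # Symmetric uniform proposal, so the Hastings correction cancels.
        proposal = theta + rng.uniform(-step, step)
        candidate = log_posterior(proposal, k, n, a, b)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


# Hypothetical data: 7 successes in 20 trials, uniform Beta(1, 1) prior.
k, n, a, b = 7, 20, 1.0, 1.0
samples = metropolis_hastings(k, n, a, b)
kept = samples[5_000:]  # discard an assumed burn-in period

mcmc_mean = sum(kept) / len(kept)
analytic_mean = (a + k) / (a + b + n)  # mean of Beta(a + k, b + n - k)
print(f"MCMC mean: {mcmc_mean:.3f}, analytic mean: {analytic_mean:.3f}")
```

The two means should agree up to Monte Carlo error. Comparing a histogram of the retained samples with the Beta(a+k, b+n-k) density is a fuller check of the same kind, and varying the step size shows how the proposal scale affects mixing.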