24th Summer Institute in Statistical Genetics (SISG)

Module 1: Probability and Statistical Inference

Mon, July 8 to Wed, July 10
Instructor(s): Jim Hughes and Mauricio Sadinle

Module dates/times: Monday, July 8, 8:30 a.m.-5 p.m.; Tuesday, July 9, 8:30 a.m.-5 p.m.; and Wednesday, July 10, 8:30 a.m.-noon

This module serves as an introduction to statistical inference using tools from mathematical statistics and probability. It introduces core elements of statistical modeling, beginning with a review of basic probability and some common distributions (such as the binomial, multinomial, and normal distributions). Maximum likelihood estimation is motivated and described. The central limit theorem and frequentist confidence intervals are introduced, along with simple Bayes methods.

We then cover classical hypothesis testing scenarios such as one-sample tests, two-sample tests, chi-square tests for categorical data analysis, and permutation tests. The course concludes with an overview of resampling methods, such as the bootstrap and jackknife, and a discussion of multiple testing corrections such as false discovery rate control. This module serves as a foundation for almost all of the later modules.

Training in calculus is not a prerequisite for this module, but a willingness to attempt math problems and some comfort with basic algebra will be necessary. Suggested pairing: Modules 5 and 8.

Access 2018 course materials through the Summer Institutes archives.

Jim Hughes is Professor of Biostatistics at the University of Washington. He is interested in the application of statistical methods to problems in AIDS and other sexually transmitted diseases. He is particularly interested in cluster randomized trial designs and statistical methods for dealing with misclassified data. He is heavily involved in graduate and undergraduate teaching and graduate student advising, and he has won teaching awards. He recently published “On the design and analysis of stepped wedge trials,” Contemporary Clinical Trials 45(Pt A):55-60, 2015.

Mauricio Sadinle is Genentech Distinguished Assistant Professor of Biostatistics at the University of Washington. He develops methodology for a variety of applied and data-driven problems. He has worked on record linkage techniques to combine data files that contain information on overlapping sets of individuals but lack unique identifiers; nonignorable missing data modeling, including the use of auxiliary information to identify nonignorable missing data mechanisms; and classification techniques that output sets of plausible labels for ambiguous sample points. He also has experience working with social network models for valued ties and with capture-recapture models in the context of human rights violations. His recent publications include “Detecting duplicates in a homicide registry using a Bayesian partitioning approach,” Annals of Applied Statistics 8:2404-2434.