6th Summer Institute in Statistics for Clinical and Epidemiological Research (SISCER)

Module 4: Modern Statistical Learning Methods for Observational Data (July 22-23)

Mon, July 22

Instructors: Marco Carone, David Benkeser, Larry Kessler

Note: this is a two-day module that takes place on July 22 and July 23.

While clinical trials provide the highest level of evidence for comparing clinical treatments or public health interventions, they are often not feasible due to ethical, logistical, or economic constraints. Observational studies provide an opportunity to learn about the effect of interventions for which little or no trial data are available. These studies constitute a potentially rich and relatively cheap source of information. However, in such studies, treatment or intervention allocation may be strongly confounded by other important patient characteristics, and much care is needed to disentangle observed relationships and infer causal effects.

In this course, we will provide an overview of modern statistical techniques for analyzing observational data. We will focus primarily on recent advances in the field of targeted learning, which facilitates the use of state-of-the-art machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In contrast, conventional techniques for confounding adjustment rely on restrictive statistical models and may, therefore, lead to severely biased inference. Use of the Super Learner framework, an implementation of model stacking, will be discussed as a particularly appealing means of performing flexible, pre-specified adjustment for confounding.

We will discuss methods for comparative effectiveness studies for single time-point interventions. We will also introduce the multi time-point extension of these methods and discuss strategies for dealing with missing data. Methods will be illustrated using data from recent observational studies and data extracted from electronic medical records. Analyses will be illustrated in R, but knowledge of R is not required for this module. In addition to lectures, the course will include in-class, hands-on activities to allow students to familiarize themselves with the methods and tools.

The two-day course is geared towards health science researchers with at least basic experience in data analysis and statistics. A basic understanding of the following concepts would be helpful: confounding, probability (e.g., what is meant by the distribution of a random variable, its mean, and its variance), statistical inference (confidence intervals, hypothesis tests), and regression (linear and logistic). Advanced knowledge of these topics is useful but not necessary. Equivalent UW SPH course prerequisites are BIOS 511/512 (or BIOS 514/515). If you are interested in the course but are unsure whether the level is right for you, please feel free to contact the instructors prior to registration to discuss further.