Modern Statistical Learning for Observational Data

Registration for this module is closed.

While clinical trials provide the highest level of evidence to compare clinical treatments or public health interventions, they are often not feasible due to ethical, logistical, or economic constraints. Observational studies provide an opportunity to learn about the effect of interventions for which little or no trial data are available. These studies constitute a potentially rich and relatively cheap source of information. However, in such studies, treatment or intervention allocation may be strongly confounded by other important patient characteristics, and much care is needed to disentangle observed relationships and infer causal effects.

In this course, we will provide an overview of modern statistical techniques for analyzing observational data. We will focus primarily on recent advances in the field of targeted learning, which facilitates the use of state-of-the-art machine learning tools to flexibly adjust for confounding while yielding valid statistical inference. In contrast, conventional techniques for confounding adjustment rely on restrictive statistical models and may, therefore, lead to severely biased inference. Use of the Super Learner framework, an implementation of model stacking, will be discussed as a particularly appealing means of performing flexible, pre-specified adjustment for confounding.

We will discuss methods for comparative effectiveness studies for single time-point interventions. We will also introduce the multi time-point extension of these methods and discuss strategies for dealing with missing data. Methods will be illustrated using data from recent observational studies and data extracted from electronic medical records. Analyses will be illustrated in R, but knowledge of R is not required for this module. In addition to lectures, the course will include in-class, hands-on activities to allow students to familiarize themselves with the methods and tools.

The four-day course is geared towards health science researchers with at least basic experience in data analysis and statistics. A basic understanding of the following concepts would be helpful: confounding, probability (e.g., what is meant by the distribution of a random variable, its mean, and its variance), statistical inference (confidence intervals, hypothesis tests), and regression (linear and logistic). Advanced knowledge of these topics is useful but not necessary.

Regular price: $825
Academic/Government/Non-Profit price: $625

SISCER 2026 Module 11 Modern Statistical Learning for Observational Data

Marco Carone

Charles Wolock