Data from multistage surveys such as NHANES and BRFSS has been important in health research for many years. More recently, two-phase sampling has allowed more efficient subsampling from existing cohorts and databases. This module will provide an overview of data analysis for complex surveys.
We will introduce the basic concepts that distinguish complex samples from more familiar data: clusters, strata, and weights. We will then cover basic summary statistics, exploratory analysis, and regression modelling for multistage surveys. Finally, we will briefly discuss two-phase sampling and the use of raking to bring in information from the whole cohort when analyzing a subsample.
Mathematical details will be kept to a minimum, and there will be data examples with code in R for all topics and in Stata for most. Familiarity with linear and logistic regression will be assumed.