2024-07-12

Outline:

  • Covariate Adjustment Tutorials page
  • Worked Example: Binary Outcome - Standardization Estimator
    • Example Data: MISTIE III trial - Hemorrhagic Stroke
    • Outcome: Modified Rankin Score - Dichotomized
    • Unadjusted & Covariate Adjusted Analyses
    • Confidence Intervals: Nonparametric Bootstrap

Hands-On Tutorials for Covariate Adjustment

https://bit.ly/rct_tutorials

Tutorials on Covariate Adjustment

  • Simulated data: mimic features of trials - Scale, correlations, missingness
    • Continuous, Ordinal, Binary, Time-to-Event outcomes: Covariates
    • Example datasets: Substance Abuse, Hemorrhagic Stroke
    • Stratified randomization: improve precision
  • Common estimands of interest; Analytic approaches
  • R code for tabulating, plotting, analyzing data:
    • Unadjusted & Adjusted
  • Links to resources on learning and using R

Example: based on MISTIE III

Functional Outcome & Mortality in Hemorrhagic Stroke

MISTIE-III Trial: (Hanley et al. 2019)

  • Hemorrhagic Stroke: Greater morbidity, mortality than ischemic stroke
    • Intracerebral Hemorrhage (ICH), possibly with Intraventricular Hemorrhage (IVH)
  • Consent: monitor daily for ICH stability by CT
  • 1:1 randomized - minimally invasive surgery + thrombolytic vs. SOC medical management
  • Safety & Efficacy: Functional outcome on Modified Rankin Scale (MRS)
    • MRS at 30, 180, and 365 days post randomization
  • Good Outcome: MRS 0-3 vs. 4-6 - independent vs. not
  • Simulated data based on actual trial data: not actual study data.

Simulated MISTIE Data:

  • Codebook on covariateadjustment.github.io
  • Baseline Covariates
    • age: Age in years
    • male: male sex
    • hx_cvd: cardiovascular disease history
    • hx_hyperlipidemia: hyperlipidemia
    • on_anticoagulants: on anticoagulants
    • on_antiplatelets: on antiplatelets
    • ich_location: ICH: (Lobar, Deep)
    • ich_s_volume: ICH volume on stability scan
    • ivh_s_volume: IVH volume on stability scan
    • gcs_category: presenting Glasgow Coma Score

  • Treatment:
    • arm: treatment arm
    • ich_eot_volume: intracerebral hemorrhage volume on end-of-treatment scan
  • Outcome:
    • Modified Rankin Scale (MRS); _complete suffix: completely observed
    • mrs_30d: MRS at 30 days (0-3, 4, 5, 6)
    • mrs_180d: MRS at 180 days (0-2, 3, 4, 5, 6)
    • mrs_365d: MRS at 365 days (0-1, 2, 3, 4, 5, 6)
      • Primary Outcome
    • days_on_study: days until death/censoring
    • died_on_study: participant died (1) or censored (0)

Standardization Estimator

  • \(Y\) denotes the outcome: 1 = 1-Year MRS 0-3; 0 = 1-Year MRS 4-6
  • \(A\) denotes treatment assignment: 1 = Treatment, 0 = Control
  • Fit a regression model for the outcome:
    • Unadjusted: \(logit(Pr\{Y = 1 \vert A \}) = \beta_{0} + \beta_{A}A\)
    • Adjusted: \(logit(Pr\{Y = 1 \vert A, X \}) = \beta_{0} + \beta_{A}A + \beta_{1}X_{1} + \ldots + \beta_{p}X_{p}\)
  • Predict each individual’s outcome under each treatment assignment using the fitted model
    • \(\hat{y}^{(1)}_{i} = logit^{-1}\{\hat{\beta}_{0} + \hat{\beta}_{A} + \hat{\beta}_{1}X_{i1} + \ldots + \hat{\beta}_{p}X_{ip}\}\)
    • \(\hat{y}^{(0)}_{i} = logit^{-1}\{\hat{\beta}_{0} + \hat{\beta}_{1}X_{i1} + \ldots + \hat{\beta}_{p}X_{ip}\}\)
  • Average these predictions over the sample
    • \(\hat{\mu}^{(1)} = \frac{1}{n}\sum_{i=1}^{n}\hat{y}^{(1)}_{i} \qquad \hat{\mu}^{(0)} = \frac{1}{n}\sum_{i=1}^{n}\hat{y}^{(0)}_{i}\)
  • Contrast the averaged predictions
    • \(\hat{\theta}_{RD} = \hat{\mu}^{(1)} - \hat{\mu}^{(0)} \qquad \hat{\theta}_{RR} = \hat{\mu}^{(1)}/\hat{\mu}^{(0)}\)
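
The four steps above can be sketched end to end on a small simulated dataset. This is a minimal illustration under stated assumptions: the data-generating model and all names (sim_data, x1, x2, a, y) are hypothetical and unrelated to the MISTIE III data.

```r
# Simulate a toy 1:1 randomized trial with two baseline covariates
set.seed(12345)
n <- 500
sim_data <-
  data.frame(
    x1 = rnorm(n),
    x2 = rbinom(n, size = 1, prob = 0.5),
    a = rbinom(n, size = 1, prob = 0.5)
  )
sim_data$y <-
  rbinom(
    n, size = 1,
    prob = plogis(-0.2 + 0.3 * sim_data$a + 0.8 * sim_data$x1 - 0.5 * sim_data$x2)
  )

# Step 1: fit the outcome regression model
fit <-
  stats::glm(
    formula = y ~ a + x1 + x2,
    data = sim_data,
    family = binomial(link = "logit")
  )

# Step 2: predict each individual's outcome under each treatment assignment
y_hat_1 <- predict(fit, newdata = transform(sim_data, a = 1), type = "response")
y_hat_0 <- predict(fit, newdata = transform(sim_data, a = 0), type = "response")

# Step 3: average the predictions over the whole sample
mu_hat_1 <- mean(y_hat_1)
mu_hat_0 <- mean(y_hat_0)

# Step 4: contrast the averaged predictions
theta_rd <- mu_hat_1 - mu_hat_0
theta_rr <- mu_hat_1 / mu_hat_0
```

Predictions are made for every participant under both assignments, regardless of the arm they were actually randomized to.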

Unadjusted Analysis

Logistic Regression

Fitting Logistic Model

mrs_unadjusted_logistic_glm <-
  stats::glm(
    formula = 
      mrs_356d_binary ~ arm,
    data = sim_miii,
    family = binomial(link = "logit")
  )

summary(mrs_unadjusted_logistic_glm)
## 
## Call:
## stats::glm(formula = mrs_356d_binary ~ arm, family = binomial(link = "logit"), 
##     data = sim_miii)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  -0.1942     0.1276  -1.522    0.128
## armsurgical   0.1212     0.1803   0.673    0.501
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 684.01  on 494  degrees of freedom
## Residual deviance: 683.56  on 493  degrees of freedom
##   (5 observations deleted due to missingness)
## AIC: 687.56
## 
## Number of Fisher Scoring iterations: 3

Predict Outcomes

pr_outcome_unadj_control <-
  stats::predict(
    object = mrs_unadjusted_logistic_glm,
    newdata = 
      within(data = sim_miii,
             expr = {arm = "medical"}),
    type = "response"
  )

pr_outcome_unadj_treatment <-
  stats::predict(
    object = mrs_unadjusted_logistic_glm,
    newdata = 
      within(data = sim_miii,
             expr = {arm = "surgical"}),
    type = "response"
  )

table(pr_outcome_unadj_control)
## pr_outcome_unadj_control
## 0.451612903331144 
##               500
table(pr_outcome_unadj_treatment)
## pr_outcome_unadj_treatment
## 0.481781376518338 
##               500
  • Treatment is the only covariate in the model
  • Predictions are generated for each person under assignment to each treatment
  • With no other covariates, every person gets the same predicted probability under a given treatment

Average & Contrast Predictions

e_y_0_unadj <- mean(pr_outcome_unadj_control)
e_y_1_unadj <- mean(pr_outcome_unadj_treatment)

# Risk Difference
e_y_1_unadj - e_y_0_unadj
## [1] 0.03016847
# Relative Risk
e_y_1_unadj/e_y_0_unadj
## [1] 1.066802
# Odds Ratio
(e_y_1_unadj*(1 - e_y_0_unadj))/
  (e_y_0_unadj*(1 - e_y_1_unadj))
## [1] 1.128906

  • Compare two counterfactual worlds using information from each arm:
    • All eligible patients in the population receive surgical intervention
    • All eligible patients in the population receive medical intervention
    • Risk Difference: probability of a “good outcome” is 3 percentage points higher if everyone receives surgical intervention than if everyone receives standard medical care
    • Relative Risk: probability of a “good outcome” is 7% higher in relative terms
    • Odds Ratio: odds of a “good outcome” are 13% higher
  • Odds Ratio is farther from 1 than the Relative Risk: the outcome is not rare
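
The gap between the odds ratio and the relative risk can be checked by hand from the two predicted probabilities printed above. The rare-outcome comparison at the end is a hypothetical illustration, not a result from the trial data.

```r
# Predicted probabilities from the unadjusted model output
p_surgical <- 0.481781376518338
p_medical <- 0.451612903331144

risk_difference <- p_surgical - p_medical
relative_risk <- p_surgical / p_medical
odds_ratio <-
  (p_surgical / (1 - p_surgical)) / (p_medical / (1 - p_medical))

# Hypothetical rare outcome with the same relative risk:
# the odds ratio moves much closer to the relative risk
p_medical_rare <- 0.01
p_surgical_rare <- p_medical_rare * relative_risk
odds_ratio_rare <-
  (p_surgical_rare / (1 - p_surgical_rare)) /
  (p_medical_rare / (1 - p_medical_rare))

c(RD = risk_difference, RR = relative_risk,
  OR = odds_ratio, OR_rare = odds_ratio_rare)
```

With a control-arm probability near 0.45 the odds ratio and relative risk differ noticeably; with a probability of 0.01 they nearly coincide.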

Average & Contrast Predictions

unadj_glm_beta <- coef(mrs_unadjusted_logistic_glm)
pr_medical <-
  plogis(unadj_glm_beta["(Intercept)"])

pr_surgical <-
  plogis(unadj_glm_beta["(Intercept)"] +
           unadj_glm_beta["armsurgical"])

pr_surgical - pr_medical # Risk Difference
## (Intercept) 
##  0.03016847
pr_surgical/pr_medical # Relative Risk
## (Intercept) 
##    1.066802
exp(unadj_glm_beta["armsurgical"]) # Odds Ratio
## armsurgical 
##    1.128906

Compute CIs using Bootstrap

  • For ate_binary() code, see the workshop materials

ate_unadj_boot <-
  boot::boot(
    data = sim_miii,
    statistic = ate_binary,
    R = 10000,
    formula =
      mrs_356d_binary ~ tx,
    link = "logit",
    tx_var = "tx"
  )

ate_unadj_results <-
  all_boot_cis(ate_unadj_boot)

Unadjusted Estimates: Standardization

          Estimate    SE     Var    LCL    UCL  CI Width
RD            0.03  0.04  0.0020  -0.06   0.12      0.18
RR            1.07  0.10  0.0107   0.88   1.29      0.41
OR            1.13  0.21  0.0433   0.80   1.62      0.83
E[Y|A=1]      0.48  0.03  0.0010   0.42   0.55      0.13
E[Y|A=0]      0.45  0.03  0.0010   0.39   0.51      0.12
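
The ate_binary() and all_boot_cis() helpers are not reproduced in these slides. As a rough idea of how a bootstrap statistic for boot::boot() could be written, the sketch below defines a hypothetical stand-in, ate_binary_sketch(), and runs it on toy data. Its arguments, return value, and the toy dataset are assumptions for illustration and may differ from the workshop code.

```r
library(boot)

# Hypothetical stand-in for a bootstrap statistic: NOT the workshop's ate_binary()
# boot::boot() calls it as statistic(data, indices, ...) for each resample
ate_binary_sketch <- function(data, indices, formula, tx_var) {
  boot_data <- data[indices, , drop = FALSE]
  fit <-
    stats::glm(formula = formula, data = boot_data,
               family = binomial(link = "logit"))

  # Predict under each assignment, then average over the resample
  data_1 <- boot_data
  data_1[[tx_var]] <- 1
  data_0 <- boot_data
  data_0[[tx_var]] <- 0
  mu_1 <- mean(predict(fit, newdata = data_1, type = "response"))
  mu_0 <- mean(predict(fit, newdata = data_0, type = "response"))

  c(rd = mu_1 - mu_0,
    rr = mu_1 / mu_0,
    or = (mu_1 / (1 - mu_1)) / (mu_0 / (1 - mu_0)))
}

# Toy data: tx is a 0/1 treatment indicator, x a baseline covariate
set.seed(2024)
n <- 300
toy <- data.frame(tx = rbinom(n, 1, 0.5), x = rnorm(n))
toy$y <- rbinom(n, 1, plogis(-0.2 + 0.4 * toy$tx + toy$x))

toy_boot <-
  boot::boot(data = toy, statistic = ate_binary_sketch, R = 500,
             formula = y ~ tx + x, tx_var = "tx")

# Percentile interval for the risk difference (element 1 of the statistic)
rd_ci <- boot::boot.ci(toy_boot, conf = 0.95, type = "perc", index = 1)
```

Refitting the model inside each resample lets the interval reflect the uncertainty from estimating the regression coefficients. The toy run uses R = 500 only to keep it quick; the slides use 10000 replicates.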

Covariate-Adjusted Analysis

Standardization (also known as G-Computation)

Unadjusted vs Adjusted Standardization Estimates

  • Unadjusted: Standardization Estimates vs. Logistic Regression Output
    • Same estimates of RD, RR, and OR
  • This will not be true in general for adjusted analyses:
    • Regression coefficient: conditional effect - Standardization: marginal effect
  • Only change necessary for adjusted analysis: Add covariates to model
    • All other steps identical
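
The conditional versus marginal distinction can be seen in a small simulation. This is a sketch under assumed values: the data-generating model, the sample size, and all names (toy, a, x, y) are hypothetical.

```r
# Toy trial with one strongly prognostic covariate
set.seed(101)
n <- 20000
toy <- data.frame(a = rbinom(n, 1, 0.5), x = rnorm(n, sd = 2))
toy$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * toy$a + 1.5 * toy$x))

fit_adj <- stats::glm(y ~ a + x, data = toy, family = binomial(link = "logit"))

# Conditional odds ratio: exponentiated treatment coefficient
conditional_or <- exp(coef(fit_adj)[["a"]])

# Marginal odds ratio: standardize the adjusted model over the sample
mu_1 <- mean(predict(fit_adj, newdata = transform(toy, a = 1), type = "response"))
mu_0 <- mean(predict(fit_adj, newdata = transform(toy, a = 0), type = "response"))
marginal_or <- (mu_1 / (1 - mu_1)) / (mu_0 / (1 - mu_0))

# Unadjusted model: the treatment coefficient targets the marginal effect
fit_unadj <- stats::glm(y ~ a, data = toy, family = binomial(link = "logit"))
unadjusted_or <- exp(coef(fit_unadj)[["a"]])

c(conditional = conditional_or,
  marginal = marginal_or,
  unadjusted = unadjusted_or)
```

In this setup the conditional odds ratio is farther from 1 than the marginal odds ratio, while the standardized and unadjusted estimates target the same marginal quantity.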

Adding Covariates to Logistic Model

mrs_adjusted_logistic_glm <-
  stats::glm(
    formula = 
      mrs_356d_binary ~ arm +
      age + 
      male +
      hx_cvd +
      hx_hyperlipidemia +
      on_anticoagulants +
      on_antiplatelets +
      ich_location +
      ich_s_volume +
      ivh_s_volume + 
      gcs_category,
    data = sim_miii,
    family =
      binomial(link = "logit")
  )

summary(mrs_adjusted_logistic_glm)
## 
## Call:
## stats::glm(formula = mrs_356d_binary ~ arm + age + male + hx_cvd + 
##     hx_hyperlipidemia + on_anticoagulants + on_antiplatelets + 
##     ich_location + ich_s_volume + ivh_s_volume + gcs_category, 
##     family = binomial(link = "logit"), data = sim_miii)
## 
## Coefficients:
##                                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     1.766439   0.647076   2.730  0.00634 ** 
## armsurgical                    -0.005617   0.200224  -0.028  0.97762    
## age                            -0.038824   0.009150  -4.243 2.20e-05 ***
## male1. Male                     0.199103   0.216383   0.920  0.35750    
## hx_cvd1. Yes                   -0.472331   0.307549  -1.536  0.12459    
## hx_hyperlipidemia1. Yes        -0.398504   0.232548  -1.714  0.08659 .  
## on_anticoagulants1. Yes        -0.056718   0.461113  -0.123  0.90211    
## on_antiplatelets1. Yes          0.160736   0.254020   0.633  0.52689    
## ich_locationLobar               0.013685   0.224661   0.061  0.95143    
## ich_s_volume                   -0.013739   0.006431  -2.136  0.03264 *  
## ivh_s_volume                   -0.004274   0.029671  -0.144  0.88545    
## gcs_category2. Moderate (9-12)  1.229103   0.261272   4.704 2.55e-06 ***
## gcs_category3. Mild (13-15)     1.988115   0.288645   6.888 5.67e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 684.01  on 494  degrees of freedom
## Residual deviance: 591.52  on 482  degrees of freedom
##   (5 observations deleted due to missingness)
## AIC: 617.52
## 
## Number of Fisher Scoring iterations: 4

Covariate Adjusted Estimate: Standardization

ate_adj_boot <-
  boot::boot(
    data = sim_miii,
    statistic = ate_binary,
    # Number of Bootstrap Replicates
    R = 10000,
    formula = 
      mrs_356d_binary ~ tx +
      age + male + hx_cvd +
      hx_hyperlipidemia +
      on_anticoagulants +
      on_antiplatelets +
      ich_location +
      ich_s_volume +
      ivh_s_volume + 
      gcs_category,
    link = "logit",
    tx_var = "tx"
  )

ate_adj_results <-
  all_boot_cis(ate_adj_boot)

Adjusted Estimates: Standardization

          Estimate    SE     Var    LCL    UCL  CI Width
RD            0.00  0.04  0.0017  -0.08   0.08      0.16
RR            1.00  0.09  0.0078   0.84   1.19      0.35
OR            1.00  0.17  0.0280   0.72   1.38      0.67
E[Y|A=1]      0.47  0.03  0.0009   0.41   0.53      0.12
E[Y|A=0]      0.47  0.03  0.0009   0.41   0.53      0.12

Compare Results: Adjusted vs. Unadjusted

Changes in precision and variance from covariate adjustment.

Estimate   Relative Efficiency   Relative Change in Precision   Relative Change in Variance   Relative CI Width
RD                        1.17                          0.169                          -0.1                0.92
RR                        1.37                          0.366                          -0.3                0.86
OR                        1.55                          0.546                          -0.4                0.81
E[Y|A=1]                  1.09                          0.088                          -0.1                0.94
E[Y|A=0]                  1.08                          0.079                          -0.1                0.98
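
The formulas behind these columns are not stated in the slides. The sketch below uses definitions that appear consistent with the reported values; they are assumptions, not the workshop's code. It plugs in the rounded risk-difference variances and CI widths from the unadjusted and adjusted results tables, so the results only approximately match the row for RD.

```r
# Rounded RD values from the unadjusted and adjusted results tables
var_unadj <- 0.0020
var_adj <- 0.0017
ci_width_unadj <- 0.18
ci_width_adj <- 0.16

# Assumed definitions, inferred from the reported values
relative_efficiency <- var_unadj / var_adj
rel_change_precision <- (1 / var_adj - 1 / var_unadj) / (1 / var_unadj)
rel_change_variance <- (var_adj - var_unadj) / var_unadj
relative_ci_width <- ci_width_adj / ci_width_unadj

round(
  c(relative_efficiency = relative_efficiency,
    rel_change_precision = rel_change_precision,
    rel_change_variance = rel_change_variance,
    relative_ci_width = relative_ci_width),
  digits = 3
)
```

Under these definitions the relative change in precision equals the relative efficiency minus 1, which matches the pattern in every row of the table.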

Summary

  • Standardization always gives a marginal treatment effect:
    • Same estimand/target with or without covariate adjustment
  • Logistic regression
    • No covariates: marginal treatment effect
    • Include covariates: conditional treatment effect
  • Conditional & marginal effects generally differ, except with a null effect or a linear (identity) link
  • Bootstrap CI gives appropriate coverage: Important for testing
    • Does not assume logistic model is correctly specified
    • Adjusted analysis at least as efficient as unadjusted analysis asymptotically

References

Hanley, Daniel F, Richard E Thompson, Michael Rosenblum, Gayane Yenokyan, Karen Lane, Nichol McBee, Steven W Mayo, et al. 2019. “Efficacy and Safety of Minimally Invasive Surgery with Thrombolysis in Intracerebral Haemorrhage Evacuation (MISTIE III): A Randomised, Controlled, Open-Label, Blinded Endpoint Phase 3 Trial.” The Lancet 393 (10175): 1021–32. https://doi.org/10.1016/s0140-6736(19)30195-3.