diff --git a/NEWS.md b/NEWS.md index 9b05e62b..f1a23c61 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,6 +1,7 @@ # rbmi (development version) +* Include vignette on how to obtain frequentist and information-anchored inference with conditional mean imputation using `rbmi` * Added FAQ vignette # rbmi 1.2.6 diff --git a/vignettes/CondMean_Inference.Rmd b/vignettes/CondMean_Inference.Rmd new file mode 100644 index 00000000..4df01b45 --- /dev/null +++ b/vignettes/CondMean_Inference.Rmd @@ -0,0 +1,339 @@
+---
+title: "rbmi: Inference with Conditional Mean Imputation"
+author: Alessandro Noci, Marcel Wolbers, Craig Gower-Page
+output:
+  bookdown::html_document2:
+    toc: true
+    toc_depth: 4
+    number_sections: true
+    citation_package: natbib
+    base_format: rmarkdown::html_vignette
+bibliography: "references.bib"
+link-citations: true
+linkcolor: blue
+pkgdown:
+  as_is: true
+vignette: >
+  %\VignetteIndexEntry{rbmi: Inference with Conditional Mean Imputation}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+  collapse = TRUE,
+  comment = "#>"
+)
+```
+
+# Introduction
+
+As described in section 3.10.2 of the statistical specifications of the package (`vignette(topic = "stat_specs", package = "rbmi")`), two different types of variance estimators have been proposed for reference-based imputation methods in the statistical literature (@Bartlett2021).
+The first is the frequentist variance which describes the actual repeated sampling variability of the estimator and results in inference which is correct in the frequentist sense, i.e. hypothesis tests have accurate type I error control and confidence intervals have correct coverage probabilities under repeated sampling if the reference-based assumption is correctly specified (@Bartlett2021, @Wolbers2021).
+Reference-based missing data assumptions are strong and borrow information from the control arm for imputation in the active arm.
+As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data.
+The second is the so-called "information-anchored" variance which was originally proposed in the context of sensitivity analyses (@CroEtAl2019). This variance estimator is based on disentangling point estimation from variance estimation.
+The resulting information-anchored variance is typically very similar to the variance under missing-at-random (MAR) imputation and increases with increasing amounts of missing data at approximately the same rate as MAR imputation.
+However, the information-anchored variance does not reflect the actual variability of the reference-based estimator and the resulting frequentist inference is highly conservative, resulting in a substantial power loss.
+
+Reference-based conditional mean imputation combined with a resampling method such as the jackknife or the bootstrap was first introduced in @Wolbers2021.
+This approach naturally targets the frequentist variance. The information-anchored variance is typically estimated using Rubin's rules for Bayesian multiple imputation, which are not applicable within the conditional mean imputation framework.
+However, an alternative information-anchored variance proposed by @Lu2021 can easily be obtained, as we show below.
+The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment, where delta is selected in a data-driven way to match the reference-based estimator.
+For conditional mean imputation, the proposal by @Lu2021 can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset.
+The variance can then be obtained via the jackknife or the bootstrap while keeping the delta-adjustment fixed. The resulting variance estimate is very similar to Rubin's variance.
+Moreover, as shown in @CroEtAl2019, the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin's variance for reference-based imputation.
+
+This vignette first demonstrates how to obtain frequentist inference with reference-based conditional mean imputation using `rbmi`, and then shows that information-anchored inference can also be easily implemented using the package.
+
+# Data and model specification
+
+We use a publicly available example dataset from an antidepressant clinical trial of an active drug versus placebo.
+The relevant endpoint is the Hamilton 17-item depression rating scale (HAMD17), which was assessed at baseline and at weeks 1, 2, 4, and 6.
+Study drug discontinuation occurred in 24% of subjects in the active drug arm and in 26% of subjects in the placebo arm.
+All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.
+
+We consider an imputation model with the mean change from baseline in the HAMD17 score as the outcome (variable CHANGE in the dataset).
+The following covariates are included in the imputation model: the treatment group (THERAPY), the (categorical) visit (VISIT), treatment-by-visit interactions, the baseline HAMD17 score (BASVAL), and baseline HAMD17 score-by-visit interactions.
+A common unstructured covariance matrix is assumed for both groups. The analysis model is an ANCOVA model with the treatment group as the primary factor and adjustment for the baseline HAMD17 score.
+For this example, we assume that the imputation strategy after the ICE "study-drug discontinuation" is Jump To Reference (JR) for all subjects and the imputation is based on conditional mean imputation combined with jackknife resampling (but the bootstrap could also have been selected).
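+As background on the resampling step, the jackknife standard error can be sketched in a few lines of base R. The following toy example (illustration only, not part of the vignette's analysis code) computes the jackknife standard error of a simple sample mean from its leave-one-out estimates; the same formula underlies the jackknife variance estimator applied later to the leave-one-subject-out treatment effect estimates.
+
+```r
+# Toy sketch of the jackknife standard error (illustration only)
+set.seed(123)
+y <- rnorm(50)
+n <- length(y)
+
+# Leave-one-out estimates of the statistic of interest (here: the sample mean)
+theta_loo <- vapply(seq_len(n), function(i) mean(y[-i]), numeric(1))
+
+# Jackknife standard error: sqrt((n - 1) / n * sum((theta_i - mean(theta))^2))
+se_jack <- sqrt((n - 1) / n * sum((theta_loo - mean(theta_loo))^2))
+
+# For the sample mean, this reproduces the usual standard error sd(y) / sqrt(n)
+all.equal(se_jack, sd(y) / sqrt(n))
+```
+
+Inference is then based on a normal approximation using this standard error.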
+ +# Reference-based conditional mean imputation - frequentist inference + +Conditional mean imputation combined with a resampling method such as jackknife or bootstrap naturally targets a frequentist estimation of the standard error of the treatment effect, thus providing a valid frequentist inference. +Here we provide the code to obtain frequentist inference for reference-based conditional mean imputation using `rbmi`. + +The code used in this section is almost identical to the code in the quickstart vignette (`vignette(topic = "quickstart", package = "rbmi")`) except that we use conditional mean imputation combined with the jackknife (`method_condmean(type = "jackknife")`) here rather than Bayesian multiple imputation (`method_bayes()`). +We therefore refer to that vignette and the help files for the individual functions for further explanations and details. + +## Draws {#draws} + +We will make use of `rbmi::expand_locf()` to expand the dataset in order to have one row per subject per visit with missing outcomes denoted as `NA`. We will then construct the `data_ice`, `vars` and `method` input arguments to the first core `rbmi` function, `draws()`. +Finally, we call the function `draws()` to derive the parameter estimates of the base imputation model for the full dataset and all leave-one-subject-out samples. 
```{r draws_condmean}
+library(rbmi)
+library(dplyr)
+
+dat <- antidepressant_data
+
+# Use expand_locf to add rows corresponding to visits with missing outcomes to
+# the dataset
+dat <- expand_locf(
+  dat,
+  PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT
+  VISIT = levels(dat$VISIT),
+  vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
+  group = c("PATIENT"),
+  order = c("PATIENT", "VISIT")
+)
+
+# create data_ice and set the imputation strategy to JR for
+# each patient with at least one missing observation
+dat_ice <- dat %>%
+  arrange(PATIENT, VISIT) %>%
+  filter(is.na(CHANGE)) %>%
+  group_by(PATIENT) %>%
+  slice(1) %>%
+  ungroup() %>%
+  select(PATIENT, VISIT) %>%
+  mutate(strategy = "JR")
+
+# In this dataset, subject 3618 has an intermittent missing value which
+# does not correspond to a study drug discontinuation. We therefore remove
+# this subject from `dat_ice`. (In the later imputation step, it will
+# automatically be imputed under the default MAR assumption.)
+dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
+
+# Define the names of key variables in our dataset and
+# the covariates included in the imputation model using `set_vars()`
+vars <- set_vars(
+  outcome = "CHANGE",
+  visit = "VISIT",
+  subjid = "PATIENT",
+  group = "THERAPY",
+  covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
+)
+
+# Define which imputation method to use (here: conditional mean imputation
+# with jackknife as resampling)
+method <- method_condmean(type = "jackknife")
+
+# Create samples for the imputation parameters by running the draws() function
+drawObj <- draws(
+  data = dat,
+  data_ice = dat_ice,
+  vars = vars,
+  method = method,
+  quiet = TRUE
+)
+drawObj
+```
+
+## Impute
+
+We can now use the function `impute()` to perform the imputation of the original dataset and of each leave-one-out sample using the results obtained at the previous step.
```{r}
+references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
+imputeObj <- impute(drawObj, references)
+imputeObj
+```
+
+## Analyse
+
+Once the datasets have been imputed, we can call the `analyse()` function to apply the complete-data analysis model (here ANCOVA) to each imputed dataset.
+
+```{r}
+
+# Set analysis variables using rbmi function "set_vars"
+vars_an <- set_vars(
+  group = vars$group,
+  visit = vars$visit,
+  outcome = vars$outcome,
+  covariates = "BASVAL"
+)
+
+# Apply the complete-data analysis model (ANCOVA) to each imputed dataset
+anaObj <- analyse(
+  imputeObj,
+  rbmi::ancova,
+  vars = vars_an
+)
+anaObj
+```
+
+## Pool
+
+Finally, we can extract the treatment effect estimates and perform inference using the jackknife variance estimator. This is done by calling the `pool()` function.
+
+```{r}
+poolObj <- pool(anaObj)
+poolObj
+```
+
+This gives an estimated treatment effect of
+`r paste(formatC(poolObj$pars$trt_7$est,format="f",digits=2)," (95% CI ",
+  formatC(poolObj$pars$trt_7$ci[1],format="f",digits=2), " to ",
+  formatC(poolObj$pars$trt_7$ci[2],format="f",digits=2),")",sep="")`
+at the last visit with an associated p-value of `r formatC(poolObj$pars$trt_7$pval,format="f",digits=3)`.
+
+# Reference-based conditional mean imputation - information-anchored inference
+
+In this section, we present how the estimation process based on conditional mean imputation combined with the jackknife can be adapted to obtain an information-anchored variance following the proposal by @Lu2021.
+
+## Draws
+
+The code for the pre-processing of the dataset and for the "draws" step is equivalent to the code provided for the frequentist inference. Please refer to [that section](#draws) for details about this step.
```{r, eval=FALSE}
+
+<<draws_condmean>>
+
+```
+
+## Imputation step including calculation of delta-adjustment
+
+The proposal by @Lu2021 is to replace the reference-based imputation by a MAR imputation combined with a delta-adjustment, where delta is selected in a data-driven way to match the reference-based estimator.
+In `rbmi`, this is implemented by first performing the imputation under the defined reference-based imputation strategy (here JR) as well as under MAR separately.
+Second, the delta-adjustment is defined as the difference between the conditional mean imputation under reference-based and MAR imputation, respectively, on the original dataset.
+
+To simplify the implementation, we have written a function `get_delta_match_refBased` that performs this step.
+The function takes as input arguments the `draws` object, `data_ice` (i.e. the `data.frame` containing the information about the intercurrent events and the imputation strategies), and `references`, a named vector that identifies the references to be used for reference-based imputation methods.
+The function returns a list containing the imputation objects under both reference-based and MAR imputation, plus a `data.frame` which contains the delta-adjustment.
+
+```{r}
+
+#' Get delta adjustment that matches reference-based imputation
+#'
+#' @param draws: A `draws` object created by `draws()`.
+#' @param data_ice: `data.frame` containing the information about the intercurrent
+#' events and the imputation strategies. Must represent the desired imputation
+#' strategy and not the MAR-variant.
+#' @param references: A named vector. Identifies the references to be used
+#' for reference-based imputation methods.
+#'
+#' @return
+#' The function returns a list containing the imputation objects under both
+#' reference-based and MAR imputation, plus a `data.frame` which contains the
+#' delta-adjustment.
+#'
+#' @seealso `draws()`, `impute()`.
+get_delta_match_refBased <- function(draws, data_ice, references) {
+
+  # Impute according to `data_ice`
+  imputeObj <- impute(
+    draws = draws,
+    update_strategy = data_ice,
+    references = references
+  )
+
+  vars <- imputeObj$data$vars
+
+  # Access imputed dataset (index=1 for method_condmean(type = "jackknife"))
+  cmi <- extract_imputed_dfs(imputeObj, index = 1, idmap = TRUE)[[1]]
+  idmap <- attributes(cmi)$idmap
+  cmi <- cmi[, c(vars$subjid, vars$visit, vars$outcome)]
+  colnames(cmi)[colnames(cmi) == vars$outcome] <- "y_imp"
+
+  # Map back original patient ids since rbmi re-codes ids to ensure id uniqueness
+  cmi[[vars$subjid]] <- idmap[match(cmi[[vars$subjid]], names(idmap))]
+
+  # Derive conditional mean imputations under MAR
+  dat_ice_MAR <- data_ice
+  dat_ice_MAR[[vars$strategy]] <- "MAR"
+
+  # Impute under MAR
+  # Note that in this specific context, it is desirable that an update
+  # from a reference-based strategy to MAR uses the exact same data for
+  # fitting the imputation models, i.e. that available post-ICE data are
+  # omitted from the imputation model for both. This is the case when
+  # using argument update_strategy in function impute().
+  # However, for other settings (i.e. if one is interested in switching to
+  # a standard MAR imputation strategy altogether), this behavior is
+  # undesirable and, consequently, the function throws a warning which
+  # we suppress here.
+  suppressWarnings(
+    imputeObj_MAR <- impute(
+      draws,
+      update_strategy = dat_ice_MAR
+    )
+  )
+
+  # Access imputed dataset (index=1 for method_condmean(type = "jackknife"))
+  cmi_MAR <- extract_imputed_dfs(imputeObj_MAR, index = 1, idmap = TRUE)[[1]]
+  idmap <- attributes(cmi_MAR)$idmap
+  cmi_MAR <- cmi_MAR[, c(vars$subjid, vars$visit, vars$outcome)]
+  colnames(cmi_MAR)[colnames(cmi_MAR) == vars$outcome] <- "y_MAR"
+
+  # Map back original patient ids since rbmi re-codes ids to ensure id uniqueness
+  cmi_MAR[[vars$subjid]] <- idmap[match(cmi_MAR[[vars$subjid]], names(idmap))]
+
+  # Derive delta adjustment "aligned with ref-based imputation",
+  # i.e. difference between ref-based imputation and MAR imputation
+  delta_adjust <- merge(cmi, cmi_MAR, by = c(vars$subjid, vars$visit), all = TRUE)
+  delta_adjust$delta <- delta_adjust$y_imp - delta_adjust$y_MAR
+
+  ret_obj <- list(
+    imputeObj = imputeObj,
+    imputeObj_MAR = imputeObj_MAR,
+    delta_adjust = delta_adjust
+  )
+
+  return(ret_obj)
+}
+
+references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
+
+res_delta_adjust <- get_delta_match_refBased(drawObj, dat_ice, references)
+
+```
+
+## Analyse
+
+We use the function `analyse()` to add the delta-adjustment and perform the analysis of the imputed datasets under MAR.
+`analyse()` will take as the input argument `imputations = res_delta_adjust$imputeObj_MAR`, i.e. the imputation object corresponding to the MAR imputation (and not the JR imputation).
+The argument `delta` can be used to add a delta-adjustment prior to the analysis, and we set this to the delta-adjustment obtained in the previous step: `delta = res_delta_adjust$delta_adjust`.
```{r}
+
+# Set analysis variables using rbmi function "set_vars"
+vars_an <- set_vars(
+  group = vars$group,
+  visit = vars$visit,
+  outcome = vars$outcome,
+  covariates = "BASVAL"
+)
+
+# Analyse MAR imputation with derived delta adjustment
+anaObj_MAR_delta <- analyse(
+  res_delta_adjust$imputeObj_MAR,
+  rbmi::ancova,
+  delta = res_delta_adjust$delta_adjust,
+  vars = vars_an
+)
+```
+
+## Pool
+
+We can finally use the `pool()` function to extract the treatment effect estimate (as well as the estimated marginal means) at each visit and apply the jackknife variance estimator to the analysis estimates from all the imputed leave-one-out samples.
+
+```{r}
+
+poolObj_MAR_delta <- pool(anaObj_MAR_delta)
+poolObj_MAR_delta
+```
+
+This gives an estimated treatment effect of
+`r paste(formatC(poolObj_MAR_delta$pars$trt_7$est,format="f",digits=2)," (95% CI ",
+  formatC(poolObj_MAR_delta$pars$trt_7$ci[1],format="f",digits=2), " to ",
+  formatC(poolObj_MAR_delta$pars$trt_7$ci[2],format="f",digits=2),")",sep="")`
+at the last visit with an associated p-value of `r formatC(poolObj_MAR_delta$pars$trt_7$pval,format="f",digits=3)`.
+By construction of the delta-adjustment, the point estimate is identical to that from the frequentist analysis. However, its standard error is much larger (`r formatC(poolObj_MAR_delta$pars$trt_7$se,format="f",digits=2)` vs. `r formatC(poolObj$pars$trt_7$se,format="f",digits=2)`).
+Indeed, the information-anchored standard error (and the resulting inference) is very similar to the results for Bayesian multiple imputation using Rubin's rules, for which a standard error of 1.13 was reported in the quickstart vignette (`vignette(topic = "quickstart", package = "rbmi")`).
+Of note, as shown e.g. in @Wolbers2021, hypothesis testing based on the information-anchored inference is very conservative, i.e. the actual type I error is much lower than the nominal value.
Hence, confidence intervals and $p$-values based on information-anchored inference should be interpreted with caution. + +# References {.unlisted .unnumbered} \ No newline at end of file diff --git a/vignettes/CondMean_Inference.html b/vignettes/CondMean_Inference.html new file mode 100644 index 00000000..4ca09405 --- /dev/null +++ b/vignettes/CondMean_Inference.html @@ -0,0 +1,834 @@ + + + + + + + + + + + + + + + +rbmi: Inference with Conditional Mean Imputation + + + + + + + + + + + + + + + + + + + + + + + + + + +

rbmi: Inference with Conditional Mean Imputation

+

Alessandro Noci, Marcel Wolbers, Craig Gower-Page

+ + +
+ +
+ +
+

1 Introduction

+

As described in section 3.10.2 of the statistical specifications of the package (vignette(topic = "stat_specs", package = "rbmi")), two different types of variance estimators have been proposed for reference-based imputation methods in the statistical literature (Bartlett (2023)).
+The first is the frequentist variance which describes the actual repeated sampling variability of the estimator and results in inference which is correct in the frequentist sense, i.e. hypothesis tests have accurate type I error control and confidence intervals have correct coverage probabilities under repeated sampling if the reference-based assumption is correctly specified (Bartlett (2023), Wolbers et al. (2022)).
+Reference-based missing data assumptions are strong and borrow information from the control arm for imputation in the active arm.
+As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data.
+The second is the so-called “information-anchored” variance which was originally proposed in the context of sensitivity analyses (Cro, Carpenter, and Kenward (2019)). This variance estimator is based on disentangling point estimation from variance estimation.
+The resulting information-anchored variance is typically very similar to the variance under missing-at-random (MAR) imputation and increases with increasing amounts of missing data at approximately the same rate as MAR imputation.
+However, the information-anchored variance does not reflect the actual variability of the reference-based estimator and the resulting frequentist inference is highly conservative, resulting in a substantial power loss.

+

Reference-based conditional mean imputation combined with a resampling method such as the jackknife or the bootstrap was first introduced in Wolbers et al. (2022).
+This approach naturally targets the frequentist variance. The information-anchored variance is typically estimated using Rubin’s rules for Bayesian multiple imputation, which are not applicable within the conditional mean imputation framework.
+However, an alternative information-anchored variance proposed by Lu (2021) can easily be obtained, as we show below.
+The basic idea of Lu (2021) is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment, where delta is selected in a data-driven way to match the reference-based estimator.
+For conditional mean imputation, the proposal by Lu (2021) can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset.
+The variance can then be obtained via the jackknife or the bootstrap while keeping the delta-adjustment fixed. The resulting variance estimate is very similar to Rubin’s variance.
+Moreover, as shown in Cro, Carpenter, and Kenward (2019), the variance of MAR-imputation combined with a delta-adjustment achieves even better information-anchoring properties than Rubin’s variance for reference-based imputation.

+

This vignette first demonstrates how to obtain frequentist inference with reference-based conditional mean imputation using rbmi, and then shows that information-anchored inference can also be easily implemented using the package.

+
+
+

2 Data and model specification

+

We use a publicly available example dataset from an antidepressant clinical trial of an active drug versus placebo.
+The relevant endpoint is the Hamilton 17-item depression rating scale (HAMD17), which was assessed at baseline and at weeks 1, 2, 4, and 6.
+Study drug discontinuation occurred in 24% of subjects in the active drug arm and in 26% of subjects in the placebo arm.
+All data after study drug discontinuation are missing and there is a single additional intermittent missing observation.

+

We consider an imputation model with the mean change from baseline in the HAMD17 score as the outcome (variable CHANGE in the dataset).
+The following covariates are included in the imputation model: the treatment group (THERAPY), the (categorical) visit (VISIT), treatment-by-visit interactions, the baseline HAMD17 score (BASVAL), and baseline HAMD17 score-by-visit interactions.
+A common unstructured covariance matrix is assumed for both groups. The analysis model is an ANCOVA model with the treatment group as the primary factor and adjustment for the baseline HAMD17 score.
+For this example, we assume that the imputation strategy after the ICE “study-drug discontinuation” is Jump To Reference (JR) for all subjects and the imputation is based on conditional mean imputation combined with jackknife resampling (but the bootstrap could also have been selected).

+
+
+

3 Reference-based conditional mean imputation - frequentist inference

+

Conditional mean imputation combined with a resampling method such as jackknife or bootstrap naturally targets a frequentist estimation of the standard error of the treatment effect, thus providing a valid frequentist inference. +Here we provide the code to obtain frequentist inference for reference-based conditional mean imputation using rbmi.

+

The code used in this section is almost identical to the code in the quickstart vignette (vignette(topic = "quickstart", package = "rbmi")) except that we use conditional mean imputation combined with the jackknife (method_condmean(type = "jackknife")) here rather than Bayesian multiple imputation (method_bayes()). +We therefore refer to that vignette and the help files for the individual functions for further explanations and details.

+
+

3.1 Draws

+

We will make use of rbmi::expand_locf() to expand the dataset in order to have one row per subject per visit with missing outcomes denoted as NA. We will then construct the data_ice, vars and method input arguments to the first core rbmi function, draws(). +Finally, we call the function draws() to derive the parameter estimates of the base imputation model for the full dataset and all leave-one-subject-out samples.

+
library(rbmi)
+library(dplyr)
+
+dat <- antidepressant_data
+
+# Use expand_locf to add rows corresponding to visits with missing outcomes to
+# the dataset
+dat <- expand_locf(
+  dat,
+  PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT 
+  VISIT = levels(dat$VISIT),
+  vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
+  group = c("PATIENT"),
+  order = c("PATIENT", "VISIT")
+)
+
+# create data_ice and set the imputation strategy to JR for
+# each patient with at least one missing observation
+dat_ice <- dat %>% 
+  arrange(PATIENT, VISIT) %>% 
+  filter(is.na(CHANGE)) %>% 
+  group_by(PATIENT) %>% 
+  slice(1) %>%
+  ungroup() %>% 
+  select(PATIENT, VISIT) %>% 
+  mutate(strategy = "JR")
+
+# In this dataset, subject 3618 has an intermittent missing value which
+# does not correspond to a study drug discontinuation. We therefore remove
+# this subject from `dat_ice`. (In the later imputation step, it will
+# automatically be imputed under the default MAR assumption.)
+dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
+
+# Define the names of key variables in our dataset and
+# the covariates included in the imputation model using `set_vars()`
+vars <- set_vars(
+  outcome = "CHANGE",
+  visit = "VISIT",
+  subjid = "PATIENT",
+  group = "THERAPY",
+  covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
+)
+
+# Define which imputation method to use (here: conditional mean imputation
+# with jackknife as resampling) 
+method <- method_condmean(type = "jackknife")
+
+# Create samples for the imputation parameters by running the draws() function
+drawObj <- draws(
+  data = dat,
+  data_ice = dat_ice,
+  vars = vars,
+  method = method,
+  quiet = TRUE
+)
+drawObj
+#> 
+#> Draws Object
+#> ------------
+#> Number of Samples: 1 + 172
+#> Number of Failed Samples: 0
+#> Model Formula: CHANGE ~ 1 + THERAPY + VISIT + BASVAL * VISIT + THERAPY * VISIT
+#> Imputation Type: condmean
+#> Method:
+#>     name: Conditional Mean
+#>     covariance: us
+#>     threshold: 0.01
+#>     same_cov: TRUE
+#>     REML: TRUE
+#>     type: jackknife
+
+
+

3.2 Impute

+

We can now use the function impute() to perform the imputation of the original dataset and of each leave-one-out sample using the results obtained at the previous step.

+
references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
+imputeObj <- impute(drawObj, references)
+imputeObj
+#> 
+#> Imputation Object
+#> -----------------
+#> Number of Imputed Datasets: 1 + 172
+#> Fraction of Missing Data (Original Dataset):
+#>     4:   0%
+#>     5:   8%
+#>     6:  13%
+#>     7:  25%
+#> References:
+#>     DRUG    -> PLACEBO
+#>     PLACEBO -> PLACEBO
+
+
+

3.3 Analyse

+

Once the datasets have been imputed, we can call the analyse() function to apply the complete-data analysis model (here ANCOVA) to each imputed dataset.

+

+# Set analysis variables using rbmi function "set_vars"
+vars_an <- set_vars(
+  group = vars$group,
+  visit = vars$visit,
+  outcome = vars$outcome,
+  covariates = "BASVAL"
+)
+
+# Apply the complete-data analysis model (ANCOVA) to each imputed dataset
+anaObj <- analyse(
+  imputeObj,
+  rbmi::ancova,
+  vars = vars_an
+)
+anaObj
+#> 
+#> Analysis Object
+#> ---------------
+#> Number of Results: 1 + 172
+#> Analysis Function: rbmi::ancova
+#> Delta Applied: FALSE
+#> Analysis Estimates:
+#>     trt_4
+#>     lsm_ref_4
+#>     lsm_alt_4
+#>     trt_5
+#>     lsm_ref_5
+#>     lsm_alt_5
+#>     trt_6
+#>     lsm_ref_6
+#>     lsm_alt_6
+#>     trt_7
+#>     lsm_ref_7
+#>     lsm_alt_7
+
+
+

3.4 Pool

+

Finally, we can extract the treatment effect estimates and perform inference using the jackknife variance estimator. This is done by calling the pool() function.

+
poolObj <- pool(anaObj)
+poolObj
+#> 
+#> Pool Object
+#> -----------
+#> Number of Results Combined: 1 + 172
+#> Method: jackknife
+#> Confidence Level: 0.95
+#> Alternative: two.sided
+#> 
+#> Results:
+#> 
+#>   ==================================================
+#>    parameter   est     se     lci     uci     pval  
+#>   --------------------------------------------------
+#>      trt_4    -0.092  0.695  -1.453   1.27   0.895  
+#>    lsm_ref_4  -1.616  0.588  -2.767  -0.464  0.006  
+#>    lsm_alt_4  -1.708  0.396  -2.484  -0.931  <0.001 
+#>      trt_5    1.305   0.878  -0.416  3.027   0.137  
+#>    lsm_ref_5  -4.133  0.688  -5.481  -2.785  <0.001 
+#>    lsm_alt_5  -2.828  0.604  -4.011  -1.645  <0.001 
+#>      trt_6    1.929   0.862  0.239   3.619   0.025  
+#>    lsm_ref_6  -6.088  0.671  -7.402  -4.773  <0.001 
+#>    lsm_alt_6  -4.159  0.686  -5.503  -2.815  <0.001 
+#>      trt_7    2.126   0.858  0.444   3.807   0.013  
+#>    lsm_ref_7  -6.965  0.685  -8.307  -5.622  <0.001 
+#>    lsm_alt_7  -4.839  0.762  -6.333  -3.346  <0.001 
+#>   --------------------------------------------------
+

This gives an estimated treatment effect of +2.13 (95% CI 0.44 to 3.81) +at the last visit with an associated p-value of 0.013.

+
+
+
+

4 Reference-based conditional mean imputation - information-anchored inference

+

In this section, we present how the estimation process based on conditional mean imputation combined with the jackknife can be adapted to obtain an information-anchored variance following the proposal by Lu (2021).

+
+

4.1 Draws

+

The code for the pre-processing of the dataset and for the “draws” step is equivalent to the code provided for the frequentist inference. Please refer to that section for details about this step.

+

+library(rbmi)
+library(dplyr)
+
+dat <- antidepressant_data
+
+# Use expand_locf to add rows corresponding to visits with missing outcomes to
+# the dataset
+dat <- expand_locf(
+  dat,
+  PATIENT = levels(dat$PATIENT), # expand by PATIENT and VISIT 
+  VISIT = levels(dat$VISIT),
+  vars = c("BASVAL", "THERAPY"), # fill with LOCF BASVAL and THERAPY
+  group = c("PATIENT"),
+  order = c("PATIENT", "VISIT")
+)
+
+# create data_ice and set the imputation strategy to JR for
+# each patient with at least one missing observation
+dat_ice <- dat %>% 
+  arrange(PATIENT, VISIT) %>% 
+  filter(is.na(CHANGE)) %>% 
+  group_by(PATIENT) %>% 
+  slice(1) %>%
+  ungroup() %>% 
+  select(PATIENT, VISIT) %>% 
+  mutate(strategy = "JR")
+
+# In this dataset, subject 3618 has an intermittent missing value which
+# does not correspond to a study drug discontinuation. We therefore remove
+# this subject from `dat_ice`. (In the later imputation step, it will
+# automatically be imputed under the default MAR assumption.)
+dat_ice <- dat_ice[-which(dat_ice$PATIENT == 3618),]
+
+# Define the names of key variables in our dataset and
+# the covariates included in the imputation model using `set_vars()`
+vars <- set_vars(
+  outcome = "CHANGE",
+  visit = "VISIT",
+  subjid = "PATIENT",
+  group = "THERAPY",
+  covariates = c("BASVAL*VISIT", "THERAPY*VISIT")
+)
+
+# Define which imputation method to use (here: conditional mean imputation
+# with jackknife as resampling) 
+method <- method_condmean(type = "jackknife")
+
+# Create samples for the imputation parameters by running the draws() function
+drawObj <- draws(
+  data = dat,
+  data_ice = dat_ice,
+  vars = vars,
+  method = method,
+  quiet = TRUE
+)
+drawObj
+
+
+

4.2 Imputation step including calculation of delta-adjustment

+

The proposal by Lu (2021) is to replace the reference-based imputation by a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator. +In rbmi, this is implemented by first performing the imputation under the defined reference-based imputation strategy (here JR) as well as under MAR separately. +Second, the delta-adjustment is defined as the difference between the conditional mean imputation under reference-based and MAR imputation, respectively, on the original dataset.

+

To simplify the implementation, we have written a function get_delta_match_refBased that performs this step. The function takes as input arguments the draws object, data_ice (i.e. the data.frame containing the information about the intercurrent events and the imputation strategies), and references, a named vector that identifies the references to be used for reference-based imputation methods. The function returns a list containing the imputation objects under both reference-based and MAR imputation, plus a data.frame which contains the delta-adjustment.

+

+#' Get delta adjustment that matches reference-based imputation
+#' 
+#' @param draws A `draws` object created by `draws()`.
+#' @param data_ice A `data.frame` containing the information about the intercurrent
+#' events and the imputation strategies. Must represent the desired imputation
+#' strategy and not the MAR-variant.
+#' @param references A named vector. Identifies the references to be used
+#' for reference-based imputation methods.
+#' 
+#' @return 
+#' The function returns a list containing the imputation objects under both
+#' reference-based and MAR imputation, plus a `data.frame` which contains the
+#' delta-adjustment.
+#' 
+#' @seealso `draws()`, `impute()`.
+get_delta_match_refBased <- function(draws, data_ice, references) {
+  
+  # Impute according to `data_ice`
+  imputeObj <- impute(
+    draws = draws,
+    update_strategy = data_ice,
+    references = references
+  )
+  
+  vars <- imputeObj$data$vars
+  
+  # Access imputed dataset (index=1 for method_condmean(type = "jackknife"))
+  cmi <- extract_imputed_dfs(imputeObj, index = 1, idmap = TRUE)[[1]]
+  idmap <- attributes(cmi)$idmap
+  cmi <- cmi[, c(vars$subjid, vars$visit, vars$outcome)]
+  colnames(cmi)[colnames(cmi) == vars$outcome] <- "y_imp"
+  
+  # Map back the original patient ids since rbmi re-codes ids to ensure their uniqueness
+  cmi[[vars$subjid]] <- idmap[match(cmi[[vars$subjid]], names(idmap))]
+  
+  # Derive conditional mean imputations under MAR
+  dat_ice_MAR <- data_ice 
+  dat_ice_MAR[[vars$strategy]] <- "MAR"
+  
+  # Impute under MAR.
+  # Note that in this specific context, it is desirable that an update
+  # from a reference-based strategy to MAR uses the exact same data for
+  # fitting the imputation models, i.e. that available post-ICE data are
+  # omitted from the imputation model for both. This is the case when
+  # using the argument update_strategy in function impute().
+  # However, for other settings (i.e. if one is interested in switching to
+  # a standard MAR imputation strategy altogether), this behavior is
+  # undesirable and, consequently, the function throws a warning which
+  # we suppress here.
+  suppressWarnings(
+    imputeObj_MAR <- impute(
+      draws,
+      update_strategy = dat_ice_MAR
+    )
+  ) 
+  
+  # Access imputed dataset (index=1 for method_condmean(type = "jackknife"))
+  cmi_MAR <- extract_imputed_dfs(imputeObj_MAR, index = 1, idmap = TRUE)[[1]]
+  idmap <- attributes(cmi_MAR)$idmap
+  cmi_MAR <- cmi_MAR[, c(vars$subjid, vars$visit, vars$outcome)]
+  colnames(cmi_MAR)[colnames(cmi_MAR) == vars$outcome] <- "y_MAR"
+  
+  # Map back the original patient ids since rbmi re-codes ids to ensure their uniqueness
+  cmi_MAR[[vars$subjid]] <- idmap[match(cmi_MAR[[vars$subjid]], names(idmap))]
+  
+  # Derive delta adjustment "aligned with ref-based imputation",
+  # i.e. difference between ref-based imputation and MAR imputation
+  delta_adjust <- merge(cmi, cmi_MAR, by = c(vars$subjid, vars$visit), all = TRUE)
+  delta_adjust$delta <- delta_adjust$y_imp - delta_adjust$y_MAR
+
+  ret_obj <- list(
+    imputeObj = imputeObj,
+    imputeObj_MAR = imputeObj_MAR,
+    delta_adjust = delta_adjust
+  )
+  
+  return(ret_obj)
+}
+
+references <- c("DRUG" = "PLACEBO", "PLACEBO" = "PLACEBO")
+
+res_delta_adjust <- get_delta_match_refBased(drawObj, dat_ice, references)
+
+
+

4.3 Analyse

+

We use the function analyse() to add the delta-adjustment and perform the analysis of the imputed datasets under MAR. analyse() takes as input argument imputations = res_delta_adjust$imputeObj_MAR, i.e. the imputation object corresponding to the MAR imputation (and not the JR imputation). The argument delta can be used to add a delta-adjustment prior to the analysis, and we set it to the delta-adjustment obtained in the previous step: delta = res_delta_adjust$delta_adjust.

+

+# Set analysis variables using rbmi function "set_vars"
+vars_an <- set_vars(
+  group = vars$group,
+  visit = vars$visit,
+  outcome = vars$outcome,
+  covariates = "BASVAL"
+)
+
+# Analyse MAR imputation with derived delta adjustment
+anaObj_MAR_delta <- analyse(
+  res_delta_adjust$imputeObj_MAR,
+  rbmi::ancova,
+  delta = res_delta_adjust$delta_adjust,
+  vars = vars_an
+)
+
+
+

4.4 Pool

+

We can finally use the pool() function to extract the treatment effect estimate (as well as the estimated marginal means) at each visit and apply the jackknife variance estimator to the analysis estimates from all the imputed leave-one-out samples.
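Concretely, with $\hat{\theta}_{(i)}$ ($i = 1, \ldots, n$) denoting the treatment effect estimates from the $n$ leave-one-out samples, the jackknife standard error takes the standard form (cf. the statistical specifications of the package):

$$
\widehat{se}_{\text{jack}} = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)} \right)^2}
\quad \text{where} \quad
\bar{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)}.
$$

The point estimate itself is taken from the analysis of the original (full) dataset; the leave-one-out samples are used only for the variance.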

+

+poolObj_MAR_delta <- pool(anaObj_MAR_delta)
+poolObj_MAR_delta
+#> 
+#> Pool Object
+#> -----------
+#> Number of Results Combined: 1 + 172
+#> Method: jackknife
+#> Confidence Level: 0.95
+#> Alternative: two.sided
+#> 
+#> Results:
+#> 
+#>   ==================================================
+#>    parameter   est     se     lci     uci     pval  
+#>   --------------------------------------------------
+#>      trt_4    -0.092  0.695  -1.453   1.27   0.895  
+#>    lsm_ref_4  -1.616  0.588  -2.767  -0.464  0.006  
+#>    lsm_alt_4  -1.708  0.396  -2.484  -0.931  <0.001 
+#>      trt_5    1.305   0.944  -0.545  3.156   0.167  
+#>    lsm_ref_5  -4.133  0.738  -5.579  -2.687  <0.001 
+#>    lsm_alt_5  -2.828  0.603  -4.01   -1.646  <0.001 
+#>      trt_6    1.929   0.993  -0.018  3.876   0.052  
+#>    lsm_ref_6  -6.088  0.758  -7.574  -4.602  <0.001 
+#>    lsm_alt_6  -4.159  0.686  -5.504  -2.813  <0.001 
+#>      trt_7    2.126   1.123  -0.076  4.327   0.058  
+#>    lsm_ref_7  -6.965  0.85   -8.63   -5.299  <0.001 
+#>    lsm_alt_7  -4.839  0.763  -6.335  -3.343  <0.001 
+#>   --------------------------------------------------
+

This gives an estimated treatment effect of 2.13 (95% CI -0.08 to 4.33) at the last visit with an associated p-value of 0.058. By construction of the delta-adjustment, the point estimate is identical to the frequentist analysis. However, its standard error is much larger (1.12 vs. 0.86). Indeed, the information-anchored standard error (and the resulting inference) is very similar to the results for Bayesian multiple imputation using Rubin’s rules, for which a standard error of 1.13 was reported in the quickstart vignette (vignette(topic = "quickstart", package = "rbmi")). Of note, as shown e.g. in Wolbers et al. (2022), hypothesis testing based on information-anchored inference is very conservative, i.e. the actual type I error is much lower than the nominal value. Hence, confidence intervals and \(p\)-values based on information-anchored inference should be interpreted with caution.

+
+
+
+

References

+
+
+Bartlett, Jonathan W. 2023. “Reference-Based Multiple Imputation - What Is the Right Variance and How to Estimate It.” Statistics in Biopharmaceutical Research 15 (1): 178–86.
+
+Cro, Suzie, James R Carpenter, and Michael G Kenward. 2019. “Information-Anchored Sensitivity Analysis: Theory and Application.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (2): 623–45.
+
+Lu, Kaifeng. 2021. “An Alternative Implementation of Reference-Based Controlled Imputation Procedures.” Statistics in Biopharmaceutical Research 13 (4): 483–91.
+
+Wolbers, Marcel, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, and Jonathan W Bartlett. 2022. “Standard and Reference-Based Conditional Mean Imputation.” Pharmaceutical Statistics 21 (6): 1246–57.
+
+
+ + + + + + + + + + + diff --git a/vignettes/CondMean_Inference.html.asis b/vignettes/CondMean_Inference.html.asis new file mode 100644 index 00000000..09b9d84a --- /dev/null +++ b/vignettes/CondMean_Inference.html.asis @@ -0,0 +1,2 @@ +%\VignetteIndexEntry{rbmi: Inference with Conditional Mean Imputation} +%\VignetteEngine{R.rsp::asis} \ No newline at end of file diff --git a/vignettes/build.R b/vignettes/build.R index 5ac313ce..e5361697 100644 --- a/vignettes/build.R +++ b/vignettes/build.R @@ -25,6 +25,12 @@ rmarkdown::render( output_file = "advanced.html" ) +rmarkdown::render( + input = "./vignettes/CondMean_Inference.Rmd", + output_dir = "./vignettes/", + output_file = "CondMean_Inference.html" +) + rmarkdown::render( input = "./vignettes/FAQ.Rmd", output_dir = "./vignettes/", diff --git a/vignettes/references.bib b/vignettes/references.bib index 524afcd5..520d61ec 100644 --- a/vignettes/references.bib +++ b/vignettes/references.bib @@ -31,9 +31,11 @@ @article{Bartlett2021 title={Reference-based multiple imputation - what is the right variance and how to estimate it}, author={Bartlett, Jonathan W}, journal={Statistics in Biopharmaceutical Research}, - year={2021}, - publisher={Taylor \& Francis}, - url={https://doi.org/10.1080/19466315.2021.1983455} + volume={15}, + number={1}, + pages={178--186}, + year={2023}, + publisher={Taylor \& Francis} } @article{Carpenter2000, @@ -244,6 +246,17 @@ @article{LiuPang2016 publisher={Taylor \& Francis} } +@article{Lu2021, + title={An alternative implementation of reference-based controlled imputation procedures}, + author={Lu, Kaifeng}, + journal={Statistics in Biopharmaceutical Research}, + volume={13}, + number={4}, + pages={483--491}, + year={2021}, + publisher={Taylor \& Francis} +} + @article{Mallinckrodt2008, title={Recommendations for the primary analysis of continuous endpoints in longitudinal clinical trials}, author={Mallinckrodt, Craig H and Lane, Peter W and Schnell, Dan and Peng, Yahong and Mancuso, James P}, 
@@ -442,9 +455,12 @@ @article{White2011multiple } @article{Wolbers2021, -author = {Wolbers, Marcel and Noci, Alessandro and Delmar, Paul and Gower-Page, Craig and Yiu, Sean and Bartlett, Jonathan W.}, -title = {Standard and reference-based conditional mean imputation}, -journal = {Pharmaceutical Statistics}, -year = {2022}, -doi = {10.1002/pst.2234} + title={Standard and reference-based conditional mean imputation}, + author={Wolbers, Marcel and Noci, Alessandro and Delmar, Paul and Gower-Page, Craig and Yiu, Sean and Bartlett, Jonathan W}, + journal={Pharmaceutical statistics}, + volume={21}, + number={6}, + pages={1246--1257}, + year={2022}, + publisher={Wiley Online Library} } diff --git a/vignettes/stat_specs.Rmd b/vignettes/stat_specs.Rmd index 161e663b..c7e8d0a3 100644 --- a/vignettes/stat_specs.Rmd +++ b/vignettes/stat_specs.Rmd @@ -436,16 +436,33 @@ All approaches provide consistent treatment effect estimates for standard and re Treatment effects based on conditional mean imputation are deterministic. All other methods are affected by Monte Carlo sampling error and the precision of estimates depends on the number of imputations or bootstrap samples, respectively. - ### Standard errors of the treatment effect -All approaches provide frequentist consistent estimates of the standard error for imputation under a MAR assumption. For reference-based imputation methods, methods based on conditional mean imputation or bootstrapped MI provide frequentist consistent estimates of the standard error whereas Rubin's rules applied to conventional MI methods provides so-called information anchored inference (@Bartlett2021, @CroEtAl2019, @vonHippelBartlett2021, @Wolbers2021). Frequentist consistent estimates of the standard error lead to confidence intervals and tests which have (asymptotically) correct coverage and type I error control under the assumption that the reference-based assumption reflects the true data-generating mechanism. 
For finite samples, simulations for a sample size of $n=100$ per group reported in @Wolbers2021 demonstrated that conditional mean imputation combined with the jackknife provided exact protection of the type one error rate whereas the bootstrap was associated with a small type I error inflation (between 5.1\% to 5.3\% for a nominal level of 5\%). - -It is well known that Rubin's rules do not provide frequentist consistent estimates of the standard error for reference-based imputation methods (@Seaman2014, @LiuPang2016, @Tang2017, @CroEtAl2019, @Bartlett2021). Standard errors from Rubin's rule are typically larger than frequentist standard error estimates leading to conservative inference and a corresponding loss of statistical power, see e.g. the simulations reported in @Wolbers2021. -Intuitively, this occurs because reference-based imputation methods borrow information from the reference group for imputations in the intervention group leading to a reduction in the frequentist variance of the resulting treatment effect contrast which is not captured by Rubin’s variance estimator. Formally, this occurs because the imputation and analysis models are uncongenial for reference-based imputation methods (@Meng1994, @Bartlett2021). -@CroEtAl2019 argued that Rubin’s rule is nevertheless valid for reference-based imputation methods because it is approximately information-anchored, i.e. that the proportion of information lost due to missing data under MAR is approximately preserved in reference-based analyses. In contrast, frequentist standard errors for reference based imputation are not information anchored for reference-based imputation and standard errors under reference-based assumptions are typically smaller than those for MAR imputation. - -Information anchoring is a sensible concept for sensitivity analyses, whereas for a primary analyses, it may be more important to adhere to the principles of frequentist inference. 
Analyses of data with missing observations generally rely on unverifiable missing data assumptions and the assumptions for reference-based imputation methods are relatively strong. Therefore, these assumptions need to be clinically justified as appropriate or at least conservative for the considered disease area and the anticipated mechanism of action of the intervention.
+All approaches for imputation under a MAR assumption provide consistent estimates of the frequentist standard error.
+
+For reference-based imputation methods, the situation is more complicated and two different types of variance estimators have been proposed in the statistical literature (@Bartlett2021).
+The first is the frequentist variance which describes the actual repeated sampling variability of the estimator.
+If the reference-based missing data assumption is correctly specified, then the resulting inference based on this variance is correct in the frequentist sense, i.e. hypothesis tests have asymptotically correct type I error control and confidence intervals have correct coverage probabilities under repeated sampling (@Bartlett2021, @Wolbers2021).
+Reference-based missing data assumptions are strong and borrow information from the reference arm for imputation in the active arm. As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data.
+The second proposal is the so-called "information-anchored" variance which was originally proposed in the context of sensitivity analyses (@CroEtAl2019). This variance estimator is based on disentangling point estimation and variance estimation altogether.
+The information-anchoring principle described in @CroEtAl2019 states that the relative increase in the variance of the treatment effect estimator under MAR imputation with increasing amounts of missing data should be preserved for reference-based imputation methods.
+The resulting information-anchored variance is typically very similar to the variance under MAR imputation and typically increases with increasing amounts of missing data.
+However, the information-anchored variance does not reflect the actual variability of the reference-based estimator under repeated sampling and the resulting inference is highly conservative resulting in a substantial power loss (@Wolbers2021).
+Moreover, to date, no Bayesian or frequentist framework has been developed under which the information-anchored variance provides correct inference for reference-based missingness assumptions, nor is it clear whether such a framework can even be developed.
+
+Reference-based conditional mean imputation (`method_condmean()`) and bootstrapped likelihood-based multiple imputation methods (`method = method_bmlmi()`) obtain standard errors via resampling and hence target the frequentist variance (@Wolbers2021, @vonHippelBartlett2021).
+For finite samples, simulations for a sample size of $n=100$ per group reported in @Wolbers2021 demonstrated that conditional mean imputation combined with the jackknife (`method_condmean(type = "jackknife")`) provided exact protection of the type one error rate whereas the bootstrap (`method_condmean(type = "bootstrap")`) was associated with a small type I error inflation (between 5.1\% and 5.3\% for a nominal level of 5\%).
+For reference-based conditional mean imputation, an alternative information-anchored variance can be obtained by following a proposal by @Lu2021.
+The basic idea of @Lu2021 is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator.
+For conditional mean imputation, the proposal by @Lu2021 can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset.
+An illustration of how the different variances can be obtained for conditional mean imputation in `rbmi` is provided in the vignette "Frequentist and information-anchored inference for reference-based conditional mean imputation" (`vignette(topic = "CondMean_Inference", package = "rbmi")`). + +Reference-based Bayesian (or approximate Bayesian) multiple imputation methods combined with Rubin's rules (`method_bayes()` and `method_approxbayes()`) target the information-anchored variance (@CroEtAl2019). +A frequentist variance for these methods could in principle be obtained via bootstrap or jackknife re-sampling of the treatment effect estimates but this would be very computationally intensive and is not directly supported by `rbmi`. + +Our view is that for primary analyses, accurate type I error control (which can be obtained by using the frequentist variance) is more important than adherence to the information anchoring principle which, to us, is +not fully compatible with the strong reference-based assumptions. In any case, if reference-based imputation is used for the primary analysis, it is critical that the chosen +reference-based assumption can be clinically justified, and that suitable sensitivity analyses are conducted to stress-test these assumptions. Conditional mean imputation combined with the jackknife is the only method which leads to deterministic standard error estimates and, consequently, confidence intervals and $p$-values are also deterministic. This is particularly important in a regulatory setting where it is important to ascertain whether a calculated $p$-value which is close to the critical boundary of 5% is truly below or above that threshold rather than being uncertain about this because of Monte Carlo error. diff --git a/vignettes/stat_specs.html b/vignettes/stat_specs.html index b5dc78b2..a8163188 100644 --- a/vignettes/stat_specs.html +++ b/vignettes/stat_specs.html @@ -695,18 +695,34 @@

3.10.1 Treatment effect estimation

3.10.2 Standard errors of the treatment effect

-

All approaches provide frequentist consistent estimates of the standard error for imputation under a MAR assumption. For reference-based imputation methods, methods based on conditional mean imputation or bootstrapped MI provide frequentist consistent estimates of the standard error whereas Rubin’s rules applied to conventional MI methods provides so-called information anchored inference (Bartlett (2021), Cro, Carpenter, and Kenward (2019), von Hippel and Bartlett (2021), Wolbers et al. (2022)). Frequentist consistent estimates of the standard error lead to confidence intervals and tests which have (asymptotically) correct coverage and type I error control under the assumption that the reference-based assumption reflects the true data-generating mechanism. For finite samples, simulations for a sample size of \(n=100\) per group reported in Wolbers et al. (2022) demonstrated that conditional mean imputation combined with the jackknife provided exact protection of the type one error rate whereas the bootstrap was associated with a small type I error inflation (between 5.1% to 5.3% for a nominal level of 5%).

-

It is well known that Rubin’s rules do not provide frequentist consistent estimates of the standard error for reference-based imputation methods (Seaman, White, and Leacy (2014), Liu and Pang (2016), Tang (2017), Cro, Carpenter, and Kenward (2019), Bartlett (2021)). Standard errors from Rubin’s rule are typically larger than frequentist standard error estimates leading to conservative inference and a corresponding loss of statistical power, see e.g. the simulations reported in Wolbers et al. (2022). -Intuitively, this occurs because reference-based imputation methods borrow information from the reference group for imputations in the intervention group leading to a reduction in the frequentist variance of the resulting treatment effect contrast which is not captured by Rubin’s variance estimator. Formally, this occurs because the imputation and analysis models are uncongenial for reference-based imputation methods (Meng (1994), Bartlett (2021)). -Cro, Carpenter, and Kenward (2019) argued that Rubin’s rule is nevertheless valid for reference-based imputation methods because it is approximately information-anchored, i.e. that the proportion of information lost due to missing data under MAR is approximately preserved in reference-based analyses. In contrast, frequentist standard errors for reference based imputation are not information anchored for reference-based imputation and standard errors under reference-based assumptions are typically smaller than those for MAR imputation.

-

Information anchoring is a sensible concept for sensitivity analyses, whereas for a primary analyses, it may be more important to adhere to the principles of frequentist inference. Analyses of data with missing observations generally rely on unverifiable missing data assumptions and the assumptions for reference-based imputation methods are relatively strong. Therefore, these assumptions need to be clinically justified as appropriate or at least conservative for the considered disease area and the anticipated mechanism of action of the intervention.

+

All approaches provide frequentist consistent estimates of the standard error for imputation under a MAR assumption.

+

For reference-based imputation methods, the situation is more complicated and two different types of variance estimators have been proposed in the statistical literature (Bartlett (2023)). The first is the frequentist variance which describes the actual repeated sampling variability of the estimator. If the reference-based missing data assumption is correctly specified, then the resulting inference based on this variance is correct in the frequentist sense, i.e. hypothesis tests have asymptotically correct type I error control and confidence intervals have correct coverage probabilities under repeated sampling (Bartlett (2023), Wolbers et al. (2022)). Reference-based missing data assumptions are strong and borrow information from the reference arm for imputation in the active arm. As a consequence, the size of frequentist standard errors for treatment effects may decrease with increasing amounts of missing data. The second proposal is the so-called “information-anchored” variance which was originally proposed in the context of sensitivity analyses (Cro, Carpenter, and Kenward (2019)). This variance estimator is based on disentangling point estimation and variance estimation altogether. The information-anchoring principle described in Cro, Carpenter, and Kenward (2019) states that the relative increase in the variance of the treatment effect estimator under MAR imputation with increasing amounts of missing data should be preserved for reference-based imputation methods. The resulting information-anchored variance is typically very similar to the variance under MAR imputation and typically increases with increasing amounts of missing data. However, the information-anchored variance does not reflect the actual variability of the reference-based estimator under repeated sampling and the resulting inference is highly conservative resulting in a substantial power loss (Wolbers et al. (2022)). Moreover, to date, no Bayesian or frequentist framework has been developed under which the information-anchored variance provides correct inference for reference-based missingness assumptions, nor is it clear whether such a framework can even be developed.

+

Reference-based conditional mean imputation (method_condmean()) and bootstrapped likelihood-based multiple imputation methods (method = method_bmlmi()) obtain standard errors via resampling and hence target the frequentist variance (Wolbers et al. (2022), von Hippel and Bartlett (2021)). For finite samples, simulations for a sample size of \(n=100\) per group reported in Wolbers et al. (2022) demonstrated that conditional mean imputation combined with the jackknife (method_condmean(type = "jackknife")) provided exact protection of the type one error rate whereas the bootstrap (method_condmean(type = "bootstrap")) was associated with a small type I error inflation (between 5.1% and 5.3% for a nominal level of 5%). For reference-based conditional mean imputation, an alternative information-anchored variance can be obtained by following a proposal by Lu (2021). The basic idea of Lu (2021) is to obtain the information-anchored variance via a MAR imputation combined with a delta-adjustment where delta is selected in a data-driven way to match the reference-based estimator. For conditional mean imputation, the proposal by Lu (2021) can be implemented by choosing the delta-adjustment as the difference between the conditional mean imputation under the chosen reference-based assumption and MAR on the original dataset. An illustration of how the different variances can be obtained for conditional mean imputation in rbmi is provided in the vignette “Frequentist and information-anchored inference for reference-based conditional mean imputation” (vignette(topic = "CondMean_Inference", package = "rbmi")).

+

Reference-based Bayesian (or approximate Bayesian) multiple imputation methods combined with Rubin’s rules (method_bayes() and method_approxbayes()) target the information-anchored variance (Cro, Carpenter, and Kenward (2019)). A frequentist variance for these methods could in principle be obtained via bootstrap or jackknife re-sampling of the treatment effect estimates but this would be very computationally intensive and is not directly supported by rbmi.

+

Our view is that for primary analyses, accurate type I error control (which can be obtained by using the frequentist variance) is more important than adherence to the information anchoring principle which, to us, is not fully compatible with the strong reference-based assumptions. In any case, if reference-based imputation is used for the primary analysis, it is critical that the chosen reference-based assumption can be clinically justified, and that suitable sensitivity analyses are conducted to stress-test these assumptions.

Conditional mean imputation combined with the jackknife is the only method which leads to deterministic standard error estimates and, consequently, confidence intervals and \(p\)-values are also deterministic. This is particularly important in a regulatory setting where it is important to ascertain whether a calculated \(p\)-value which is close to the critical boundary of 5% is truly below or above that threshold rather than being uncertain about this because of Monte Carlo error.

3.10.3 Computational complexity

Bayesian MI methods rely on the specification of prior distributions and the usage of Markov chain Monte Carlo (MCMC) methods. All other methods based on multiple imputation or bootstrapping require no other tuning parameters than the specification of the number of imputations \(M\) or bootstrap samples \(B\) and rely on numerical optimization for fitting the MMRM imputation models via REML. Conditional mean imputation combined with the jackknife has no tuning parameters.

-

In our rbmi implementation, the fitting of the MMRM imputation model via REML is computationally most expensive. MCMC sampling using rstan (Stan Development Team (2020)) is typically relatively fast in our setting and requires only a small burn-in and burn-between of the chains. In addition, the number of random imputations for reliable inference using Rubin’s rules is often smaller than the number of resamples required for the jackknife or the bootstrap (see e.g. the discussions in I. R. White, Royston, and Wood (2011, sec. 7) for Bayesian MI and the Appendix of Wolbers et al. (2022) for the bootstrap). Thus, for many applications, we expect that conventional MI based on Bayesian posterior draws will be fastest, followed by conventional MI using approximate Bayesian posterior draws and conditional mean imputation combined with the jackknife. Conditional mean imputation combined with the bootstrap and bootstrapped MI methods will typically be most computationally demanding. Of note, all implemented methods are conceptually straightforward to parallelise and some parallelization support is provided by rbmi.

+

In our rbmi implementation, the fitting of the MMRM imputation model via REML is computationally most expensive. MCMC sampling using rstan (Stan Development Team (2020)) is typically relatively fast in our setting and requires only a small burn-in and burn-between of the chains. In addition, the number of random imputations for reliable inference using Rubin’s rules is often smaller than the number of resamples required for the jackknife or the bootstrap (see e.g. the discussions in I. R. White, Royston, and Wood (2011, sec. 7) for Bayesian MI and the Appendix of Wolbers et al. (2022) for the bootstrap). Thus, for many applications, we expect that conventional MI based on Bayesian posterior draws will be fastest, followed by conventional MI using approximate Bayesian posterior draws and conditional mean imputation combined with the jackknife. Conditional mean imputation combined with the bootstrap and bootstrapped MI methods will typically be most computationally demanding. Of note, all implemented methods are conceptually straightforward to parallelise and some parallelisation support is provided by rbmi.

@@ -747,7 +763,7 @@

References

Barnard, John, and Donald B Rubin. 1999. “Miscellanea. Small-Sample Degrees of Freedom with Multiple Imputation.” Biometrika 86 (4): 948–55.
-Bartlett, Jonathan W. 2021. “Reference-Based Multiple Imputation - What Is the Right Variance and How to Estimate It.” Statistics in Biopharmaceutical Research. https://doi.org/10.1080/19466315.2021.1983455. +Bartlett, Jonathan W. 2023. “Reference-Based Multiple Imputation - What Is the Right Variance and How to Estimate It.” Statistics in Biopharmaceutical Research 15 (1): 178–86.
Carpenter, James R, James H Roger, and Michael G Kenward. 2013. “Analysis of Longitudinal Trials with Protocol Deviation: A Framework for Relevant, Accessible Assumptions, and Inference via Multiple Imputation.” Journal of Biopharmaceutical Statistics 23 (6): 1352–71. @@ -782,8 +798,8 @@

Little, Roderick JA, and Donald B Rubin. 2002. Statistical Analysis with Missing Data, Second Edition. John Wiley & Sons.
Lu, Kaifeng. 2021. “An Alternative Implementation of Reference-Based Controlled Imputation Procedures.” Statistics in Biopharmaceutical Research 13 (4): 483–91.
Mallinckrodt, CH, J Bell, G Liu, B Ratitch, M O’Kelly, I Lipkovich, P Singh, L Xu, and G Molenberghs. 2020. “Aligning Estimators with Estimands in Clinical Trials: Putting the ICH E9 (R1) Guidelines into Practice.” Therapeutic Innovation & Regulatory Science 54 (2): 353–64.

McGrath, Kevin, and Ian White. 2021. “RefBasedMI: Reference-Based Imputation for Longitudinal Clinical Trials with Protocol Deviation.” https://github.com/UCL/RefbasedMI.
Patterson, H Desmond, and Robin Thompson. 1971. “Recovery of Inter-Block Information When Block Sizes Are Unequal.” Biometrika 58 (3): 545–54.

Roger, James. 2021. “Reference-Based MI via Multivariate Normal RM (the ‘Five Macros’ and MIWithD).” https://www.lshtm.ac.uk/research/centres-projects-groups/missing-data#dia-missing-data.
Stan Development Team. 2020. “RStan: The R Interface to Stan.” https://mc-stan.org/.
von Hippel, Paul T, and Jonathan W Bartlett. 2021. “Maximum Likelihood Multiple Imputation: Faster Imputations and Consistent Standard Errors Without Posterior Draws.” Statistical Science 36 (3): 400–420.

White, Ian, Royes Joseph, and Nicky Best. 2020. “A Causal Modelling Framework for Reference-Based Imputation and Tipping Point Analysis in Clinical Trials with Quantitative Outcome.” Journal of Biopharmaceutical Statistics 30 (2): 334–50.
Wolbers, Marcel, Alessandro Noci, Paul Delmar, Craig Gower-Page, Sean Yiu, and Jonathan W Bartlett. 2022. “Standard and Reference-Based Conditional Mean Imputation.” Pharmaceutical Statistics 21 (6): 1246–57.