MICE Imputation Constraint #507

MikeDereviankin · 2022-10-24T20:26:06Z

I'd like to impute an NHANES dataset that contains both missing values and values below the limit of detection (LOD), which also need to be imputed. MICE imputation is straightforward, but I'd like to constrain the imputed values to lie between 0 and the LOD. If I just run MICE, negative values can be introduced.

Here is my current workflow:

1. Remove values that are below the LOD and code them as NA when creating the data frame in R.
2. Add covariates that affect other columns.
3. Impute using the following MICE call: imputed_data <- mice(df, m = 5, maxit = 10, meth = "norm.predict", seed = 3985)

As mentioned earlier, imputed values can come out either negative or above the LOD, which is physically impossible. The NHANES dataset contains indicator columns that flag whether a value is below the LOD, and each one corresponds to a measurement column. For example, LCB044LA is the column I'd like to impute, and I can use column LCB044LC to determine whether a value is below detection (binary: 1 = below LOD, 0 = above LOD). How can I constrain MICE to impute a value between 0 and the LOD that applies to that specific cell?

There is an example of constraining imputations to the range 0 to 25 here (https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html), but my upper bounds are specific to each cell, so it does not help directly.

I can provide my data in three formats, if that helps:

1. The NHANES dataset with values below the LOD removed. Indicator columns flagging which values are below the LOD are added to the dataset (column names ending in LC).
2. The same dataset as above, except that the LOD is filled into the cell instead of leaving it empty. The indicator columns are still in the dataset.
3. A separate table of LOD values for the samples that are below the LOD.

Please let me know whether it is possible to constrain MICE using these inputs, or whether there is general documentation on constraining MICE this way.
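As a rough illustration of how the first format relates to the second, the sketch below turns LOD-filled cells back into NA and keeps each cell's LOD in a separate upper-bound column. The data values and the `LCB044LA_upper` column name are made up for illustration; only `LCB044LA` and `LCB044LC` come from the description above.

```r
# Toy version of format 2: non-detects carry the LOD in the measurement cell.
# All values are invented; SEQN is used here as a respondent ID.
fmt2 <- data.frame(
  SEQN     = c(21005, 21008, 21009),
  LCB044LA = c(0.16, 2.30, 6.92),  # row 1 holds the LOD, not a measurement
  LCB044LC = c(1, 0, 0)            # 1 = below LOD, 0 = above LOD
)

below <- fmt2$LCB044LC == 1

# Format 1 plus a per-cell upper bound (hypothetical column name).
fmt1 <- fmt2
fmt1$LCB044LA_upper <- ifelse(below, fmt2$LCB044LA, NA_real_)
fmt1$LCB044LA[below] <- NA

print(fmt1)
```

Keeping the bound in its own column means the measurement column can be NA for imputation while the LOD stays attached to the same row.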

Here is the code so far:

#Author: M. Dereviankin
#Date: 16-Aug-2022
#Title: NHANES Imputation based on demographics & Predictive Modelling

library(mice)
library(tidyverse)
library(VIM)
library(GGally)
library(caret)
library(tidymodels)
library(dplyr)
library(yardstick)
library(mosaic)


# MICE Imputation 2003-2004 -----------------------------------------------

df <- read.csv('2003_2004_template_2.csv', stringsAsFactors = TRUE, na.strings = c("", NA))

#Specify the Non-Detects
Detect <- read.csv("Detect.csv", header = TRUE, stringsAsFactors = FALSE)
dim(Detect)# 50 44
NonDetect <- read.csv("Non_Detect.csv", header = TRUE, stringsAsFactors = FALSE)
dim(NonDetect)# 50 44

# Remove the leading columns (we don't need them for the imputation)
# NOTE: -c(1,1) drops only column 1; use -c(1,2) to drop the first two
Detected <- Detect[,-c(1,1)]
dim(Detected)
NonDetected <- NonDetect[,-c(1,1)]
dim(NonDetected)

# Transform to a matrix
Detect.Matrix <- as.matrix(Detected)
dim(Detect.Matrix)
res_detect <- colSums(Detect.Matrix==0)/nrow(Detect.Matrix)*100
res.detect.matrix <- rbind(Detect.Matrix, res_detect)

dim(Detect.Matrix)
NonDetect.Matrix <- as.matrix(NonDetected)
dim(NonDetect.Matrix)

#Subset the 2 matrices
Dn.M   <- Detect.Matrix[,colSums(Detect.Matrix != 0) >=5]
Dn.M.removed <- Detect.Matrix[,colSums(Detect.Matrix != 0) < 5 ]
dim(Dn.M)

Dn.N.M <- NonDetect.Matrix[,colSums(NonDetect.Matrix== 0) >=5]
dim(Dn.N.M)

#Now apply the right method

imputed_data <- mice(Detect.Matrix, m = 5, maxit = 10, meth = "norm.predict", post = Dn.N.M, seed = 3985)
summary(imputed_data)

#finish the dataset

finished_imputed_data <- complete(imputed_data)

#Print off finished dataset

write_csv(finished_imputed_data, "finished_imputed_data_norm.predict.csv")

stefvanbuuren · 2022-10-24T21:04:14Z

Maintainer

I have never done this, but perhaps you can do it by post-processing the imputations, using the ideas from the vignette. Try something like

post["myvar"] <- "imp[[j]][, i] <- squeeze_vec(imp[[j]][, i], c(imp[['lower']][, i], imp[['upper']][, i]))"

where "lower" and "upper" are the names of columns in your data with the subject-specific bounds you specify for "myvar". The squeeze_vec() function, which you must write yourself, could be a vectorized version of mice::squeeze() that takes vector bounds.
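One way this suggestion might be sketched is shown below. It is an illustration under stated assumptions, not a tested recipe: `squeeze_vec()` is a hypothetical helper, the toy data and the `y_upper` column are invented, and the lower bound is taken to be 0 as in the question. Because `imp[[...]]` only stores imputed values for cells that were missing in that column, the sketch reads the bounds from `data` for the rows being imputed (`!r[, j]`) rather than from `imp`. This assumes `data` and `r` are visible where mice evaluates the `post` string, and the mice part only runs if the package is installed.

```r
# Hypothetical helper: clamp each value to its own [lower, upper] interval.
# Unlike mice::squeeze(), the bounds may differ from element to element.
squeeze_vec <- function(x, lower, upper) {
  pmin(pmax(x, lower), upper)
}

# Standalone check on two imputed values with different upper bounds.
clamped <- squeeze_vec(c(-0.4, 0.9), lower = 0, upper = c(0.16, 0.20))
print(clamped)

# Toy data: y is a concentration, y_upper holds the LOD for non-detects only.
set.seed(3985)
n      <- 40
x      <- rnorm(n)
y_true <- exp(0.5 * x + rnorm(n, sd = 0.3))
lod    <- rep(c(0.6, 0.8), length.out = n)   # invented per-sample LODs
below  <- y_true < lod
toy <- data.frame(
  x       = x,
  y       = ifelse(below, NA, y_true),
  y_upper = ifelse(below, lod, NA)
)

if (requireNamespace("mice", quietly = TRUE)) {
  meth <- c(x = "", y = "norm.predict", y_upper = "")  # do not impute y_upper
  pred <- mice::make.predictorMatrix(toy)
  pred[, "y_upper"] <- 0                               # nor use it as a predictor

  # Same clamp as squeeze_vec(), inlined so the expression needs only
  # base R functions when mice evaluates it.
  post <- mice::make.post(toy)
  post["y"] <- paste0(
    "imp[[j]][, i] <- pmin(pmax(imp[[j]][, i], 0), ",
    "data$y_upper[!r[, j]])"
  )

  imp <- mice::mice(toy, m = 2, maxit = 2, method = meth,
                    predictorMatrix = pred, post = post,
                    seed = 3985, printFlag = FALSE)
  completed <- mice::complete(imp, 1)
}
```

Clamping moves every out-of-range imputation onto a bound (0 or the LOD), so imputed values may pile up at the limits; whether that is acceptable depends on the analysis.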

Experimental. No guarantees.

2 replies

MikeDereviankin Oct 25, 2022
Author

Thanks for the suggestion based on the vignette. I tried to create a custom function that would squeeze the imputations into the constraints, but the code still imputed values beyond the specified bounds.

Other imputation packages, for example zCompositions, let you specify a constraint based on a separate dataset (which is what I tried to do in my code with Dn.N.M). In zCompositions this is passed as the 'dl' argument. See the example here:

Complete.Matrix.1 <- multKM(Dn.M, label=0, dl= Dn.N.M, n.draws = 1000)

If it helps, I can arrange my dataset in either of the following ways, depending on which one works better with the package.

   SEQN LBX156LA LBX156LA_upper
1 21005       NA      0.1555635
2 21008     2.30             NA
3 21009     6.92             NA

Here '_upper' holds the detection limit when the sample is below detection (the concentration is NA), and is NA otherwise.

I can also keep the detection limits in a separate matrix similar to what I've done in the code above.

Additionally, I want the imputation to use only the variables associated with concentration and not these "_upper" columns. That is, I don't want the "_upper" columns to be imputed just because they contain NAs.
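For this last point, one possible setup is sketched below: the "_upper" columns get an empty imputation method and are zeroed out as predictors, so they are carried along but neither imputed nor used in the imputation models. The second analyte `LBX157LA` and every value other than the three rows shown above are invented, and the `mice()` call itself is left as a comment; `method` and `predictorMatrix` are existing `mice()` arguments.

```r
# Toy data in the layout shown above (LBX157LA and most values are invented).
df <- data.frame(
  SEQN           = c(21005, 21008, 21009, 21010, 21012),
  LBX156LA       = c(NA, 2.30, 6.92, 3.10, NA),
  LBX156LA_upper = c(0.1555635, NA, NA, NA, 0.1555635),
  LBX157LA       = c(1.20, NA, 4.40, 2.75, 3.05),
  LBX157LA_upper = c(NA, 0.2, NA, NA, NA)
)

vars       <- names(df)
upper_cols <- grepl("_upper$", vars)
id_cols    <- vars == "SEQN"
has_na     <- colSums(is.na(df)) > 0

# Impute only concentration columns that actually contain NAs.
meth <- setNames(ifelse(has_na & !upper_cols & !id_cols, "norm.predict", ""), vars)

# Rows = variable being imputed, columns = predictors used for it.
pred <- matrix(1, length(vars), length(vars), dimnames = list(vars, vars))
diag(pred) <- 0
pred[, upper_cols | id_cols] <- 0   # never use bounds or IDs as predictors

print(meth)
print(pred)

# These could then be passed on, for example:
# mice::mice(df, method = meth, predictorMatrix = pred, m = 5, maxit = 10, seed = 3985)
```

Selecting the bound columns by the "_upper" suffix keeps the setup the same however many analytes the dataset contains.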

MikeDereviankin Oct 26, 2022
Author

@stefvanbuuren is there any way we can apply the "squeeze" post-processing on a sample-by-sample basis, as stated? Let me know if there is a better way to communicate this with you. I'm happy to book a consulting call to go over this.

stefvanbuuren · 2022-10-27T08:22:54Z

Maintainer

It's not clear to me how you have tried to implement my suggestion, or why it does not work. No time or interest for consulting on this.
