MICE Imputation Constraint #507
-
I have never done this, but perhaps you can do it by post-processing the imputations, using the ideas from the vignette. Try something like
where Experimental. No guarantees.
-
It's not clear to me how you have tried to implement my suggestion, and why it does not work. No time and interest for consulting on this.
-
I'd like to impute an NHANES dataset that contains both missing values and values that need to be imputed because they are below a threshold, the limit of detection (LOD). MICE imputation is straightforward, but I'd like to constrain the imputed values to lie between 0 and the LOD. If I just run MICE, it can introduce negative values.
Here is my current workflow:
1. Remove the values that are below the LOD and enter them as NA when creating the data frame in R.
2. Add covariates that affect other columns.
3. Impute using the following MICE call:
   library(mice)
   imputed_data <- mice(df, m = 5, maxit = 10, method = "norm.predict", seed = 3985)
As mentioned earlier, the imputed values can come out either negative or above the LOD, both of which are physically impossible. The NHANES dataset has indicator columns that show whether a value is below the LOD, and each one corresponds to a measurement column. For example, LCB044LA is the column I'd like to impute, and I can use column LCB044LC to determine whether a value is below the LOD (binary: 1 = below LOD, 0 = above LOD). How can I constrain MICE to impute a value between 0 and what is in that cell?
There is an example here of constraining imputations to between 0 and 25 (https://www.gerkovink.com/miceVignettes/Passive_Post_processing/Passive_imputation_post_processing.html), but my upper bounds are specific to each cell, so this does not help.
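The cell-specific bound can be written as a small rule in base R: clamp each imputed value to the interval from 0 to that row's own LOD. This is only a sketch. The helper name `clamp_to_lod` and the toy vectors are hypothetical, and wiring the rule into mice's post-processing is not shown here and would need testing.

```r
# Hypothetical helper: clamp each value to [0, lod], where lod may differ per row.
clamp_to_lod <- function(values, lod) {
  stopifnot(length(values) == length(lod))
  pmin(pmax(values, 0), lod)
}

# Toy data (made up for illustration): imputed values for three below-LOD cells
# and the LOD that applies to each of those cells.
imputed <- c(-0.4, 0.03, 0.9)
lod     <- c(0.05, 0.05, 0.10)

clamped <- clamp_to_lod(imputed, lod)
print(clamped)
```

Clamping moves every out-of-range value onto the nearest bound, so it enforces the constraint but does not model the censored distribution.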
There are 3 formats I can put my datasets in, if that helps:
1. The NHANES dataset with values below the LOD removed. Indicator columns marking which values are below the LOD are added to the dataset (column names ending in LC).
2. The NHANES dataset as above, except that the LOD is filled into each below-LOD cell instead of leaving it empty. The indicator columns are still in the dataset.
3. The LOD values for the samples that are below the LOD.
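The second format seems to carry enough information to derive the other two. A minimal sketch, assuming a data frame with the columns LCB044LA (the value, with the LOD filled in for below-LOD cells) and LCB044LC (the indicator) as described above. The toy values and the names `df_lod`, `df_na` and `lod_upper` are made up for illustration.

```r
# Toy stand-in for format 2: below-LOD cells hold the LOD itself (values are made up).
df_lod <- data.frame(
  LCB044LA = c(0.05, 0.31, 0.05, 0.12),
  LCB044LC = c(1, 0, 1, 0)   # 1 = below LOD, 0 = above LOD
)

below <- df_lod$LCB044LC == 1

# Format 3: the LOD for each below-LOD sample, i.e. the per-cell upper bound.
lod_upper <- df_lod$LCB044LA[below]

# Format 1: the same data with below-LOD cells set to NA, ready for imputation.
df_na <- df_lod
df_na$LCB044LA[below] <- NA

print(lod_upper)
print(df_na)
```

Keeping the indicator column in `df_na` preserves the distinction between cells that are NA because they were below the LOD and cells that were missing to begin with.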
Please let me know if it is possible to constrain MICE using these inputs or if there is some general documentation to constrain MICE with these inputs.
Here is the code so far: