You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
An emmeans user posted a question about emmprep whereby he wasn't able to use emmprep() for a model because it had nested fixed effects, and that caused rank deficiencies in the model.
Reproducible Example (if applicable)
library(metafor)
## Loading required package: Matrix## Loading required package: metadat## Loading required package: numDeriv## ## Loading the 'metafor' package (version 4.2-0). For an## introduction to the package please type: help(metafor)
library(emmeans)
dat<- escalc(measure="RR", ai=tpos, bi=tneg, ci=cpos, di=cneg, data=dat.bcg)
dat<- transform(dat,
acat=factor(ablat<40, labels= c("high", "low")),
era=factor(1+as.integer((year-1940)/15),
labels= c("early","mid","late")))
with(dat, table(acat, era))
## era## acat early mid late## high 3 2 1## low 0 2 5mod<- rma(yi, vi, mods=~acat*era, data=dat)
## Warning: Redundant predictors dropped from the model.mod## ## Mixed-Effects Model (k = 13; tau^2 estimator: REML)## ## tau^2 (estimated amount of residual heterogeneity): 0.0854 (SE = 0.0810)## tau (square root of estimated tau^2 value): 0.2923## I^2 (residual heterogeneity / unaccounted variability): 68.48%## H^2 (unaccounted variability / sampling variability): 3.17## R^2 (amount of heterogeneity accounted for): 72.73%## ## Test for Residual Heterogeneity:## QE(df = 8) = 23.1259, p-val = 0.0032## ## Test of Moderators (coefficients 2:5):## QM(df = 4) = 18.5649, p-val = 0.0010## ## Model Results:## ## estimate se zval pval ci.lb ci.ub ## intrcpt -0.9706 0.2437 -3.9823 <.0001 -1.4483 -0.4929 *** ## acatlow 1.1731 0.3627 3.2348 0.0012 0.4623 1.8839 ** ## eramid -0.3952 0.4240 -0.9320 0.3513 -1.2262 0.4359 ## eralate -0.4710 0.4060 -1.1600 0.2461 -1.2667 0.3248 ## acatlow:eramid -0.1059 0.6060 -0.1747 0.8613 -1.2937 1.0819 ## ## ---## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1RG<- emmprep(mod)
## Error in emmprep(mod): Cannot use function when some redundant predictors were dropped from the model.# compare with lm:LM<- lm(yi~acat*era, data=dat)
coef(LM)
## (Intercept) acatlow eramid eralate acatlow:eramid ## -1.0869385 1.0049180 -0.2727705 -0.3546127 0.3430389 ## acatlow:eralate ## NA
(RG<- ref_grid(LM))
## 'emmGrid' object with variables:## acat = high, low## era = early, mid, late## Some things are non-estimable (null space dim = 1)
confint(RG)
## acat era prediction SE df lower.CL upper.CL## high early -1.0869 0.331 8 -1.851 -0.323## low early nonEst NA NA NA NA## high mid -1.3597 0.406 8 -2.295 -0.424## low mid -0.0118 0.406 8 -0.947 0.924## high late -1.4416 0.574 8 -2.765 -0.118## low late -0.4366 0.257 8 -1.028 0.155## ## Confidence level used: 0.95
The emmeans package is capable of supporting rank-deficient models. However, it needs information about which predictors were dropped. That information is also useful to some users. With some small changes in the object created it would be possible to enable that support and take advantage of emmeans's ability to handle those cases rather than just issuing an error message.
The conventional way to handle rank deficiencies is the way lm() does it. The cofficients for all the predictors are returned by coef(), and the ones that are dropped are given the value NA. The vcov() function, on the other hand, just returns the covariances for the included predictors. Note in the confidence intervals for RG (the reference grid for LM), we were able to identify the non-estimable case (the cell that was empty). It would seem that if rma() also returned the NAs as coefficients for the dropped predictors, that is all it would take, as the qdrg() function can figure this stuff out too).
One point of common confusion (or at least to me, a number of years ago): The NA coefficients do not actually represent missing values! Instead, they represent coefficients that were set to zero. This is clear as soon as you make the connection that dropping a predictor is the same thing as setting its coefficient to zero. When there are rank deficiencies, the solution to the normal equations is not unique; and in R, the particular solution returned is the one with those constraints to zero. There are many other valid solutions, and most of them give non-zero weight to those predictors. So we really are not saying that the dropped predictors are unimportant -- just that we didn't use them for this one particular solution.
Follow-up: It turns out I already had code to figure out where to fill-in the NAs. I think you can just take the check on coef.na and it'll work just fine. See the answer I posted on the SO site I referenced above.
PS -- there was a glitch in the code for qdrg that made it not always get the V matrix right in rank-deficient models, but I don't think it affects your models as it only comes into play when the number of rows/columns is too large. But I will be sending an update within a week or two that will have that fixed.
Hi Russell -- thanks for the suggestion. I am just coming back to the office from a 5-week absence due to conferences and holidays. It will take me a while to get back to this, but will respond when I find the time.
Classification:
Enhancement suggestion
Summary
An emmeans user posted a question about
emmprep
whereby he wasn't able to useemmprep()
for a model because it had nested fixed effects, and that caused rank deficiencies in the model.Reproducible Example (if applicable)
Created on 2023-07-22 with reprex v2.0.2
Created on 2023-07-22 with reprex v2.0.2
Notes
The emmeans package is capable of supporting rank-deficient models. However, it needs information about which predictors were dropped. That information is also useful to some users. With some small changes in the object created it would be possible to enable that support and take advantage of emmeans's ability to handle those cases rather than just issuing an error message.
The conventional way to handle rank deficiencies is the way
lm()
does it. The cofficients for all the predictors are returned bycoef()
, and the ones that are dropped are given the valueNA
. Thevcov()
function, on the other hand, just returns the covariances for the included predictors. Note in the confidence intervals forRG
(the reference grid forLM
), we were able to identify the non-estimable case (the cell that was empty). It would seem that ifrma()
also returned theNA
s as coefficients for the dropped predictors, that is all it would take, as theqdrg()
function can figure this stuff out too).One point of common confusion (or at least to me, a number of years ago): The
NA
coefficients do not actually represent missing values! Instead, they represent coefficients that were set to zero. This is clear as soon as you make the connection that dropping a predictor is the same thing as setting its coefficient to zero. When there are rank deficiencies, the solution to the normal equations is not unique; and in R, the particular solution returned is the one with those constraints to zero. There are many other valid solutions, and most of them give non-zero weight to those predictors. So we really are not saying that the dropped predictors are unimportant -- just that we didn't use them for this one particular solution.sessionInfo()
The text was updated successfully, but these errors were encountered: