lh_robust gives inconsistent confidence intervals if using clusters #405

rcragun opened this issue Jun 5, 2023 · 0 comments

Overview

If you specify clusters for lh_robust, the confidence intervals (CIs) and p-values in $lh are inconsistent with those in $lm_robust.

Reproduce

The problem can be seen by using a hypothesis that one coefficient equals 0.

Simple example data:

library(estimatr)
nSize = 12
dat = data.frame(
  x = rnorm(nSize),
  e = rnorm(nSize),
  # Irrelevant clusters for errors
  eg = sample(2, nSize, replace=TRUE)
)
dat$z = dat$x + dat$e

CIs match when not correcting for error correlation:

> lh_robust(z~x, data=dat, se_type='HC2', linear_hypothesis='x=0')
$lm_robust
             Estimate Std. Error    t value   Pr(>|t|)    CI Lower CI Upper DF
(Intercept) -0.137880  0.2850087 -0.4837747 0.63896458 -0.77291900 0.497159 10
x            0.620707  0.3135789  1.9794287 0.07594477 -0.07799024 1.319404 10

$lh
    Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
x=0   0.6207     0.3136   1.979  0.07594 -0.07799    1.319 10

CIs and p-values don't match when correcting for error correlation via clusters:

> lh_robust(z~x, data=dat, clusters=eg, se_type='stata', linear_hypothesis='x=0')
$lm_robust
             Estimate Std. Error    t value  Pr(>|t|)  CI Lower CI Upper DF
(Intercept) -0.137880  0.4824367 -0.2857991 0.8227790 -6.267819 5.992059  1
x            0.620707  0.4538092  1.3677710 0.4019017 -5.145485 6.386899  1

$lh
    Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
x=0   0.6207     0.4538   1.368   0.2013  -0.3904    1.632 10

Other se_type values show the same mismatch.

Additional notes

The problem may be due to a difference in the degrees of freedom used: in the clustered output, $lm_robust reports DF = 1 while $lh reports DF = 10. I am unsure whether this is the same issue as #289.
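
The degrees-of-freedom explanation can be checked by hand. The sketch below uses only base R and the estimate and standard error copied from the clustered output above; `t_summary` is a hypothetical helper written for this check, not part of estimatr. It assumes both tables are built from ordinary two-sided 95% t intervals.

```r
# Estimate and standard error for x, copied from the clustered output above
est <- 0.620707
se  <- 0.4538092

# Hypothetical helper (not an estimatr function): two-sided p-value and
# t-based confidence interval for a given number of degrees of freedom
t_summary <- function(est, se, df, alpha = 0.05) {
  crit <- qt(1 - alpha / 2, df)
  c(
    p        = 2 * pt(-abs(est / se), df),
    ci_lower = est - crit * se,
    ci_upper = est + crit * se
  )
}

df1  <- t_summary(est, se, df = 1)   # DF reported in $lm_robust
df10 <- t_summary(est, se, df = 10)  # DF reported in $lh

print(round(rbind(`DF = 1` = df1, `DF = 10` = df10), 4))
```

If the DF = 1 row reproduces the $lm_robust values and the DF = 10 row reproduces the $lh values, the two tables differ only in the degrees of freedom applied to the same estimate and standard error.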

System info

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22621)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] estimatr_1.0.0

loaded via a namespace (and not attached):
 [1] httr_1.4.5      compiler_4.1.2  R6_2.5.1        cli_3.6.0       generics_0.1.3  tools_4.1.2    
 [7] abind_1.4-5     rstudioapi_0.14 car_3.1-2       Rcpp_1.0.9      carData_3.0-5   mvtnorm_1.1-3  
[13] texreg_1.38.6   Formula_1.2-5   rlang_1.1.0 
graemeblair added the bug label Feb 5, 2025