Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix coeftest bug #1227

Merged
merged 4 commits into from
Oct 15, 2024
Merged

Fix coeftest bug #1227

merged 4 commits into from
Oct 15, 2024

Conversation

jsr-p
Copy link
Contributor

@jsr-p jsr-p commented Oct 10, 2024

When using tidy with coeftest on a model with a constant terms nan are produced
by the confint function.
The reason is that the intercept rowname does not pass through.
The error is inside the broom_confint_terms function.
When hitting a breakpoint we can inspect the x input variable to the dataframe.
As shown below names(coef(x)) returns NULL so no rowname will be set and when
it is larger merged onto the other variables the nans are produced.
This PR checks for a one-dimensional object of class coeftest and uses the
rownames function on the x variable to get the correct intercept name onto
the resulting ci object.

Browse[2]> x


t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.1039     4.4520 -0.6972    0.486

Browse[2]> class(x)

[1] "coeftest"
Browse[2]> is.null(dim(ci))

[1] FALSE
Browse[2]> coef(x)

[1] -3.103912
Browse[2]> names(coef(x))

NULL
Browse[2]>

Output when using broom package

# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10      4.45    -0.697   0.486       NA        NA
2     2 (Intercept)    -6.34      4.62    -1.37    0.170       NA        NA
[1] "Second case"
# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic  p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10     0.584     -5.32 1.58e- 7    -4.25     -1.96
2     2 (Intercept)    -6.34     0.594    -10.7  3.97e-24    -7.51     -5.18
[1] "Third case"
# A tibble: 4 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic   p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -2.97    4.42      -0.671 5.02e-  1   -11.7       5.72
2     1 x               2.05    0.0822    25.0   6.88e- 90     1.89      2.21
3     2 (Intercept)    -5.93    4.39      -1.35  1.78e-  1   -14.6       2.70
4     2 x               2.14    0.0383    56.0   3.85e-217     2.07      2.22

SessionInfo

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Arch Linux

Matrix products: default
BLAS:   /usr/lib/libblas.so.3.12.0
LAPACK: /usr/lib/liblapack.so.3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_DK.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Copenhagen
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] sandwich_3.1-0 readr_2.1.5    purrr_1.0.2    tidyr_1.3.1    dplyr_1.1.4
[6] lmtest_0.9-40  zoo_1.8-12     broom_1.0.5

loaded via a namespace (and not attached):
 [1] backports_1.4.1  utf8_1.2.4       R6_2.5.1         tidyselect_1.2.1
 [5] lattice_0.22-6   tzdb_0.4.0       magrittr_2.0.3   glue_1.7.0
 [9] tibble_3.2.1     pkgconfig_2.0.3  generics_0.1.3   lifecycle_1.0.4
[13] cli_3.6.3        fansi_1.0.6      grid_4.4.1       vctrs_0.6.5
[17] compiler_4.4.1   hms_1.1.3        pillar_1.9.0     rlang_1.1.4

Output when using pull request

# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10      4.45    -0.697   0.486    -11.9      5.64
2     2 (Intercept)    -6.34      4.62    -1.37    0.170    -15.4      2.73
[1] "Second case"
# A tibble: 2 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic  p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>    <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -3.10     0.584     -5.32 1.58e- 7    -4.25     -1.96
2     2 (Intercept)    -6.34     0.594    -10.7  3.97e-24    -7.51     -5.18
[1] "Third case"
# A tibble: 4 × 8
# Groups:   cat [2]
    cat term        estimate std.error statistic   p.value conf.low conf.high
  <int> <chr>          <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1     1 (Intercept)    -2.97    4.42      -0.671 5.02e-  1   -11.7       5.72
2     1 x               2.05    0.0822    25.0   6.88e- 90     1.89      2.21
3     2 (Intercept)    -5.93    4.39      -1.35  1.78e-  1   -14.6       2.70
4     2 x               2.14    0.0383    56.0   3.85e-217     2.07      2.22

Test script

# devtools::load_all("/home/jsr-p/gh-repos/forks/broom")
library(broom)
library(lmtest)
library(dplyr)
library(tidyr)
library(purrr)
library(readr)
library(sandwich)

sessionInfo()

set.seed(999)
df <- data.frame(
  id = rep(1:10, each = 100),
  cs = rep(1:10, times = 100),
  cat = rep(1:2, times = 500),
  x = 4 * rnorm(1000)
)
df <- df |>
  mutate(
    y = 1 + 2 * x + 2 * as.numeric(as.factor(id)) - 3 * as.numeric(as.factor(cs)) + rnorm(1000)
  )


robust_se <- function(x) {
  tidy(
    coeftest(
      x,
      vcovCL(x, cluster = ~ id + cs)
    ),
    conf.int = TRUE
  )
}

df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1, data = .))) |>
  mutate(
    tidy = map(model, robust_se)
  ) |>
  select(!c("data", "model")) |>
  unnest(tidy)


print("Second case")
df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1, data = .))) |>
  mutate(tidy = map(model, \(x) tidy(x, conf.int = TRUE))) |>
  select(!c("data", "model")) |>
  unnest(tidy)


print("Third case")
df |>
  group_by(cat) |>
  nest() |>
  mutate(model = map(data, ~ lm(y ~ 1 + x, data = .))) |>
  mutate(
    tidy = map(model, robust_se)
  ) |>
  select(!c("data", "model")) |>
  unnest(tidy)

Copy link
Collaborator

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Could you also add a test in tests/testthat/test-lmtest.R for this case and note the change in NEWS?

R/utilities.R Outdated
@@ -517,6 +517,12 @@ broom_confint_terms <- function(x, ...) {
rownames(ci) <- names(coef(x))[1]
}

# Handle `coeftest` one-dimensional case
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than dropping this code into the broom_confint_terms() helper, could you situate it into the tidy.coeftest() method itself? After calling broom_confint_terms() there, we still have access to rownames(x) and the result of this function. This will prevent tidier-specific code from leaking into more general tools.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the code review!
This is done now inside tidy.coeftest

@jsr-p
Copy link
Contributor Author

jsr-p commented Oct 15, 2024

Thank you!

Could you also add a test in tests/testthat/test-lmtest.R for this case and note the change in NEWS?

Thanks for the code review!
I have added a test here, updated the news and moved the fix inside tidy.coeftest.

@simonpcouch
Copy link
Collaborator

Thank you for the revisions! I just pushed a few small edits—will merge once checks pass. I appreciate your work here!

@simonpcouch simonpcouch merged commit 10327d8 into tidymodels:main Oct 15, 2024
9 checks passed
@jsr-p
Copy link
Contributor Author

jsr-p commented Oct 15, 2024

@simonpcouch thank you too! 😄

Copy link

This pull request has been automatically locked. If you believe the issue addressed here persists, please file a new PR (with a reprex: https://reprex.tidyverse.org) and link to this one.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 30, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants