Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MASS:glm.nb does not handle zero-only data #40

Open
thibautjombart opened this issue Sep 23, 2020 · 3 comments
Open

MASS:glm.nb does not handle zero-only data #40

thibautjombart opened this issue Sep 23, 2020 · 3 comments
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed

Comments

@thibautjombart
Copy link
Contributor

Context

Using the range of models below to fit count data at small spatial scales, zero-only time series in the callibration window are common; these trigger an error with negbin models, causing asmodee to fail. This use-case should be something we should handle - no cases at all followed by a few recent cases should be picked up as a flare-up.

The issue

It seems MASS::glm.nb, which we use in the background of glm_nb_model, does not work with zero-only data:

# toy data
x <- data.frame(count = rep(0, 10), day = 1:10)
x
#>    count day
#> 1      0   1
#> 2      0   2
#> 3      0   3
#> 4      0   4
#> 5      0   5
#> 6      0   6
#> 7      0   7
#> 8      0   8
#> 9      0   9
#> 10     0  10

# poisson glm - works
glm(count ~ day, data = x, family = "poisson")
#> 
#> Call:  glm(formula = count ~ day, family = "poisson", data = x)
#> 
#> Coefficients:
#> (Intercept)          day  
#>  -2.430e+01    1.735e-15  
#> 
#> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
#> Null Deviance:       0 
#> Residual Deviance: 5.579e-10     AIC: 4

# negbin - fails
MASS::glm.nb(count ~ day, data = x)
#> Error in while ((it <- it + 1) < limit && abs(del) > eps) {: missing value where TRUE/FALSE needed

# and yet the pmf for (x = 0)  is defined
dnbinom(0:10, prob = 0.1, size = 1)
#>  [1] 0.10000000 0.09000000 0.08100000 0.07290000 0.06561000 0.05904900
#>  [7] 0.05314410 0.04782969 0.04304672 0.03874205 0.03486784

Note that this might be related to this post. Tagging @tjtnew here as this may end up being solved in trending.

Handling failures in models

This asks a more general question about handling multiple models: it is possible that for a given dataset, a specific model from the list of candidate models may error. This currently results in a general error when running select_model(); an alternative behaviour would be to catch this error, and ignore this model when selecting the 'best' one.

@thibautjombart thibautjombart added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Sep 23, 2020
@TimTaylor
Copy link
Member

I'll deal with error handling over in trending (see issue reconverse/trending#10). This can hopefully be merged in to trendbreaker in the near future.

In regards to the issues with MASS::glm.nb we may need to investigate zero-inflated models and think about encorporating these in to trending.

@TimTaylor
Copy link
Member

In regards to the error handling I'm thinking something along the lines of:

library(purrr)

# toy data
x <- data.frame(count = rep(0, 10), day = 1:10)

# poisson glm
model1 <- function(dat) {
    glm(count ~ day, data = dat, family = "poisson")
}

# negbin
model2 <- function(dat) {
    MASS::glm.nb(count ~ day, data = dat)
}

# list of models
models <- list(model1, model2, model2)


#### handle errors nicely with purrr::safely ####
outcome <- map(models, function(m) safely(m)(x))

# list of results and errors
outcome <- transpose(outcome)
outcome
#> $result
#> $result[[1]]
#> 
#> Call:  glm(formula = count ~ day, family = "poisson", data = dat)
#> 
#> Coefficients:
#> (Intercept)          day  
#>  -2.430e+01    2.408e-15  
#> 
#> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
#> Null Deviance:       0 
#> Residual Deviance: 5.579e-10     AIC: 4
#> 
#> $result[[2]]
#> NULL
#> 
#> $result[[3]]
#> NULL
#> 
#> 
#> $error
#> $error[[1]]
#> NULL
#> 
#> $error[[2]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>
#> 
#> $error[[3]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>

# What models succeeded
ok <- map_lgl(outcome$error, is.null)
models[ok]
#> [[1]]
#> function(dat) {
#>     glm(count ~ day, data = dat, family = "poisson")
#> }

# what models failed
models[!ok]
#> [[1]]
#> function(dat) {
#>     MASS::glm.nb(count ~ day, data = dat)
#> }
#> <bytecode: 0x5621f309c3d8>
#> 
#> [[2]]
#> function(dat) {
#>     MASS::glm.nb(count ~ day, data = dat)
#> }
#> <bytecode: 0x5621f309c3d8>
outcome$error[!ok]
#> [[1]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>
#> 
#> [[2]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>

Created on 2020-09-23 by the reprex package (v0.3.0)

@thibautjombart
Copy link
Contributor Author

Looks nice! I like that we get results and errors as two separate lists. Seems to fit the bill!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

2 participants