MASS:glm.nb does not handle zero-only data #40

thibautjombart · 2020-09-23T07:25:33Z

Context

Using the range of models below to fit count data at small spatial scales, zero-only time series in the callibration window are common; these trigger an error with negbin models, causing asmodee to fail. This use-case should be something we should handle - no cases at all followed by a few recent cases should be picked up as a flare-up.

The issue

It seems MASS::glm.nb, which we use in the background of glm_nb_model, does not work with zero-only data:

# toy data
x <- data.frame(count = rep(0, 10), day = 1:10)
x
#>    count day
#> 1      0   1
#> 2      0   2
#> 3      0   3
#> 4      0   4
#> 5      0   5
#> 6      0   6
#> 7      0   7
#> 8      0   8
#> 9      0   9
#> 10     0  10

# poisson glm - works
glm(count ~ day, data = x, family = "poisson")
#> 
#> Call:  glm(formula = count ~ day, family = "poisson", data = x)
#> 
#> Coefficients:
#> (Intercept)          day  
#>  -2.430e+01    1.735e-15  
#> 
#> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
#> Null Deviance:       0 
#> Residual Deviance: 5.579e-10     AIC: 4

# negbin - fails
MASS::glm.nb(count ~ day, data = x)
#> Error in while ((it <- it + 1) < limit && abs(del) > eps) {: missing value where TRUE/FALSE needed

# and yet the pmf for (x = 0)  is defined
dnbinom(0:10, prob = 0.1, size = 1)
#>  [1] 0.10000000 0.09000000 0.08100000 0.07290000 0.06561000 0.05904900
#>  [7] 0.05314410 0.04782969 0.04304672 0.03874205 0.03486784

Note that this might be related to this post. Tagging @tjtnew here as this may end up being solved in trending.

Handling failures in models

This asks a more general question about handling multiple models: it is possible that for a given dataset, a specific model from the list of candidate models may error. This currently results in a general error when running select_model(); an alternative behaviour would be to catch this error, and ignore this model when selecting the 'best' one.

The text was updated successfully, but these errors were encountered:

TimTaylor · 2020-09-23T08:52:35Z

I'll deal with error handling over in trending (see issue reconverse/trending#10). This can hopefully be merged in to trendbreaker in the near future.

In regards to the issues with MASS::glm.nb we may need to investigate zero-inflated models and think about encorporating these in to trending.

TimTaylor · 2020-09-23T10:23:42Z

In regards to the error handling I'm thinking something along the lines of:

library(purrr)

# toy data
x <- data.frame(count = rep(0, 10), day = 1:10)

# poisson glm
model1 <- function(dat) {
    glm(count ~ day, data = dat, family = "poisson")
}

# negbin
model2 <- function(dat) {
    MASS::glm.nb(count ~ day, data = dat)
}

# list of models
models <- list(model1, model2, model2)


#### handle errors nicely with purrr::safely ####
outcome <- map(models, function(m) safely(m)(x))

# list of results and errors
outcome <- transpose(outcome)
outcome
#> $result
#> $result[[1]]
#> 
#> Call:  glm(formula = count ~ day, family = "poisson", data = dat)
#> 
#> Coefficients:
#> (Intercept)          day  
#>  -2.430e+01    2.408e-15  
#> 
#> Degrees of Freedom: 9 Total (i.e. Null);  8 Residual
#> Null Deviance:       0 
#> Residual Deviance: 5.579e-10     AIC: 4
#> 
#> $result[[2]]
#> NULL
#> 
#> $result[[3]]
#> NULL
#> 
#> 
#> $error
#> $error[[1]]
#> NULL
#> 
#> $error[[2]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>
#> 
#> $error[[3]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>

# What models succeeded
ok <- map_lgl(outcome$error, is.null)
models[ok]
#> [[1]]
#> function(dat) {
#>     glm(count ~ day, data = dat, family = "poisson")
#> }

# what models failed
models[!ok]
#> [[1]]
#> function(dat) {
#>     MASS::glm.nb(count ~ day, data = dat)
#> }
#> <bytecode: 0x5621f309c3d8>
#> 
#> [[2]]
#> function(dat) {
#>     MASS::glm.nb(count ~ day, data = dat)
#> }
#> <bytecode: 0x5621f309c3d8>
outcome$error[!ok]
#> [[1]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>
#> 
#> [[2]]
#> <simpleError in while ((it <- it + 1) < limit && abs(del) > eps) {    t0 <- abs(t0)    del <- score(n, t0, mu, y, weights)/(i <- info(n, t0, mu,         y, weights))    t0 <- t0 + del    if (trace)         message("theta.ml: iter", it, " theta =", signif(t0))}: missing value where TRUE/FALSE needed>

^{Created on 2020-09-23 by the reprex package (v0.3.0)}

thibautjombart · 2020-09-23T10:33:52Z

Looks nice! I like that we get results and errors as two separate lists. Seems to fit the bill!

thibautjombart added bug Something isn't working enhancement New feature or request help wanted Extra attention is needed labels Sep 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MASS:glm.nb does not handle zero-only data #40

MASS:glm.nb does not handle zero-only data #40

thibautjombart commented Sep 23, 2020

TimTaylor commented Sep 23, 2020

TimTaylor commented Sep 23, 2020

thibautjombart commented Sep 23, 2020

MASS:glm.nb does not handle zero-only data #40

MASS:glm.nb does not handle zero-only data #40

Comments

thibautjombart commented Sep 23, 2020

Context

The issue

Handling failures in models

TimTaylor commented Sep 23, 2020

TimTaylor commented Sep 23, 2020

thibautjombart commented Sep 23, 2020