What should predict.sdmTMB return when offset is left NULL? #274

seananderson · 2023-11-03T23:54:05Z

I just realized that glm() and glmmTMB both always include the offset in the prediction and always return the prediction on the original data no matter what newdata is if the offset argument is used instead of the offset() formula. This seems kind of crazy to me, but I guess that's the standard.

library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)

m <- glm(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
p1 <- predict(m)
p2 <- predict(m, newdata = dat[1:2,])
#> Warning in predictor + offset: longer object length is not a multiple of
#> shorter object length

p1[1:10]
#>        1        4        5        6        7        9       10       11 
#> 4.863265 5.035115 4.798726 5.035115 4.923889 4.798726 5.035115 4.863265 
#>       12       13 
#> 4.923889 4.798726
p2[1:10]
#>  [1] 4.863265 5.035115 4.798726 5.035115 4.923889 4.798726 5.035115 4.863265
#>  [9] 4.923889 4.798726


m <- glmmTMB::glmmTMB(catch_weight ~ 1, data = dat, family = Gamma("log"), offset = log(dat$area_swept))
p1 <- predict(m)
p2 <- predict(m, newdata = dat[1:2,])

p1[1:10]
#>  [1] 4.863263 5.035113 4.798724 5.035113 4.923887 4.798724 5.035113 4.863263
#>  [9] 4.923887 4.798724
p2[1:10]
#>  [1] 4.863263 5.035113 4.798724 5.035113 4.923887 4.798724 5.035113 4.863263
#>  [9] 4.923887 4.798724

^{Created on 2023-11-03 with reprex v2.0.2}

That changes if you put the offset in the formula, which sdmTMB does not currently support:

library(sdmTMB)
dat <- subset(dogfish, catch_weight > 0)

m <- glm(catch_weight ~ 1 + offset(area_swept), data = dat, family = Gamma("log"))
p1 <- predict(m)
nd <- dat[1:2,]
nd$area_swept <- 0
p2 <- predict(m, newdata = nd)

p1[1:10]
#>        1        4        5        6        7        9       10       11 
#> 4.970636 4.989926 4.964206 4.989926 4.977066 4.964206 4.989926 4.970636 
#>       12       13 
#> 4.977066 4.964206
p2
#>        1        4 
#> 4.867756 4.867756

m <- glmmTMB::glmmTMB(catch_weight ~ 1 + offset(area_swept), data = dat, family = Gamma("log"))
p1 <- predict(m)
p2 <- predict(m, newdata = nd)

p1[1:10]
#>  [1] 4.970636 4.989926 4.964206 4.989926 4.977066 4.964206 4.989926 4.970636
#>  [9] 4.977066 4.964206
p2
#> [1] 4.867756 4.867756

^{Created on 2023-11-03 with reprex v2.0.2}

We need the ability to set the offset to 0 for index standardization, among other uses. It's obvious what to do when newdata and offset are specified in predict.sdmTMB(). But what about:

newdata = NULL, offset = NULL: predict on original data with offset = 0
newdata = df, offset = NULL: predict on df with offset = 0
newdata = NULL, offset = some_offset: predict on original data with offset = some_offset; this was wrong and was being returned as original data and original offset

I assume I should do the above. For one, people have code that depends on it. I never realized the difference in the formula vs. argument offset specification.

Just writing this out to document the thought process.

The text was updated successfully, but these errors were encountered:

#273 #274

Lewis-Barnett-NOAA · 2024-01-26T18:33:53Z

Ah, this is timely as I was just trying to help a colleague that was having trouble estimating marginal effects with ggeffects. The error was indicating different offset lengths.

Including the offset in the formula always seems quite inelegant.

Speaking of offsets more generally, what is the best example of how different the results are with an effort offset vs modeling say cpue with 0 offset? I imagine it depends on the disparity in effort values? Or is the stats theory justification strong enough to indicate that this could substantially change important outputs or inference?

seananderson added a commit that referenced this issue Nov 4, 2023

Fix offsets, extra time, and nsim combination

3167e0a

#273 #274

seananderson mentioned this issue Sep 20, 2024

Cross validation function does not seem to consider offset in model #372

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What should predict.sdmTMB return when offset is left NULL? #274

What should predict.sdmTMB return when offset is left NULL? #274

seananderson commented Nov 3, 2023

Lewis-Barnett-NOAA commented Jan 26, 2024

What should predict.sdmTMB return when offset is left NULL? #274

What should predict.sdmTMB return when offset is left NULL? #274

Comments

seananderson commented Nov 3, 2023

Lewis-Barnett-NOAA commented Jan 26, 2024