Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

seanhardison1 · 2024-03-18T17:12:17Z

Hello! I'm a big fan of the sdmTMB_cv function, but I find the likelihood-based measures that are currently returned with the function to be a bit unintuitive. For a future iteration, it would be nice to have the function also return RMSE and MAE (by fold? averaged across folds?) in addition to the likelihood-based measures, or perhaps allow for user defined functions.

The text was updated successfully, but these errors were encountered:

seananderson · 2024-03-18T17:50:52Z

We can certainly get some common ones in there and/or allow a user-specified function. In the meantime, it's relatively easy to calculate whatever you need yourself with the cv_predicted column. The log likelihood of the left-out data has the advantage of being consistent with the what the model is trying to maximize in the data portion. It is similar to the log score (not sure what the classic paper is, but Draper often wrote about it) or the ELPD in Bayesian stats. But, yes, I can see the benefit of calculating other classic measures on the scale of the data.

library(sdmTMB)
library(dplyr)

m <- sdmTMB_cv(
  density ~ depth_scaled + depth_scaled2,
  data = pcod, mesh = make_mesh(pcod, c("X", "Y"), cutoff = 25),
  family = tweedie(link = "log"), k_folds = 3
)
#> Running fits with `future.apply()`.
#> Set a parallel `future::plan()` to use parallel processing.

names(m$data)
#>  [1] "year"          "X"             "Y"             "depth"        
#>  [5] "density"       "present"       "lat"           "lon"          
#>  [9] "depth_mean"    "depth_sd"      "depth_scaled"  "depth_scaled2"
#> [13] "cv_fold"       "cv_predicted"  "cv_loglik"

# see cv_predicted for the prediction within each left-out fold

d <- m$data

# RMSE:
sqrt(mean((d$density - d$cv_predicted)^2)) 
#> [1] 194.1635

# MAE:
mean(abs(d$density - d$cv_predicted))
#> [1] 49.19603

# by fold:
group_by(d, cv_fold) |> 
  summarize(
    rmse = sqrt(mean((density - cv_predicted)^2)),
    mae = mean(abs(density - cv_predicted))
  )
#> # A tibble: 3 × 3
#>   cv_fold  rmse   mae
#>     <int> <dbl> <dbl>
#> 1       1 176.   47.4
#> 2       2 270.   55.5
#> 3       3  87.8  44.5

^{Created on 2024-03-18 with reprex v2.1.0}

seanhardison1 · 2024-03-18T18:23:06Z

Great, thank you Sean! Seeing how easy it is to do this with the current function, I think just adding this example to the CV vignette would be a fine alternative.

seananderson · 2024-03-18T18:27:23Z

Good idea. I've re-opened the issue and renamed it.

seanhardison1 closed this as completed Mar 18, 2024

seananderson changed the title ~~RMSE and MAE for sdmTMB_cv~~ Add RMSE and MAE sdmTMB_cv examples to cross validation vignette Mar 18, 2024

seananderson reopened this Mar 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

seanhardison1 commented Mar 18, 2024 •

edited

Loading

seananderson commented Mar 18, 2024

seanhardison1 commented Mar 18, 2024

seananderson commented Mar 18, 2024

Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

Add RMSE and MAE sdmTMB_cv examples to cross validation vignette #322

Comments

seanhardison1 commented Mar 18, 2024 • edited Loading

seananderson commented Mar 18, 2024

seanhardison1 commented Mar 18, 2024

seananderson commented Mar 18, 2024

seanhardison1 commented Mar 18, 2024 •

edited

Loading