
Add vignette on benchmarking model options #695

Open

jamesmbaazam wants to merge 81 commits into main from vig-speedup-options

Conversation

@jamesmbaazam (Contributor) commented Jun 13, 2024

Description

This PR closes #629.

Initial submission checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have tested my changes locally (using devtools::test() and devtools::check()).
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required and rebuilt docs if yes (using devtools::document()).
  • I have followed the established coding standards (and checked using lintr::lint_package()).
  • I have added a news item linked to this PR.

After the initial Pull Request

  • I have reviewed Checks for this PR and addressed any issues as far as I am able.

@jamesmbaazam (Contributor, Author) commented Jun 17, 2024

As an update, the {rstan} models are working fine, but the {cmdstanr} models are giving various errors that I will report on.

Error when using the laplace() method from {cmdstanr}:

```r
library(EpiNow2)
library(cmdstanr)
# Set the number of cores to use
options(mc.cores = 4)

# Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)

# Incubation period
incubation_period <- LogNormal(
  meanlog = Normal(1.6, 0.05),
  sdlog = Normal(0.5, 0.05),
  max = 14
)

# Reporting delay
reporting_delay <- LogNormal(
  meanlog = 0.5,
  sdlog = 0.5,
  max = 10
)

# Combine the incubation period and reporting delay into one delay
delay <- incubation_period + reporting_delay

# Observation model options
obs <- obs_opts(
  scale = list(mean = 0.1, sd = 0.025),
  return_likelihood = TRUE
)

# Run model
epinow(
  data = example_confirmed,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  obs = obs,
  horizon = 0,
  rt = NULL,
  stan = stan_opts(
    method = "laplace",
    backend = "cmdstanr"
  )
)
```

```
Error in fit_model_approximate(args, id = id) : 
  Approximate inference failed due to: Error: 'jacobian' argument to optimize and laplace must match!
laplace was called with jacobian=TRUE
optimize was run with jacobian=TRUE
```

Not an informative error message from {cmdstanr}, I should say.

After playing around with the example above with different combinations of mode (NULL or unspecified) and jacobian, and with the example in laplace(), there seems to be a weird/erroneous interaction between the two arguments. It seems that the option to set mode = NULL may not be implemented? (I have yet to confirm this.)

Moreover, according to the documentation of laplace(), setting mode = NULL and jacobian = TRUE/FALSE (through stan_opts() in EpiNow2) should work, but I get the same error. It seems we may have to run optimize() first and pass the output to laplace().
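For reference, this is roughly the workaround I have in mind, sketched with plain {cmdstanr} rather than through EpiNow2 (the model path and `stan_data` object are placeholders):

```r
library(cmdstanr)
# a minimal sketch of the optimize-then-laplace workaround
mod <- cmdstan_model("model.stan")  # placeholder model
# run optimisation first, with jacobian = TRUE so that it matches laplace()
fit_opt <- mod$optimize(data = stan_data, jacobian = TRUE)
# pass the resulting mode explicitly instead of relying on mode = NULL
fit_laplace <- mod$laplace(data = stan_data, mode = fit_opt)
```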

Am I missing anything? Thoughts? @sbfnk @seabbs

@sbfnk (Contributor) commented Jul 12, 2024

I can't reproduce this - do you possibly need to update cmdstanr or cmdstan (using cmdstanr::install_cmdstan())?

My versions are:

```r
devtools::package_info("cmdstanr") |>
  dplyr::filter(package == "cmdstanr")
#>  package  * version    date (UTC) lib source
#>  cmdstanr   0.8.1.9000 2024-06-23 [1] Github (stan-dev/cmdstanr@9878dda)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
cmdstanr::cmdstan_version()
#> [1] "2.35.0"
```

Created on 2024-07-12 with reprex v2.1.0

@jamesmbaazam (Contributor, Author):

Thanks. It's fixed now after updating.

@jamesmbaazam (Contributor, Author) commented Jul 15, 2024

I have now pushed the vignette with the run of all the models using MCMC (a472439).

To do:

  • Add a model using "vb" from cmdstanr.
    - UPDATE: This now runs and I've added a custom function to extract the samples for downstream analyses.
  • Run the {cmdstanr} approximate methods ("pathfinder" and "laplace").
    - Blockers: the two methods are not even initialising. (cc: @sbfnk @seabbs)
    - UPDATE: I have now gotten to the root of one of the issues and opened "pathfinder fails for large case reports" (#728).

Settings and errors so far

Pathfinder errors

num_paths = 1

```r
stan = stan_opts(
  method = "pathfinder",
  backend = "cmdstanr",
  num_paths = 1
)
# Error in fit_model_approximate(args, id = id) : 
#   Approximate inference failed due to: Optimization terminated with error: Line search failed to achieve a sufficient decrease, no more progress can be made Optimization failed to start, pathfinder cannot be run.
```

num_paths > 1

```r
stan = stan_opts(
  method = "pathfinder",
  backend = "cmdstanr",
  num_paths = 2
)
# Error in fit_model_approximate(args, id = id) : 
# Approximate inference failed due to: No pathfinders ran successfully
```

Increase trials from the default of 10 to 50 (with the multi-path pathfinder default)

```r
stan = stan_opts(
  method = "pathfinder",
  backend = "cmdstanr",
  trials = 50
)
# Error in fit_model_approximate(args, id = id) : 
# Approximate inference failed due to: No pathfinders ran successfully
```

@jamesmbaazam (Contributor, Author):

I noticed something interesting about pathfinder struggling to fit large case reports and have created a separate issue for it: #728.

@sbfnk The only thing in the way of this vignette getting merged is the struggle to get pathfinder and laplace working. If we don't want to delay this any further, we could finalise this version, mention that those two methods will be covered in a future enhancement, get this merged, and then add them when I've figured them out. All the models, including vb from {rstan} and {cmdstanr}, are currently working. I will precompile them all and push.

@jamesmbaazam (Contributor, Author) commented Jul 25, 2024

From meeting with Seb today:

  • Finalise current vignette by running all models, leaving out Pathfinder and Laplace for future enhancement.
  • Add more metrics alongside run time (a sketch of the CRPS estimator follows this list):
    • total CRPS for estimates on complete data, partial data, and forecasts
    • CRPS at the last time point (real-time performance)
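For reference, one standard sample-based estimator of the CRPS, from posterior samples $x_1, \dots, x_S$ and observation $y$ (the quantity approximated by sample-based scoring packages), is:

$$\mathrm{CRPS}(y) \approx \frac{1}{S}\sum_{i=1}^{S} |x_i - y| - \frac{1}{2S^2}\sum_{i=1}^{S}\sum_{j=1}^{S} |x_i - x_j|$$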

@jamesmbaazam force-pushed the vig-speedup-options branch from a6f38ba to fef8991 on July 25, 2024 13:23
github-actions bot commented Jan 29, 2025

Thank you for your contribution jamesmbaazam 🚀! Your synthetic_recovery markdown is ready for download 👉 here 👈!
(The artifact expires on 2025-02-03T16:09:44Z. You can re-generate it by re-running the workflow here.)

@jamesmbaazam marked this pull request as ready for review on January 29, 2025 15:53
@kaitejohnson (Contributor):

@jamesmbaazam Would it be possible to share a rendered html? I don't think we've set anything up yet to post a preview of the articles as a comment (though we could).

@jamesmbaazam (Contributor, Author):

> @jamesmbaazam Would it be possible to share a rendered html? I don't think we've set anything up yet to post a preview of the articles as a comment (though we could).

We usually pre-compile the vignettes from the Rmd.orig version to Rmd, which can be knit locally for review.

@sbfnk (Contributor) left a comment:

Wow, this is a lot of great work!

My general comment is that it has elements of a vignette and elements of a paper, so it's not quite clear where it stands on the distinction between the two. For a vignette it's a bit too complex (defining functions that call other functions, etc.); for a paper it definitely has some great new content, but the style is more like a walkthrough/vignette.

With that in mind I have focused on code correctness / implementation in my review so far. We can discuss the text later perhaps.

Anyway, very cool stuff and really interesting!

```r
0.8 + 0.02 * 1:20
)
# Add Gaussian noise
R_noisy <- R * rnorm(length(R), 1.1, 0.05)
```
Comment (Contributor):

This shifts the R trajectory up by 10% - is this intended? If you just want to add a bit of noise, it should be:

Suggested change:

```diff
- R_noisy <- R * rnorm(length(R), 1.1, 0.05)
+ R_noisy <- R * rnorm(length(R), 1, 0.05)
```

- `infections_true`: the infections by date of infection, and
- `reported_cases_true`: the reported cases by date of report.

```{r extract-true-data}
R_true <- forecast$summarised[variable == "R"]$median
```
Comment (Contributor):

Why do you do 10 samples and the median for R but not infections? I think for a fair comparison of the methods you'd just sample one trajectory and take that noisy trajectory as truth data (possibly doing it 10 times overall, but this requires 10 times the computation).


Below we describe each model.

```{r model-descriptions, echo = FALSE}
model_descriptions <- dplyr::tribble(
```
Comment (Contributor):

The code for constructing this tibble is quite hard to read and edit. It might be better to construct it from the individual options rather than hard-coding. E.g. you could get the same table using:

```r
library("tidyr")
library("dplyr")
models <- c(
  default = "Default model (non-stationary prior on $R_t$)",
  non_mechanistic = "no mechanistic prior on $R_t$",
  rw7 = "7-day random walk prior on $R_t$",
  non_residual = "Stationary prior on $R_t$"
)
inference <- c(
  mcmc = "mcmc",
  vb = "variational bayes",
  pathfinder = "pathfinder algorithm",
  laplace = "laplace approximation"
)

model_description <- expand_grid(
  model_basename = names(models),
  inference_method = names(inference)
) |>
  unite(model, c(model_basename, inference_method), remove = FALSE) |>
  mutate(description = paste0(
    models[model_basename], "; fitting with ", inference[inference_method]
  )) |>
  select(-inference_method)
```


These are the components of each model.
```{r model-components, echo = FALSE}
model_components <- dplyr::tribble(
```
Comment (Contributor):

Same as above for constructing this table.

```r
stan = stan_opts(method = "laplace", backend = "cmdstanr")
),
# The 7-day RW Rt model with MCMC fitting
rw7_mcmc = list(
```
Comment (Contributor):

This does a 7-day random walk added to a Gaussian Process - is that intended? I think you want GP = NULL.
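Something like this sketch is what I mean (assuming, as elsewhere in the vignette, that these list elements are passed on to `epinow()`, and that `gp = NULL` disables the Gaussian process):

```r
# The 7-day RW Rt model with MCMC fitting, without a GP on top
rw7_mcmc = list(
  rt = rt_opts(prior = rt_prior_default, rw = 7),  # 7-day random walk on Rt
  gp = NULL  # turn off the GP so the random walk is not layered on it
)
```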

Comment (Contributor):

I thought this was a model you were investigating at one point?

Comment (Contributor):

Me? Possibly, but I don't think I would recommend it - given all the existing identifiability issues, I don't think we want to layer on more flexibility.

```r
if (!inherits(estimates, c("matrix"))) return(rep(NA_real_, length(truth)))
# Assumes that the estimates object is structured with the samples as rows
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
```
Comment (Contributor):

Shouldn't this be head()? I.e. your estimates correspond to the initial snapshot of the data (starting at 1), not the end (at least for now).

```r
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
```
Comment (Contributor):

Same here: head() vs tail().
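i.e. a sketch of the suggested change:

```r
# take the initial window rather than the end, so that the estimates align
# with the snapshot of the data they were fitted to
reduced_truth <- head(truth, shortest_obs_length)
reduced_estimates <- head(estimates_transposed, shortest_obs_length)
```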

Comment on lines +505 to +506:

```r
log10(reduced_truth),
log10(reduced_estimates)
```
@sbfnk (Contributor) commented Jan 31, 2025:

This only works if the values are never zero (so perhaps fine if it works).

```r
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
return(
  crps_sample(
```
Comment (Contributor):

If you're only using crps_sample() from scoringutils, would it make sense to use scoringRules instead, as the scoringutils version is just a thin wrapper?
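The direct call would look much the same; a sketch, assuming `reduced_truth` is a length-T vector and `reduced_estimates` a T x S matrix of posterior samples:

```r
library(scoringRules)
# y: vector of observations; dat: matrix of samples, one row per observation
crps_sample(y = reduced_truth, dat = reduced_estimates)
```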

Comment (Contributor):

dude

Comment (Contributor):

(how are we going to goose downloads with that approach ;) )

Comment (Contributor):

More seriously, I think this indicates we should just use more of the features of scoringutils.

Comment (Contributor):

Yep, ultimately related also to #618


```{r process-crps}
# Process CRPS for Rt
rt_crps <- process_crps(results, "R", R)
```
@sbfnk (Contributor) commented Jan 31, 2025:

I wonder if it would be good to look at the estimate on the day of the inference for a real-time measure, and then e.g. the 7-day forecast, rather than the whole "estimate from partial data" / "forecast" time series. Debatable perhaps, but if you use the whole series I think it might make more sense to use the multivariate CRPS / energy score (which is not yet implemented in scoringutils but is available in scoringRules::es_sample()).
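A sketch of the energy score alternative (`truth` and `samples` are placeholder names for a length-T vector and a T x S matrix of posterior draws):

```r
library(scoringRules)
# multivariate generalisation of the CRPS over the whole time series
es_sample(y = truth, dat = samples)
```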

@sbfnk (Contributor) commented Jan 31, 2025

I think we can push this to the next release to give us a bit more time to review content and scope - no need to rush it into 1.7.0 (which is otherwise pretty much ready to go).

@kaitejohnson (Contributor) left a comment:

@jamesmbaazam Mostly a lot of questions; feel free to ignore any of these, as I am sure you have already discussed a lot of it.

Overall this is a really nice analysis. From an outside perspective, I am really interested in how the approximate methods compare to MCMC in terms of forecast performance. Also, it wasn't clear whether these were evaluated only on the out-of-sample data; if not, separately evaluating the out-of-sample forecast performance seems important (but I may have missed this).

In using `{EpiNow2}`, users will often be faced with two decision points that will guide their choice of an appropriate model: (i) use case: retrospective vs real-time analysis, and (ii) limited computing resources. `{EpiNow2}` provides a range of customisations of the default model to suit these decision points.

The aim of this vignette is to show how these model customisations affect the speed and accuracy of the model. We will benchmark four (4) `{EpiNow2}` model options chosen to cover a range of use cases. We will benchmark the default model, the model with the 7-day random walk prior on $R_t$, the non-mechanistic model, which has no explicit prior on $R_t$, and the non-residual $R_t$ model, which assumes a stationary prior on $R_t$.

Comment (Contributor):

As this reads, it seems like these differences are all about the prior, but my understanding is that it is also about the choice of how R(t) evolves in time, e.g. via a smoothed GP, a random walk with a 7-day time step, and the other two (which I won't pretend I understand how they evolve in time). Would consider a slight rephrasing.

The aim of this vignette is to show how these model customisations affect the speed and accuracy of the model. We will benchmark four (4) `{EpiNow2}` model options chosen to cover a range of use cases. We will benchmark the default model, the model with the 7-day random walk prior on $R_t$, the non-mechanistic model, which has no explicit prior on $R_t$, and the non-residual $R_t$ model, which assumes a stationary prior on $R_t$.

We will analyse the estimation and forecasting performance of these models when solved with MCMC sampling. `{EpiNow2}` also provides approximate sampling methods to solve these models ([variational inference](https://mc-stan.org/docs/cmdstan-guide/variational_config.html), [pathfinder method](https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html), and [laplace sampling](https://mc-stan.org/docs/cmdstan-guide/laplace_sample_config.html)). These algorithms are, however, not recommended for use in pipelines, so we will not emphasise their use. We will, however, highlight how the models perform when solved with these approximate sampling methods and provide an overview of when they may be appropriate.

Comment (Contributor):

Nitpicky, and feel free to ignore, but MCMC is also an approximation of the posterior, so you might want to rephrase slightly.

Comment (Contributor):

Perhaps use the language "production pipelines", and explain that the approximate methods can be useful for development and testing, and where things might go wrong (e.g. with the variance).

```r
# Rt prior
rt_prior_default <- Normal(2, 0.1)
# Number of cores
options(mc.cores = 6)
```
Comment (Contributor):

Add a comment that you set this to 6 because your computer has 8 cores, or something along those lines.

Comment (Contributor):

Or do some kind of core detection, probably.
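e.g. a sketch using base R (leaving two cores free is just a convention, not a requirement):

```r
# detect the available cores and leave a couple free for the rest of the system
options(mc.cores = max(1, parallel::detectCores() - 2))
```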

```r
R = R_noisy,
samples = 10
)
```
Comment (Contributor):

From a very outside perspective, it strikes me as a little bit odd to do a fit to the example_confirmed, and then create a separate true R(t), and then generate new simulated data, rather than just doing a forward simulation.

But I assume there are reasons for this choice that have already been discussed.

Comment (Contributor):

Aha, we are basically being extremely clever and pragmatic, but I agree we should tell people this.

@seabbs (Contributor) commented Jan 31, 2025:

The idea is that, by using the posterior for everything but the Rt trajectory, you get a simulation that very accurately reflects a real-world setting whilst still being able to know and change the true Rt trajectory.

```r
cases_traj
```

![plot of chunk plot-cases](benchmarks-plot-cases-1.png)
Comment (Contributor):

This comes up in the knitted Rmd under the plot as "plot of chunk plot-cases"; perhaps label them as "Figure #" with a caption?
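One way to do that (a sketch using standard knitr chunk options in the Rmd.orig; the caption text here is just illustrative):

```r
# in the chunk header, e.g.:
# {r plot-cases, fig.cap = "Figure 1: Simulated reported cases over time."}
```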


![plot of chunk crps-plotting-rt-total](benchmarks-crps-plotting-rt-total-1.png)

Comment (Contributor):

This might be easier to read as a bar chart, since we are just looking at these as coloured lines.
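e.g. a sketch (`rt_crps_total` and its columns `model` and `crps` are hypothetical names standing in for whatever the summary data frame uses):

```r
library(ggplot2)
ggplot(rt_crps_total, aes(x = model, y = crps)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Total CRPS")
```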


![plot of chunk plot-rt-tv-crps-approx](benchmarks-plot-rt-tv-crps-approx-1.png)

Comment (Contributor):

It would be nice to have some explanation of why variational Bayes failed for nearly all of the models in the decline phase, and more generally why these methods fail.

Estimation in `{EpiNow2}` using the semi-mechanistic approaches (putting a prior on $R_t$) is often much slower than the non-mechanistic approach. The mechanistic model is slower because it models aspects of the processes and mechanisms that drive $R_t$ estimates using the renewal equation. The non-mechanistic model, on the other hand, runs much faster but does not use the renewal equation to generate infections. Because of this none of the options defining the behaviour of the reproduction number are available in this case, limiting its flexibility.

It's worth noting that the non-mechanistic model in `{EpiNow2}` is equivalent to that used in the [`{EpiEstim}`](https://mrc-ide.github.io/EpiEstim/index.html) R package as they both estimate $R_t$ from case time series and the generation interval distribution.

Comment (Contributor):

It might be worth noting that this is likely more sensitive to GI misspecification (I'd assume?). Also, I am guessing the difference is that EpiEstim would require specifying just the mean delay, whereas the non-mechanistic EpiNow2 model is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data.

Comment (Contributor):

Could you clarify a bit what you mean by this here? The renewal?

> Also I am guessing the difference is that EpiEstim would require specifying just the mean delay, where the non-mechanistic EpiNow2 is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data

It doesn't take any delays at all, and the approach to smoothing is different (here it is being fitted). We should probably talk about this somewhere.

Comment (Contributor):

My understanding of EpiEstim is that it does a direct computation of the posterior estimate of R(t) based on the cases and the GI, assuming cases are just latent incident infections shifted by some mean delay.

So when it's written that the non-mechanistic one is like EpiEstim, I assume it does something similar, but since the input has the user specify a full delay distribution, I (perhaps incorrectly) assumed it is using that full distribution (rather than just the mean).

What do you mean it doesn't take any delays?

@seabbs (Contributor) commented Jan 31, 2025:

EpiEstim doesn't do anything with delays and uses the serial interval, not the GT. I think what the text means here is that the underlying infection generating process EpiNow2 and EpiEstim use is the same, but not the rest of the model, and even that is in fact only partially true because of differences in the latent model and differences in the uncertainty. Personally I would just nuke this paragraph.

Comment (Contributor):

Ok yeah I don't think I understand how the non-mechanistic model works then!

```r
labs(caption = "Where a model is not shown, it means it failed to run")
```

![plot of chunk plot-infections-total-crps-approx](benchmarks-plot-infections-total-crps-approx-1.png)
Comment (Contributor):

Can you also plot a comparison of the MCMC vs approximate methods for R(t) estimation and infections? There is a lot of language in here about the approximate methods not being well-tested, but this exercise seems like an evaluation of them! It would be good to show those results directly, I think.

```r
infections_crps_mcmc_plot +
facet_wrap(~epidemic_phase, ncol = 1)
```

Comment (Contributor):

It isn't clear to me whether all of these are being evaluated in both the calibration period and the forecast period, or just in the 7-day forecast period.

Doesn't this mix in-sample and out-of-sample evaluation? I am most interested in the out-of-sample evaluation performance (specifically for the infections).

Successfully merging this pull request may close these issues:

  • Change Getting started documentation to point users to faster models
  • Vignette on approximations and speedups

4 participants