Add vignette on benchmarking model options #695
base: main
Conversation
As an update, the error when using the setup below:

```r
library(EpiNow2)
library(cmdstanr)

# Set the number of cores to use
options(mc.cores = 4)

# Generation time
generation_time <- Gamma(
  shape = Normal(1.3, 0.3),
  rate = Normal(0.37, 0.09),
  max = 14
)

# Incubation period
incubation_period <- LogNormal(
  meanlog = Normal(1.6, 0.05),
  sdlog = Normal(0.5, 0.05),
  max = 14
)

# Reporting delay
reporting_delay <- LogNormal(
  meanlog = 0.5,
  sdlog = 0.5,
  max = 10
)

# Combine the incubation period and reporting delay into one delay
delay <- incubation_period + reporting_delay

# Observation model options
obs <- obs_opts(
  scale = list(mean = 0.1, sd = 0.025),
  return_likelihood = TRUE
)

# Run model
epinow(
  data = example_confirmed,
  generation_time = generation_time_opts(generation_time),
  delays = delay_opts(delay),
  obs = obs,
  horizon = 0,
  rt = NULL,
  stan = stan_opts(
    method = "laplace",
    backend = "cmdstanr"
  )
)
```

```
Error in fit_model_approximate(args, id = id) :
Approximate inference failed due to: Error: 'jacobian' argument to optimize and laplace must match!
laplace was called with jacobian=TRUE
optimize was run with jacobian=TRUE
```

Not an informative error message from […]. After playing around with the example above with different combinations of […]. Moreover, according to the documentation of […].
I can't reproduce this - do you need to update […]? My versions are:

```r
devtools::package_info("cmdstanr") |>
  dplyr::filter(package == "cmdstanr")
#> package * version date (UTC) lib source
#> cmdstanr 0.8.1.9000 2024-06-23 [1] Github (stan-dev/cmdstanr@9878dda)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
cmdstanr::cmdstan_version()
#> [1] "2.35.0"
```

Created on 2024-07-12 with reprex v2.1.0
Thanks. It's fixed now after updating.
I have now pushed the vignette with the run of all the models using MCMC (a472439). To do:

Settings and errors so far

Pathfinder errors
I noticed something interesting about […].

@sbfnk The only thing in the way of this vignette getting merged is the struggle to get […].
From meeting with Seb today:
@jamesmbaazam Would it be possible to share a rendered html, since I don't think we've set anything up yet to post a preview of the articles as a comment (though we could)?
We usually pre-compile the vignettes from the […].
Wow, this is a lot of great work!
General comment is that it has elements of a vignette and elements of a paper, so it's not quite clear where it stands on the distinction between the two. I would say for a vignette it's a bit too complex, defining functions calling other functions, etc.; for a paper it definitely has some great new content, but the style is more like a walkthrough/vignette.
With that in mind I have focused on code correctness / implementation in my review so far. We can discuss the text later perhaps.
Anyway, very cool stuff and really interesting!
```r
  0.8 + 0.02 * 1:20
)
# Add Gaussian noise
R_noisy <- R * rnorm(length(R), 1.1, 0.05)
```
This shifts the R trajectory up by 10% - is this intended? If you just want to add a bit of noise it should be:

```diff
- R_noisy <- R * rnorm(length(R), 1.1, 0.05)
+ R_noisy <- R * rnorm(length(R), 1, 0.05)
```
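To see why `mean = 1.1` biases the trajectory rather than just adding noise, here is a quick base-R check of the two noise distributions (a standalone illustration, not part of the vignette code):

```r
set.seed(42)
# Multiplicative noise as in the original code vs the suggested fix
noise_biased  <- rnorm(1e5, mean = 1.1, sd = 0.05)  # original: centred on 1.1
noise_centred <- rnorm(1e5, mean = 1.0, sd = 0.05)  # suggestion: centred on 1

# Multiplying R by noise_biased inflates it by ~10% on average;
# noise_centred only jitters it around the true trajectory.
mean(noise_biased)   # close to 1.1
mean(noise_centred)  # close to 1.0
```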
- `infections_true`: the infections by date of infection, and
- `reported_cases_true`: the reported cases by date of report.

```{r extract-true-data}
R_true <- forecast$summarised[variable == "R"]$median
```
Why do you do 10 samples and the median for R but not infections? I think for a fair comparison of the methods you'd just sample 1 and then take this noisy trajectory as truth data (possibly doing it 10 times overall but this requires 10 times the computation).
Below we describe each model.

```{r model-descriptions,echo = FALSE}
model_descriptions <- dplyr::tribble(
```
The code for constructing this tibble is quite hard to read and edit. Might be better to construct this from the individual options rather than hard-coding. E.g. you could get the same table using
```r
library("tidyr")
library("dplyr") # for mutate() and select() below

models <- c(
  default = "Default model (non-stationary prior on $R_t$)",
  non_mechanistic = "no mechanistic prior on $R_t$",
  rw7 = "7-day random walk prior on $R_t$",
  non_residual = "Stationary prior on $R_t$"
)
inference <- c(
  mcmc = "mcmc",
  vb = "variational bayes",
  pathfinder = "pathfinder algorithm",
  laplace = "laplace approximation"
)
model_description <- expand_grid(
  model_basename = names(models),
  inference_method = names(inference)
) |>
  unite(model, c(model_basename, inference_method), remove = FALSE) |>
  mutate(description = paste0(
    models[model_basename], "; fitting with ", inference[inference_method]
  )) |>
  select(-inference_method)
```
These are the components of each model.

```{r model-components,echo = FALSE}
model_components <- dplyr::tribble(
```
same as above for constructing this
```r
  stan = stan_opts(method = "laplace", backend = "cmdstanr")
),
# The 7-day RW Rt model with MCMC fitting
rw7_mcmc = list(
```
This does a 7-day random walk added to a Gaussian Process - is that intended? I think you want `GP = NULL`.
I thought this was a model you were investigating at one point?
Me? Possibly but I don't think I would recommend it - given all the existing identifiability issues I don't think we want to layer on more flexibility.
```r
if (!inherits(estimates, c("matrix"))) return(rep(NA_real_, length(truth)))
# Assumes that the estimates object is structured with the samples as rows
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
```
Shouldn't this be `head()`? I.e. your estimates correspond to the initial snapshot of the data (starting at 1), not the end (at least for now).
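The reviewer's `head()` vs `tail()` point can be seen with a toy vector (a standalone base-R illustration, not code from the vignette):

```r
truth <- 1:10   # full truth series (time points 1..10)
n_est <- 6      # estimates only cover 6 time points

# tail() aligns the truth to the END of the series ...
tail(truth, n_est)  # 5 6 7 8 9 10
# ... while head() aligns it to the START, matching estimates that
# begin at time point 1
head(truth, n_est)  # 1 2 3 4 5 6
```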
```r
shortest_obs_length <- min(ncol(estimates), length(truth))
reduced_truth <- tail(truth, shortest_obs_length)
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
```
Same here: `head()` vs `tail()`.
```r
log10(reduced_truth),
log10(reduced_estimates)
```
This only works if the values are never zero (so perhaps fine if it works).
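If zeros can occur, one common pattern is to shift by a small constant before taking logs; `safe_log10` below is a hypothetical helper for illustration, not part of the vignette:

```r
# log10(0) is -Inf, which would break the downstream CRPS calculation
safe_log10 <- function(x, eps = 1e-6) log10(x + eps)

log10(0)       # -Inf
safe_log10(0)  # finite: log10(1e-6) = -6
```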
```r
estimates_transposed <- t(estimates) # transpose to have samples as columns
reduced_estimates <- tail(estimates_transposed, shortest_obs_length)
return(
  crps_sample(
```
If you're only using `crps_sample()` from `scoringutils`, would it make sense to use `scoringRules` instead? As the `scoringutils` version is just a thin wrapper.
dude
(how are we going to goose downloads with that approach ;) )
More seriously, I think this indicates we should just use more of the features of `scoringutils`.
Yep, ultimately related also to #618.
```{r process-crps}
# Process CRPS for Rt
rt_crps <- process_crps(results, "R", R)
```
I wonder if it would be good to look at the estimate on the day of the inference for a real-time measure and then e.g. the 7-day forecast, rather than the whole "estimate from partial data" / "forecast" time series. Debatable perhaps, but if you use the whole series I think it might make more sense to use the multivariate CRPS / energy score, which is not yet implemented in `scoringutils` but is available as `scoringRules::es_sample()`.
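For reference, the sample-based energy score estimator that `scoringRules::es_sample()` computes can be sketched in a few lines of base R (an illustrative hand-rolled version with made-up inputs; the package implementation is the one to use in practice):

```r
# Energy score for a multivariate observation y and a sample matrix
# (rows = posterior samples, columns = dimensions/time points):
# ES = mean_i ||x_i - y|| - 1/(2 m^2) * sum_ij ||x_i - x_j||
energy_score <- function(y, samples) {
  m <- nrow(samples)
  term1 <- mean(apply(samples, 1, function(x) sqrt(sum((x - y)^2))))
  term2 <- sum(as.matrix(dist(samples))) / (2 * m^2)
  term1 - term2
}

# A perfect (degenerate) forecast scores 0; worse forecasts score higher
y <- c(1, 2, 3)
perfect <- matrix(rep(y, each = 4), nrow = 4)  # every sample equals y
energy_score(y, perfect)  # 0
```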
I think we can push this to the next release to give us a bit more time to review content and scope - no need to rush it into 1.7.0 (which is otherwise pretty much ready to go).
@jamesmbaazam Mostly a lot of questions, feel free to ignore any of these as I am sure you have already discussed a lot of it.
Overall this is a really nice analysis. From an outside perspective, I am really interested in how the approximate methods compare in terms of forecast performance to MCMC. Also, it wasn't clear that these were evaluated only on the out-of-sample data, and if not, separately evaluating the out-of-sample forecast performance seems important (but I may have missed this).
In using `{EpiNow2}`, users will often be faced with two decision points that will guide their choice of an appropriate model: (i) use case: retrospective vs real-time analysis, and (ii) limited computing resources. `{EpiNow2}` provides a range of customisations of the default model to suit these decision points.
The aim of this vignette is to show how these model customisations affect the speed and accuracy of the model. We will benchmark four (4) `{EpiNow2}` model options chosen to cover a range of use cases. We will benchmark the default model, the model with the 7-day random walk prior on $R_t$, the non-mechanistic model, which has no explicit prior on $R_t$, and the non-residual $R_t$ model, which assumes a stationary prior on $R_t$.
As this reads, it seems like these differences are all about the prior, but my understanding is it is also about the choice of how R(t) evolves in time, e.g. via a smoothed GP, a random walk with a 7-day time step, and the other two (which I won't pretend I understand how they evolve in time). Would consider a slight rephrasing.
We will analyse the estimation and forecasting performance of these models when solved with MCMC sampling. `{EpiNow2}` also provides approximate sampling methods to solve these models ([variational inference](https://mc-stan.org/docs/cmdstan-guide/variational_config.html), [pathfinder method](https://mc-stan.org/docs/cmdstan-guide/pathfinder_config.html), and [laplace sampling](https://mc-stan.org/docs/cmdstan-guide/laplace_sample_config.html)). These algorithms are, however, not recommended for use in pipelines, so we will not emphasise their use. We will, however, highlight how the models perform when solved with these approximate sampling methods and provide an overview of when they may be appropriate.
Nitpicky and feel free to ignore but MCMC is also an approximation of the posterior, so might want to rephrase slightly.
Perhaps use the language "production pipelines", and explain that they can be useful for developing and testing, and where things might go wrong, e.g. with the variance.
```r
# Rt prior
rt_prior_default <- Normal(2, 0.1)
# Number of cores
options(mc.cores = 6)
```
Add a comment that you set it to 6 because your computer has 8 cores, or something along those lines.
or do some kind of core detection probably.
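A possible pattern for the core-detection suggestion, using base R's `parallel` package (a sketch; leaving 2 cores of headroom is an arbitrary assumption):

```r
# Detect available cores and leave a couple free for the rest of the
# system; fall back to 1 if detection returns NA
n_cores <- max(1, parallel::detectCores() - 2, na.rm = TRUE)
options(mc.cores = n_cores)
getOption("mc.cores")  # at least 1, machine-dependent
```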
```r
  R = R_noisy,
  samples = 10
)
```
From a very outside perspective, it strikes me as a little bit odd to do a fit to `example_confirmed`, and then create a separate true R(t), and then generate new simulated data, rather than just doing a forward simulation. But I assume there are reasons for this choice that have already been discussed.
Aha, we are basically being extremely clever and pragmatic, but I agree we should tell people this.
The idea is that by using the posterior for everything but the Rt trajectory you are getting a simulation that very accurately reflects a real world setting whilst still being able to know and change the true Rt trajectory
```r
cases_traj
```

![plot of chunk plot-cases](benchmarks-plot-cases-1.png)
This comes up in the knitted Rmd under the plot as `plot of chunk plot-cases`; perhaps just want to label them as "Figure #" with a caption?
![plot of chunk crps-plotting-rt-total](benchmarks-crps-plotting-rt-total-1.png)
Might be easier to read if you made this into a bar chart since we were just looking at these as colored lines.
![plot of chunk plot-rt-tv-crps-approx](benchmarks-plot-rt-tv-crps-approx-1.png)
Would be nice to have some explanation of why variational Bayes failed for nearly all of the models in the decline phase, and generally why they are failing.
Estimation in `{EpiNow2}` using the semi-mechanistic approaches (putting a prior on $R_t$) is often much slower than the non-mechanistic approach. The mechanistic model is slower because it models aspects of the processes and mechanisms that drive $R_t$ estimates using the renewal equation. The non-mechanistic model, on the other hand, runs much faster but does not use the renewal equation to generate infections. Because of this, none of the options defining the behaviour of the reproduction number are available in this case, limiting its flexibility.

It's worth noting that the non-mechanistic model in `{EpiNow2}` is equivalent to that used in the [`{EpiEstim}`](https://mrc-ide.github.io/EpiEstim/index.html) R package as they both estimate $R_t$ from case time series and the generation interval distribution.
Might be worth noting that it's likely that this is more sensitive to GI misspecification (I'd assume?). Also I am guessing the difference is that EpiEstim would require specifying just the mean delay, whereas the non-mechanistic EpiNow2 is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data.
Could you clarify a bit what you mean by this here? The renewal equation?

> Also I am guessing the difference is that EpiEstim would require specifying just the mean delay, where the non-mechanistic EpiNow2 is able to incorporate the full delay distribution, which is likely to be important for fitting to more delayed data

It doesn't take any delays at all, and the approach to smoothing is different (here it is being fit). We should probably talk about this somewhere.
My understanding of EpiEstim is that it does a direct computation of the posterior estimate of R(t) based on the cases and the GI, assuming cases are just latent incident infections shifted by some mean delay.
So when it's written that the non-mechanistic one is like EpiEstim, I assume it does something similar, but since the input has the user specify a full delay distribution, I (perhaps incorrectly) assumed it is using that full distribution (rather than just the mean).
What do you mean it doesn't take any delays?
EpiEstim doesn't do anything with delays and uses the serial interval, not the GT. I think what the text means here is that the underlying infection generating process EpiNow2 and EpiEstim use is the same, but not the rest of the model, and that is in fact only partially true because of differences in the latent model and differences in the uncertainty. Personally I would just nuke this paragraph.
Ok yeah I don't think I understand how the non-mechanistic model works then!
```r
labs(caption = "Where a model is not shown, it means it failed to run")
```

![plot of chunk plot-infections-total-crps-approx](benchmarks-plot-infections-total-crps-approx-1.png)
Can you also plot a comparison of the MCMC vs approximate methods for R(t) estimation and infections? There is a lot of language in here around the approximate methods not being well-tested, but this exercise seems like an evaluation of them! Would be good to show those results directly I think
```r
infections_crps_mcmc_plot +
  facet_wrap(~epidemic_phase, ncol = 1)
```
It isn't clear to me that all of these are being evaluated in both the calibration period and the forecast period, or just in the 7 day forecast period?
Doesn't this mix in-sample and out-of-sample evaluation? I am most interested in the out of sample evaluation performance (specifically for the infections)
Description

This PR closes #629.

Initial submission checklist

- My PR passes all local checks (`devtools::test()` and `devtools::check()`).
- I have regenerated documentation (`devtools::document()`).
- My code passes lint checks (`lintr::lint_package()`).

After the initial Pull Request