In `retrospective_forecasting/main.py` we fix the seed passed to numpyro. However, either I'm missing something obvious or we have a bug somewhere that is overriding or interacting with it.
This is the trace plot for a parameter in one run of the analysis pipeline:
And this is the same parameter in another run of the analysis pipeline:
The set of models run differed between the two runs, but that shouldn't affect the MCMC results for the same model (prior × likelihood) on the same data with the same MCMC seed. (Unless I'm missing something.)
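For reference, this is the kind of reproducibility I'd expect. A minimal sketch with a toy model (not our actual pipeline; how the seed is actually threaded through `retrospective_forecasting/main.py` may differ):

```python
# Minimal sketch (toy model): with the same model, data, and PRNG key,
# numpyro should return bit-identical draws across runs.
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS


def model(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))
    numpyro.sample("y", dist.Normal(mu, 1.0), obs=y)


y = jnp.array([0.1, -0.3, 0.5])
draws = []
for _ in range(2):
    mcmc = MCMC(NUTS(model), num_warmup=100, num_samples=100)
    mcmc.run(jax.random.PRNGKey(0), y=y)  # fixed seed
    draws.append(mcmc.get_samples()["mu"])

assert bool(jnp.all(draws[0] == draws[1]))  # identical draws given identical inputs
```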
After poking around, it appears that `linmod.data` makes no guarantees about row order (due to `pl.unique()`), and the models were encoding discrete covariates in their order of appearance in the dataset.
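A minimal sketch of the failure mode (hypothetical column names; the real columns in `linmod.data` differ):

```python
# pl.unique() does not guarantee row order unless maintain_order=True, so
# coding a discrete covariate by order of appearance can assign different
# integer codes in different runs.
import polars as pl

df = pl.DataFrame(
    {"division": ["CA", "TX", "CA", "NY"], "count": [1, 2, 3, 4]}
)

# Row order of the deduplicated frame is not guaranteed to be stable...
deduped = df.unique()

# ...so codes assigned by order of appearance are run-dependent.
levels = deduped["division"].unique(maintain_order=True).to_list()
codes = {level: i for i, level in enumerate(levels)}
print(codes)  # e.g. {"CA": 0, "TX": 1, "NY": 2} in one run, a permutation in another
```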
**When is this a problem?**

- As far as I can tell, only when trying to compare individual parameters' samples across different runs of the end-to-end pipeline, e.g. as output by the convergence plots. (I don't think we save parameter samples anywhere else?)
- We are safe within one run of the pipeline, since the same dataset is used throughout.
**What can we do about it?**

1. Sort the datasets before exporting in `linmod.data` (see the sketch after this list). We should probably do this regardless of any other mitigations.
2. Sort the datasets in each model's constructor, before creating discrete covariate codes. I'm not sure how I feel about this yet.
   - Pros: it makes life a little easier (once you know that we do this).
   - Cons: I consider it "unpredictable behavior", at least coming from R, where `lm` and friends encode discrete variables in the order of appearance (the way we do now) unless a factor with an explicit level ordering is used.
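For option 1, something along these lines in `linmod.data` (a sketch only; the column names and export format are placeholders, not what the module actually uses):

```python
# Sketch of option 1 (hypothetical column names and export path): sort on an
# explicit key before writing, so every downstream consumer sees a fixed row
# order regardless of how pl.unique() ordered the rows internally.
import polars as pl


def export_dataset(df: pl.DataFrame, path: str) -> None:
    (
        df.unique(maintain_order=True)  # or however linmod.data deduplicates today
        .sort(["division", "date"])     # stable, explicit ordering
        .write_parquet(path)
    )
```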