
MCMC seed not being respected? #61

Open
afmagee42 opened this issue Oct 15, 2024 · 1 comment · May be fixed by #72
Labels
bug Something isn't working

Comments

@afmagee42
Collaborator

In retrospective_forecasting/main.py we set the seed passed to numpyro. However, either I'm missing something obvious, or we have a bug somewhere that is overriding or interacting with it.
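For context, fixing the seed should make the sampler's draws bit-for-bit reproducible across runs. A minimal numpy stand-in (not the project's actual numpyro code, which instead passes a jax PRNG key to `MCMC.run`) illustrating the expectation:

```python
import numpy as np

def run_sampler(seed, n=5):
    """Toy stand-in for an MCMC run: identical seed must give identical draws."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=n)

# Two runs with the same seed should match exactly
a = run_sampler(42)
b = run_sampler(42)
assert np.allclose(a, b)
```

If two pipeline runs with the same seed produce different trace plots for the same model and data, something other than the seed must be varying.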

This is the trace plot for a parameter in one run of the analysis pipeline
[image: trace plot, first pipeline run]

And this is the same parameter in another run of the analysis pipeline
[image: trace plot, second pipeline run]

The set of models run differed between the two pipeline runs, but that shouldn't affect the MCMC results for the same model (prior x likelihood) on the same data with the same MCMC seed. (Unless I'm missing something.)

@afmagee42 afmagee42 added the bug Something isn't working label Dec 16, 2024
@thanasibakis
Collaborator

After poking around, it appears that linmod.data makes no guarantees about row order (due to pl.unique()), and models were coding discrete covariates in the order of appearance in the dataset.
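To make the mechanism concrete, here is a sketch of order-of-appearance coding (the names and lineage labels are illustrative, not taken from linmod): if row order changes between runs, the same category can receive a different integer code, so parameter traces indexed by code are no longer comparable.

```python
def encode_by_appearance(values):
    """Assign integer codes in order of first appearance (the current behavior)."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return [mapping[v] for v in values], mapping

# Same rows, different order (as pl.unique() permits)
codes1, map1 = encode_by_appearance(["B.1", "XBB", "JN.1"])
codes2, map2 = encode_by_appearance(["XBB", "B.1", "JN.1"])
# codes1 == codes2 == [0, 1, 2], but the mappings disagree:
# map1 gives "B.1" -> 0, while map2 gives "XBB" -> 0
```

So "parameter 0" in one run and "parameter 0" in another run can refer to different covariate levels.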

When is this a problem?

  • As far as I can tell, only when trying to compare individual parameters' samples across different runs of the end-to-end pipeline
    • e.g. as output by convergence plots. (I don't think we save parameter samples anywhere else?)
  • We are safe within one run of the pipeline, as the same dataset is used throughout.

What can we do about it?

  • Sort the datasets before exporting in linmod.data
    • We should probably do this regardless of any other mitigations
  • Sort the datasets in each model's constructor, before creating discrete covariate codes
    • I'm not sure how I feel about this yet.
    • Pros: It makes life a little easier (once you know that we do this)
    • Cons: I consider it "unpredictable behavior", at least coming from R, where lm and friends encode discrete variables in the order of appearance (the way we do now) unless a factor with an explicit level ordering is used
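A minimal sketch of the "sort before coding" mitigation, under the same illustrative names as above (function and lineage labels are hypothetical): deriving codes from a sorted category set makes the mapping deterministic regardless of row order.

```python
def encode_sorted(values):
    """Assign codes from a sorted, deterministic category order (proposed fix)."""
    categories = sorted(set(values))
    mapping = {c: i for i, c in enumerate(categories)}
    return [mapping[v] for v in values], mapping

# Same rows in different orders now yield the same category -> code mapping
codes1, map1 = encode_sorted(["B.1", "XBB", "JN.1"])
codes2, map2 = encode_sorted(["XBB", "B.1", "JN.1"])
assert map1 == map2
```

This mirrors R's behavior with an explicitly leveled factor, rather than its default order-of-appearance encoding.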

@thanasibakis thanasibakis linked a pull request Jan 6, 2025 that will close this issue