Hi everyone in this forum. I am a PhD student at the University of Massachusetts Amherst working on Bayesian inference and probabilistic programming with my advisor, Professor Dan Sheldon. Recently we have worked on projects about inference-time marginalization inside HMC (https://arxiv.org/pdf/2302.00564, https://arxiv.org/pdf/2410.24079). In particular, in the second paper we find that in many linear mixed-effects models it can be beneficial to integrate out one set of random effects during HMC sampling. The core technique is to exploit the block-diagonal structure of a transformed model, akin to what is done in lme4 (Section 2.3 of https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf).
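For concreteness, the marginalization in a standard linear mixed-effects model can be sketched as follows (this is the generic textbook formulation, not necessarily the exact parameterization used in the papers):

$$
y = X\beta + Zb + \varepsilon, \qquad b \sim \mathcal{N}(0, \Sigma_b), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n),
$$

so integrating out $b$ gives the marginal

$$
y \mid \beta, \Sigma_b, \sigma^2 \sim \mathcal{N}\!\left(X\beta,\; Z \Sigma_b Z^\top + \sigma^2 I_n\right).
$$

When the marginalized random effects are grouped by a single factor (e.g. patient), ordering the observations by group makes $Z \Sigma_b Z^\top + \sigma^2 I_n$ block-diagonal, so the marginal density factorizes over groups and can be evaluated cheaply.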
In our own (early) implementation inside BRMS, we find similar results. We added an argument to a forked BRMS to control the marginalization and implemented the corresponding Stan functions. We tried a simple model using the kidney dataset as follows:
```r
fit1 <- brm(time ~ age + (age + 1 | disease * sex) + (1 | patient),
            iter = 20000, data = kidney, family = "gaussian",
            prior = set_prior("cauchy(0,2)", class = "sd"),
            marginalize = NULL)
fit2 <- brm(time ~ age + (age + 1 | disease * sex) + (1 | patient),
            iter = 20000, data = kidney, family = "gaussian",
            prior = set_prior("cauchy(0,2)", class = "sd"),
            marginalize = 'patient')
```
The running times are similar, but HMC yields a larger effective sample size when we marginalize out the random effects of "patient". Without marginalization, the outputs from BRMS are
Before the next step, we would like to seek expert opinions on the topic via the following questions.
The whole procedure assumes conjugacy, usually in the form of a normal-(log)normal relationship. We are aware that this family of models is limited. In your experience, how widely is this type of model used in applied settings?
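As a minimal numerical sketch of why the normal-normal case is tractable: for a Gaussian outcome with one set of i.i.d. normal random effects, the marginal covariance is low-rank-plus-diagonal, so the marginal log-likelihood can be evaluated with the Woodbury identity and the matrix determinant lemma at O(nq²) cost instead of O(n³). The dimensions and variances below are made up for illustration; this is not the fork's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
n, q = 12, 3                        # observations, random effects
Z = rng.normal(size=(n, q))         # random-effects design matrix
mu = np.zeros(n)                    # fixed-effects mean (zero for simplicity)
sigma2 = 0.5                        # residual variance
tau2 = 2.0                          # random-effect variance (Sigma_b = tau2 * I)
y = rng.normal(size=n)

# (1) Direct marginal density: y ~ N(mu, tau2 * Z Z' + sigma2 * I), O(n^3)
V = tau2 * Z @ Z.T + sigma2 * np.eye(n)
ll_direct = multivariate_normal(mu, V).logpdf(y)

# (2) Same quantity via Woodbury / matrix determinant lemma, O(n q^2):
#     V^{-1} = I/sigma2 - Z A^{-1} Z' / sigma2^2,  A = I/tau2 + Z'Z/sigma2
A = np.eye(q) / tau2 + Z.T @ Z / sigma2
r = y - mu
Vinv_r = r / sigma2 - Z @ np.linalg.solve(A, Z.T @ r) / sigma2**2
logdetV = n * np.log(sigma2) + q * np.log(tau2) + np.linalg.slogdet(A)[1]
ll_woodbury = -0.5 * (n * np.log(2 * np.pi) + logdetV + r @ Vinv_r)

print(np.isclose(ll_direct, ll_woodbury))  # True
```

Grouped random effects as in the kidney model make the structure even nicer, since the covariance becomes block-diagonal after sorting by group.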
Do you think it could be beneficial to have an inference-time marginalization feature in BRMS, while keeping everything as simple as above? We are thinking of two types of marginalization: one is to marginalize random effects using some algorithm (ours, or possibly INLA); the other is to marginalize conjugate hyperparameters automatically (with minimal user specification) in the backend.
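As a familiar illustration of the second type (a standard conjugacy result, not tied to our implementation): with a conjugate inverse-gamma prior on a normal variance, the hyperparameter integrates out in closed form,

$$
y_i \mid \mu, \sigma^2 \sim \mathcal{N}(\mu, \sigma^2), \quad \sigma^2 \sim \text{Inv-Gamma}(a, b) \;\Longrightarrow\; y_i \mid \mu \sim t_{2a}\!\left(\mu, \sqrt{b/a}\right),
$$

i.e. a location-scale Student-t with $2a$ degrees of freedom, which HMC can then sample without ever representing $\sigma^2$ explicitly.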
Many thanks for your attention.
It is exciting to hear that you are working on such features on top of brms!
About the questions you raised, my current thoughts are as follows:
(1) In the context of brms, this is indeed a very special case, but an important one. To prevent too much special-case coding, I would currently prefer not to implement it, even though it targets an important subclass of models.
(2) Having algorithms available that automatically marginalize at inference time could indeed be cool. But wouldn't such a feature rather have to go into Stan than brms? Perhaps I am misunderstanding what your concrete plans are.
Thank you so much for your feedback! Currently we are trying to develop a modular approach to marginalization in a forked version of BRMS. We hope that users who need such an optimization can use it without much trouble.
Regarding marginalization in BRMS or in Stan, here are our thoughts. Marginalization is performed on an abstraction of the probabilistic model, which both BRMS and Stan can provide. However, it also requires a lot of structural information to be efficient, which is easier to obtain from BRMS than from Stan. Theoretically it is possible to do program tracing in Stan and match, for example, linear mixed-effects models from there, but we find it more direct to work on the formulas in BRMS.
That makes sense, thank you. I would love to see the forked version of brms once you feel it is in a good state. Then we can discuss further whether this fork should be merged into brms or remain stand-alone for now.