Handle multivariate responses with HSGP #856

Merged (6 commits) on Dec 16, 2024


tomicapretto
Collaborator

This PR does two things.

  • Main: Make it possible to use HSGP with multivariate responses, such as a multinomial likelihood.
  • Adjacent: Make multivariate families use two dims. Until now, they created an extra dimension that did not match the response dimension.

There's something I would like to clarify about HSGP with multivariate responses. With this PR it's possible to do something like

import bambi as bmb

# df is assumed to be a pandas DataFrame with a predictor x and response columns y1, y2, y3
formula = "c(y1, y2, y3) ~ 0 + hsgp(x, m=30, c=2)"
hsgp_model = bmb.Model(formula, df, family="multinomial")

# Setting aliases for a nicer graph
hsgp_model.set_alias({"c(y1, y2, y3)": "result"})
hsgp_model.set_alias({"hsgp(x, m=30, c=2)": "hsgp"})
hsgp_model.build()
hsgp_model.graph()

and the graph will look like:

[image: model graph]

Notice that all the dimensions of the response share the same priors for hsgp_sigma and hsgp_ell. Theoretically, it's possible to use different values of these parameters for each response dimension. However, Bambi does not support that, and after some thought I decided it is fine that way.
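To make the shared-parameter point concrete, here is a rough NumPy sketch of a 1-D HSGP approximation. It is not Bambi's implementation: the helper names (hsgp_basis, expquad_spectral_density), the exponentiated-quadratic kernel, the numeric values, and the assumption that each response dimension gets its own basis weights are all illustrative.

```python
import numpy as np


def hsgp_basis(x, m, c):
    """Sine basis functions of a 1-D HSGP approximation on [-L, L]."""
    x = np.asarray(x, dtype=float)
    x_centered = x - (x.min() + x.max()) / 2   # center the inputs at zero
    L = c * np.max(np.abs(x_centered))         # c stretches the approximation domain
    j = np.arange(1, m + 1)
    sqrt_eigvals = j * np.pi / (2 * L)         # square roots of the Laplacian eigenvalues
    phi = np.sqrt(1 / L) * np.sin(sqrt_eigvals * (x_centered[:, None] + L))
    return phi, sqrt_eigvals


def expquad_spectral_density(omega, sigma, ell):
    """Spectral density of a 1-D exponentiated-quadratic kernel."""
    return sigma**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * (ell * omega) ** 2)


rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
m, c, n_responses = 30, 2, 3

phi, sqrt_eigvals = hsgp_basis(x, m, c)        # shape (50, 30)
weights = rng.normal(size=(m, n_responses))    # assumed: one weight vector per response dimension

# Shared case: a single sigma and ell for every response dimension.
sigma, ell = 1.0, 2.0
sqrt_psd = np.sqrt(expquad_spectral_density(sqrt_eigvals, sigma, ell))  # shape (30,)
f_shared = phi @ (sqrt_psd[:, None] * weights)                          # shape (50, 3)

# Hypothetical per-dimension case: one sigma and ell per response column.
sigmas = np.array([1.0, 0.5, 2.0])
ells = np.array([2.0, 1.0, 4.0])
sqrt_psd_k = np.sqrt(expquad_spectral_density(sqrt_eigvals[:, None], sigmas, ells))  # shape (30, 3)
f_separate = phi @ (sqrt_psd_k * weights)                                            # shape (50, 3)
```

In the shared case the three latent functions differ only through their weights. In the per-dimension case each column also gets its own amplitude and lengthscale, which is the extra flexibility discussed here.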

The implementation is very complicated already, and it would be much more complicated if we decided to handle this. The by (and share_cov) arguments of hsgp() might seem usable for this purpose. However, by expects the categories to be values within a given variable. In the multivariate family case, the dimensions are different columns (i.e. y1, y2, and y3 in the example above). So, at the very least, we would need a special way to tell Bambi to handle things differently in this special case.
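The difference between the two layouts can be sketched in plain Python. The data values and the to_long helper below are made up for illustration. The sketch only shows the data shapes: stacking the columns like this would not by itself give a multinomial model.

```python
# Wide layout: each response dimension is its own column (the multivariate case).
wide = [
    {"x": 0.1, "y1": 3, "y2": 5, "y3": 2},
    {"x": 0.7, "y1": 4, "y2": 1, "y3": 5},
]


def to_long(rows, response_columns, var_name="category", value_name="count"):
    """Stack the response columns into one value column plus a categorical column."""
    long_rows = []
    for row in rows:
        shared = {k: v for k, v in row.items() if k not in response_columns}
        for col in response_columns:
            long_rows.append({**shared, var_name: col, value_name: row[col]})
    return long_rows


# Long layout: the groups are values of a single variable, the shape that by works with.
long_data = to_long(wide, ["y1", "y2", "y3"])
categories = sorted({row["category"] for row in long_data})
```

Here `categories` holds the levels of a single variable, while in the wide layout the same labels only exist as column names.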

On top of that, a multivariate model with an HSGP is already a fairly complex model. For more granular control of the HSGP, one should use a PPL like PyMC.

I'm open to changing my mind in the future, but for now, I think this is good enough.

codecov-commenter commented Nov 10, 2024

Codecov Report

Attention: Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.

Project coverage is 89.33%. Comparing base (516d7bd) to head (59f2f7a).
Report is 7 commits behind head on main.

Files with missing lines    Patch %   Lines
bambi/backend/terms.py      60.00%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #856      +/-   ##
==========================================
- Coverage   89.71%   89.33%   -0.39%     
==========================================
  Files          47       47              
  Lines        3997     4030      +33     
==========================================
+ Hits         3586     3600      +14     
- Misses        411      430      +19     


Contributor

@AlexAndorra AlexAndorra left a comment

LGTM @tomicapretto, thanks a lot for adding this!
I'd vote to allow share_cov=False for these cases too, but I understand it's hard to implement. I'm happy to contribute this though, if you think it's not too big.

tomicapretto merged commit 1559a97 into bambinos:main on Dec 16, 2024
4 checks passed
tomicapretto deleted the hsgp-multivariate-responses branch on December 16, 2024, 11:41