Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expanding abundance when fitting Bernoulli-cloglog model #339

Open
James-Thorson-NOAA opened this issue May 4, 2024 · 2 comments
Open

Comments

@James-Thorson-NOAA
Copy link
Collaborator

One use-case for SDMs is to fit a logistic-regression using a cloglog link function, but then (instead of using the inverse-link to calculate the predictor) exponentiating the linear predictor to calculate abundance that could then be reported from predict or when calculating an area-weighted abundance index. This is, e.g., used by Gruss-Thorson-2019, and it matches the interpretation of a cloglog link as a thinned Poisson-point-process from the presence-only literature. It is implemented in VAST using ObsModel[13,1], as demonstrated in the combined-data demo.

I envision that we need some interface that separates (1) the inverse-link function that's applied when calculating the data likelihood from (2) the inverse-link function that's applied when calculating the predicted response for predict calls. I'd then copy the same interface for use in tinyVAST. Any ideas, or do you want to discuss @seananderson?

@James-Thorson-NOAA
Copy link
Collaborator Author

James-Thorson-NOAA commented May 6, 2024

To clarify the problem a bit more: If you have a count $C$ from a Poisson point process with intensity \lambda, then:

Pr(C>0) = \pi = 1 - e^{-\lambda)

We then might want to calculate the total intensity int_s{\lambda ds} as predicted abundance. A Bernoulli GLM with cloglog-link arises fits that likelihood using linear predictor log(\lambda), but we then want to treat it as log-linked when calculating the abundance-index.

I think the simplest option is to pass TMB two family arguments (perhaps family and family_pred, where one is used during the data-likelihood, and the other during expansion. By default, the family_pred argument is missing, and then fixed at family_pred = family. But advanced users could pass family_pred (perhaps via sdmTMBcontrol or via predict) that over-rides the default.

As I said, I need to implement this in tinyVAST e.g., to allow this demo to give the same index results when using the presence-absence or other variables as default factor level, and I'd love to keep in lock-step with sdmTMB regarding family interface.

@seananderson
Copy link
Member

Hmm. A few thoughts:

  1. Do we need the full second family architecture or would a second link (or vector of links) be enough? Perhaps that depends whether this might foreseeably be needed in delta families. Might one ever be using a delta model (say for biomass) and then only using the first linear predictor as part of an abundance index? Maybe? That's already baked into the Poisson-link families, I guess, with the log link.
  2. I was going to say I'm less fussed about the predict function, since the default is in link space anyways and the user can always inverse-link the linear predictor however they'd want, but perhaps a user would want to make conditional predictions with standard errors, say using only fixed effects.
  3. In a fit->predict->expansion workflow it's nice to have flexibility at the last stage for integration-specific options (like an area vector). So, at the very least, perhaps integrate_output()/get_index() should have this extra argument. Since integrate_output() skips over the prediction object, I presume the argument should be there.
  4. I'd like to keep 'control' about optimizer settings and similar if possible.
  5. I presume defining a bernoulli() family with an extra argument wouldn't work? That gets around binomial() being a built-in function. I guess this could also apply to binomial with trials > 1.

Assuming it might be used in a standard delta model and that a new Bernoulli wouldn't solve it, yes, something like family_pred in integrate_output(), and optionally also predict(), would work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants