Disable Fit Intercept #577

Open
mersoz-rccl opened this issue Oct 11, 2024 · 3 comments

@mersoz-rccl

It seems like it's not possible to disable the intercept fit. Is there a workaround to have the intercept be zero? I'm not talking about setting it to zero after training, but having it be zero to begin with, so that the other features adjust accordingly.

@paulbkoch
Collaborator

Hi @mersoz-rccl -- I can see why you'd want to do this for linear regression. I could be wrong, but my guess is that it wouldn't have a measurable benefit for EBMs, since they are much more flexible in what they can fit. I'd want to see a good example where it makes a difference before adding more complexity to the API, given there's already a reasonable post-processing solution of zeroing the intercept.
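
For reference, that post-process is a one-liner on a fitted model (a sketch assuming a regression EBM, where intercept_ is the learned scalar offset):

ebm.intercept_ = 0.0  # discard the learned intercept after fitting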

If you'd like to investigate this yourself, the way to implement it would be to add the following code:

update = booster.get_term_update()
update -= update[zeroith[term_idx]]  # shift so the bin containing zero scores zero
booster.set_term_update(term_idx, update)

In this location:

The zeroith array needs to be passed into the boost function somehow. It needs to contain the per-term index of the bin that represents zero in the original data before binning. Probably the easiest way to do this, if you wanted to, would be to pass the bin definitions in via "feature_types" in the public API. If you pass in the bin definition [-1.1, 2.2, 3.3], then you can hard-code the zero index of 2 for that term, because the bins will be: "missing", [-inf, -1.1), [-1.1, 2.2), [2.2, 3.3), [3.3, +inf), "unknown". See: https://interpret.ml/docs/ebm-internals-regression.html
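
As an illustration of the above (a sketch, not code from the thread; it assumes feature_types accepts explicit bin definitions as just described, and the bin layout from the linked docs):

import numpy as np
from interpret.glassbox import ExplainableBoostingRegressor

# Pin the binning of the first feature with explicit cuts via feature_types.
cuts = [-1.1, 2.2, 3.3]
ebm = ExplainableBoostingRegressor(feature_types=[cuts])

# Score bins for that term are laid out as:
# "missing", [-inf, -1.1), [-1.1, 2.2), [2.2, 3.3), [3.3, +inf), "unknown"
# Zero falls in [-1.1, 2.2); the +1 skips the leading "missing" bin.
zero_bin_idx = int(np.searchsorted(cuts, 0.0, side="right")) + 1  # == 2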

@mersoz-rccl
Author

@paulbkoch hi!

My use case for wanting to disable the intercept is physical system or event modelling, where an input of zero needs to map to a target prediction of zero.

Most cases are 1D, and for those you probably shouldn't use an EBM at all. But we can imagine multi-feature scenarios where it's applicable, where zero for all features needs to map to a zero value of the target.

Now imagine that you use an EBM model as part of a cost function in an optimization problem, e.g. a power generation optimization problem where each generator in a power plant has an EBM model to predict fuel cost. I need to be able to get zero fuel consumption at zero power input. There are other ways around this, but it's much simpler when we can disable the intercept, and the optimization problem simplifies.
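
To make this use case concrete, a hypothetical sketch (the function names and solver choice are illustrative, not from the thread): each generator's fuel cost is predicted by a single-feature EBM, and the dispatch minimizes total predicted cost subject to meeting demand.

import numpy as np
from scipy.optimize import minimize

def total_fuel_cost(power_split, ebms):
    # Sum each generator's predicted fuel cost at its assigned power level.
    return sum(ebm.predict(np.array([[p]]))[0] for ebm, p in zip(ebms, power_split))

def optimal_dispatch(ebms, demand):
    # Split `demand` across generators to minimize total predicted fuel cost.
    n = len(ebms)
    return minimize(
        total_fuel_cost,
        x0=np.full(n, demand / n),
        args=(ebms,),
        bounds=[(0.0, None)] * n,  # power input cannot be negative
        constraints=[{"type": "eq", "fun": lambda p: p.sum() - demand}],
    )

# With predict(0) == 0 guaranteed, an idle generator (p == 0) contributes
# exactly zero cost to the objective.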

@paulbkoch
Collaborator

paulbkoch commented Oct 14, 2024

@mersoz-rccl - That's an interesting case. I think you're right that in many cases a linear model would perform well on such a dataset. Have you compared an unrestricted EBM to a linear model for this problem? I'd be interested in knowing how they compare.

One additional quirk of this problem is that you don't actually want an intercept of zero. For EBMs, and GAMs in general, the intercept value is the response when all features are average, not zero. You can still get the property that the target is zero when all features are zero, but to do so you need to set the intercept to the negative of the sum of the scores in the zero bins of each term's graph. In general, I'd expect the intercept to already be fairly close to this value, but forcing it would give you a guarantee.
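
A hedged sketch of that post-process, restricted to main-effect terms (term_scores_ and intercept_ are the fitted EBM's attributes per the docs linked above; zero_bin_idxs is a hypothetical list giving, for each term, the index of the bin containing zero, as computed in the earlier snippet):

def force_zero_at_origin(ebm, zero_bin_idxs):
    # Set the intercept to the negative of the sum of the scores in the
    # zero bins, so a prediction at all-zero inputs becomes exactly zero.
    ebm.intercept_ = -sum(
        ebm.term_scores_[t][idx] for t, idx in enumerate(zero_bin_idxs)
    )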

I'm a little concerned that if we had a parameter called force_zero=False or something similar, people might mistakenly use it on unrestricted problems and get bad results. There's a good chance that someone doing hyperparameter tuning would try tuning the parameter, for example, although presumably the optimization algorithm would not pick it. Perhaps a new specialty class would be better, especially since this would only be applicable to regression.

I haven't tried this specific scenario, but I have previously tried setting the intercept to the exact base rate as a post-processing step. In that experiment I found the differences to be very minor, so I'd still like to see an example where this makes a significant difference before adding it.
