Add GLM Support #763

Merged: 53 commits into master from glm on Jan 4, 2025
Conversation

@s3alfisc (Member) commented Dec 28, 2024:

Introduces pf.feglm(), which allows fitting GLMs with Gaussian and Binomial families (logit and probit links).

In Python:

%load_ext autoreload
%autoreload 2

import pyfixest as pf
import numpy as np

data = pf.get_data()
data["Y"] = np.where(data["Y"] > 0, 1, 0)

fit_ols = pf.feols("Y ~ X1", data=data)
fit_gaussian = pf.feglm("Y ~ X1", data=data, family="gaussian")
fit_probit = pf.feglm("Y ~ X1", data=data, family="probit")
fit_logit = pf.feglm("Y ~ X1", data=data, family="logit")

pf.etable([fit_ols, fit_gaussian, fit_probit, fit_logit], digits=6)

[Screenshot: pf.etable output comparing the four fits]

In R:

library(reticulate)
library(fixest)

data = py$data

fit_ols = feols(Y ~ X1, data = data)
fit_gaussian = feglm(Y ~ X1, data = data, family = "gaussian")
fit_probit = feglm(Y ~ X1, data = data, family = binomial(link = "probit"))
fit_logit = feglm(Y ~ X1, data = data, family = binomial(link = "logit"))

etable(fit_ols, fit_gaussian, fit_probit, fit_logit)

[Screenshot: fixest etable output comparing the four fits]

To do:

  • point estimates match
  • inference matches
  • prediction matches

Only apparent issues / errors:

  • IWLS sometimes fails, even with step halving. Due to separation? It seems to affect mostly probit, so likely an implementation error. -> Fixed: the deviance was implemented incorrectly (see the sketch after this list).
  • Scores for the Gaussian family match OLS scores in pyfixest but not in fixest (Gaussian scores in fixest differ from OLS scores; not sure why).
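
For reference, a minimal sketch of the binomial deviance that the fix converges to: this is the standard textbook formula, not the actual pyfixest code, and the function name is hypothetical.

import numpy as np
from scipy.special import xlogy

def binomial_deviance(y: np.ndarray, mu: np.ndarray) -> float:
    # xlogy(a, b) = a * log(b), defined as 0 where a == 0, so the
    # y == 0 and y == 1 cases need no special handling
    dev_terms = xlogy(y, y / mu) + xlogy(1.0 - y, (1.0 - y) / (1.0 - mu))
    return 2.0 * float(np.sum(dev_terms))

A wrong deviance matters beyond reporting: both the convergence check and step halving compare successive deviances, so an incorrect formula can make IWLS appear to diverge.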

@s3alfisc s3alfisc marked this pull request as draft December 28, 2024 11:40
@s3alfisc s3alfisc mentioned this pull request Dec 28, 2024
@juanitorduz (Contributor) left a comment:

Here are some minor code-style comments. I am unfamiliar with the numerical estimation details, so someone else can help review them.

weights_type=weights_type,
# tol=tol,
# maxiter=maxiter,
collin_tol=collin_tol,
@juanitorduz (Contributor):

Why these commented-out lines? :)

@s3alfisc (Member, Author):

It's a relic from an earlier implementation in which the Feglm class inherited from Fepois, which has these arguments. Now it inherits from Feols, which does not have them, so both arguments are set in Feglm.__init__. I've cleaned this up; the fix comes with the next commit.
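
A minimal sketch of the structure described above, under the assumption that tol and maxiter are the GLM-specific arguments in question; the bodies are hypothetical, not the actual pyfixest internals.

class Feols:
    def __init__(self, collin_tol: float, weights_type: str):
        self.collin_tol = collin_tol
        self.weights_type = weights_type

class Feglm(Feols):
    def __init__(self, collin_tol: float, weights_type: str,
                 tol: float = 1e-08, maxiter: int = 25):
        super().__init__(collin_tol=collin_tol, weights_type=weights_type)
        # tol and maxiter are IWLS-specific, so they live here rather than in Feols
        self.tol = tol
        self.maxiter = maxiter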


# Step 1: _get weights w_tilde(r-1) and v(r-1) (eq. 2.5)
detadmu = self._update_detadmu(mu=mu)
# v = self._update_v(y=_Y.flatten(), mu=mu, detadmu=detadmu)
@juanitorduz (Contributor):

Another commented-out line?
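
For context, the quantities the quoted snippet computes are the standard IWLS ingredients referenced in its "eq. 2.5" comment: working weights w = 1 / (V(mu) * (d eta / d mu)^2) and a working response z = eta + (y - mu) * d eta / d mu. A generic probit sketch under those textbook definitions, not the pyfixest implementation:

import numpy as np
from scipy.stats import norm

def iwls_quantities_probit(y: np.ndarray, eta: np.ndarray):
    # probit link: mu = Phi(eta), so d eta / d mu = 1 / phi(eta)
    mu = norm.cdf(eta)
    detadmu = 1.0 / norm.pdf(eta)
    v = mu * (1.0 - mu)               # binomial variance function V(mu)
    w = 1.0 / (v * detadmu**2)        # working weights
    z = eta + (y - mu) * detadmu      # working response
    return w, z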

v_dotdot=v_dotdot,
X_dotdot=X_dotdot,
deviance_old=deviance_old,
step_halfing_tolerance=1e-12,
@juanitorduz (Contributor):

Do you see a case where the user would need to modify the step_halfing_tolerance parameter?

@s3alfisc (Member, Author):

I checked the glmnet API, and it does not seem to support setting the step-halving tolerance: link. For now I would not expose it, but add it in case users request it?
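
For readers unfamiliar with the mechanism: step halving shrinks an IWLS update that overshoots, halving the step until the deviance improves or the step size drops below the tolerance. A generic sketch; compute_deviance is a hypothetical callable, not a pyfixest function.

import numpy as np

def halve_step(beta_old: np.ndarray, beta_new: np.ndarray,
               deviance_old: float, compute_deviance, tol: float = 1e-12):
    # keep halving the step until the deviance no longer increases
    step = 1.0
    beta = beta_new
    while compute_deviance(beta) > deviance_old and step > tol:
        step /= 2.0
        beta = beta_old + step * (beta_new - beta_old)
    return beta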

return Z.T @ W @ v

def _get_diff(self, deviance: np.ndarray, last: np.ndarray) -> np.ndarray:
return np.abs(deviance - last) / (0.1 + np.abs(last))
@juanitorduz (Contributor):

Why 0.1? It clearly seems to be for numerical stability, but could it be smaller (just curious)?

@s3alfisc (Member, Author):

I actually don't know. I am using it to mimic fixest, which itself follows R's glm implementation, which uses this convergence criterion: link
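
Concretely, the criterion declares convergence once the scaled change in deviance is small; the 0.1 keeps the denominator away from zero when the deviance itself is near zero. A sketch, with eps defaulting to 1e-8 as in R's glm.control:

import numpy as np

def converged(deviance: float, deviance_old: float, eps: float = 1e-08) -> bool:
    # R glm-style stopping rule: scaled deviance change below eps
    return bool(np.abs(deviance - deviance_old) / (0.1 + np.abs(deviance_old)) < eps)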

@s3alfisc s3alfisc marked this pull request as ready for review January 1, 2025 13:19
@juanitorduz (Contributor):

I had some issues with the lock file in #767 (initial commits). I had to make sure to update only the environments I really needed; it seems a full update breaks something (in case it helps).

@s3alfisc (Member, Author) commented Jan 4, 2025:

Tests fail due to an incorrect assumption about formulaic's behavior, which seems to have changed with version 1.1.0. Setting the formulaic requirement to <1.1.0 for now. Details here: #777

@s3alfisc merged commit 83a6f27 into master on Jan 4, 2025. 9 checks passed.
@s3alfisc deleted the glm branch on January 4, 2025 at 14:56.
Merging this pull request closes the Logistic Regression issue.