Generalized Linear Models (GLM) Benchmark

This repository is dedicated to benchmarking GLMs using the Benchopt framework.

About

This is a benchmark based on the Benchopt framework. You can learn more about Benchopt at https://benchopt.github.io.

Theoretical Overview

A generalized linear model (GLM) is a flexible generalization of ordinary linear regression: it allows the linear model to be related to the response variable via a link function. In a GLM, the outcome $\boldsymbol{Y}$ (dependent variable) is assumed to be generated from a distribution in the exponential family (e.g. Normal, Binomial, Poisson, Gamma). The mean $\boldsymbol{\mu}$ of the distribution depends on the independent variables $\boldsymbol{X}$ through the relation:

$$\mathbb{E}[\boldsymbol{Y}|\boldsymbol{X}] = \boldsymbol{\mu} = g^{-1}(\boldsymbol{X}\hspace{1pt}\boldsymbol{\beta})$$

where $\mathbb{E}[\boldsymbol{Y}|\boldsymbol{X}]$ is the expected value of $\boldsymbol{Y}$ conditioned on $\boldsymbol{X}$, $\boldsymbol{X}\hspace{1pt}\boldsymbol{\beta}$ is the linear predictor and $g(\cdot)$ is the link function.
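To make the role of the link function concrete, here is a minimal sketch, assuming NumPy, that maps the linear predictor $\boldsymbol{X}\hspace{1pt}\boldsymbol{\beta}$ to the mean $\boldsymbol{\mu}$ through the inverse links of the three models covered below (identity, logit and log). The names INVERSE_LINKS and predict_mean are hypothetical and are not part of this repository.

```python
import numpy as np

# Hypothetical table of inverse link functions g^{-1}, mapping the linear
# predictor eta = X @ beta to the mean mu for each regression type.
INVERSE_LINKS = {
    "linear": lambda eta: eta,                           # identity link: mu = eta
    "logistic": lambda eta: 1.0 / (1.0 + np.exp(-eta)),  # logit link: mu = sigmoid(eta)
    "poisson": np.exp,                                   # log link: mu = exp(eta)
}


def predict_mean(X, beta, model):
    """Return mu = g^{-1}(X @ beta) for the given model name."""
    eta = X @ beta  # linear predictor
    return INVERSE_LINKS[model](eta)


X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
beta = np.array([0.5, -0.5])
for model in INVERSE_LINKS:
    print(model, predict_mean(X, beta, model))
```

The same linear predictor yields an unbounded mean for linear regression, a probability in (0, 1) for logistic regression and a positive rate for Poisson regression.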

Practical Examples

As already mentioned, let $Y$ be the outcome (dependent variable) and $\boldsymbol{X}$ be the independent variables. The three types of regression analyzed here (linear, logistic and Poisson) differ in the nature of $Y$. For each type, dedicated datasets and solvers are included.

Linear Regression

In the case of linear regression, $Y$ is modeled as:

$$\begin{cases} \hspace{4pt} Y\sim N(\mu,\sigma^2)\\ \hspace{4pt} \mu = \boldsymbol{X}\hspace{1pt}\boldsymbol{\beta} \end{cases}$$
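Under this Gaussian model with the identity link, the maximum-likelihood estimate of $\boldsymbol{\beta}$ is the ordinary least-squares solution. The sketch below, assuming NumPy, simulates data from the model above and recovers the coefficients with numpy.linalg.lstsq; the sizes, noise level and coefficients are illustrative choices, not the benchmark's own simulated dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 200, 5

# Simulate Y ~ N(X @ beta, sigma^2) with a known (illustrative) coefficient vector.
X = rng.standard_normal((n_samples, n_features))
beta_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
sigma = 0.1
y = X @ beta_true + sigma * rng.standard_normal(n_samples)

# With a Gaussian outcome and identity link, maximizing the likelihood is
# equivalent to minimizing ||y - X beta||^2, i.e. least squares.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 2))
```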

The following datasets are used:

The bodyfat LIBSVM dataset
The diabetes sklearn dataset
The California housing sklearn dataset
A simulated dataset

Logistic Regression

In the case of logistic regression, $Y$ is a binary categorical value (make sure the labels are encoded as $-1$ and $1$) and it is modeled as:

$$\begin{cases} \hspace{4pt} Y \sim Bernoulli(\mu)\\ \hspace{4pt} \log(\frac{\mu}{1-\mu}) = \boldsymbol{X}\hspace{1pt}\boldsymbol{\beta} \end{cases}$$
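With labels encoded as $-1$ and $1$, maximizing the Bernoulli likelihood is equivalent to minimizing the average logistic loss $\frac{1}{n}\sum_i \log(1 + \exp(-y_i\,\boldsymbol{x}_i^\top\boldsymbol{\beta}))$. The following sketch, assuming NumPy, fits it with plain gradient descent on simulated data. It only illustrates the objective: the helper names and the solver are hypothetical and are not the solvers shipped with this benchmark.

```python
import numpy as np


def logistic_loss(beta, X, y):
    """Average logistic loss for labels y in {-1, 1}: mean(log(1 + exp(-y * X beta)))."""
    return np.mean(np.logaddexp(0.0, -y * (X @ beta)))


def logistic_grad(beta, X, y):
    """Gradient of logistic_loss with respect to beta."""
    z = y * (X @ beta)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)); chain rule brings in y * x_i.
    weights = -y / (1.0 + np.exp(z))
    return X.T @ weights / X.shape[0]


rng = np.random.default_rng(0)
n_samples, n_features = 500, 3
X = rng.standard_normal((n_samples, n_features))
beta_true = np.array([1.5, -1.0, 0.5])
# Draw Bernoulli(mu) outcomes with mu = sigmoid(X beta), then encode them as -1/1.
mu = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
y = np.where(rng.random(n_samples) < mu, 1.0, -1.0)

# Gradient descent with step 1/L, where L = ||X||_2^2 / (4 n) is a Lipschitz
# constant of the gradient of the average logistic loss.
step = 4.0 * n_samples / np.linalg.norm(X, ord=2) ** 2
beta = np.zeros(n_features)
for _ in range(1000):
    beta -= step * logistic_grad(beta, X, y)

print(logistic_loss(beta, X, y), beta)
```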

The following datasets are used:

The sklearn breast cancer dataset
A simulated dataset

Poisson Regression

In the case of Poisson regression, $Y$ is a count value and it is modeled as:

$$\begin{cases} \hspace{4pt} Y \sim Poisson(\mu)\\ \hspace{4pt}\log(\mu) = \boldsymbol{X}\hspace{1pt}\boldsymbol{\beta} \end{cases}$$
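For the log link, the negative log-likelihood (up to the constant $\log(y_i!)$) is $\frac{1}{n}\sum_i \left(\exp(\boldsymbol{x}_i^\top\boldsymbol{\beta}) - y_i\,\boldsymbol{x}_i^\top\boldsymbol{\beta}\right)$. A minimal sketch, assuming NumPy, minimizes it with Newton's method; because the log link is canonical for the Poisson distribution, this coincides with the IRLS algorithm commonly used for GLMs. A simple step-halving safeguard is added. The function names and simulated data are hypothetical, not this benchmark's solvers or datasets.

```python
import numpy as np


def poisson_nll(beta, X, y):
    """Average Poisson negative log-likelihood, dropping the log(y!) constant."""
    eta = X @ beta
    return np.mean(np.exp(eta) - y * eta)


def fit_poisson_newton(X, y, n_iter=25):
    """Fit log-link Poisson regression with a damped Newton method."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        mu = np.exp(X @ beta)                       # mu = g^{-1}(X beta) = exp(X beta)
        grad = X.T @ (mu - y) / n_samples
        hess = (X * mu[:, None]).T @ X / n_samples  # X^T diag(mu) X / n
        direction = np.linalg.solve(hess, grad)
        # Halve the step until the objective does not increase (guards against overshooting).
        step = 1.0
        while poisson_nll(beta - step * direction, X, y) > poisson_nll(beta, X, y) and step > 1e-8:
            step /= 2
        beta = beta - step * direction
    return beta


rng = np.random.default_rng(0)
X = 0.5 * rng.standard_normal((1000, 3))
beta_true = np.array([0.5, -0.3, 0.8])
y = rng.poisson(np.exp(X @ beta_true))  # count outcomes drawn through the log link
beta_hat = fit_poisson_newton(X, y)
print(np.round(beta_hat, 2))
```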

For Poisson regression, the following datasets are used:

The freMTPL insurance dataset
A simulated dataset with different levels of sparsity for the design matrix $\boldsymbol{X}$
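A design matrix with a controlled level of sparsity can be simulated by keeping each entry independently with probability equal to the target density and setting the rest to zero. The sketch below, assuming NumPy, does this and then draws Poisson counts through the log link; simulate_sparse_poisson and its parameters are hypothetical and do not describe how the benchmark's own simulated dataset is generated. For large, very sparse problems, a scipy.sparse matrix would likely be a more memory-efficient container than a dense array.

```python
import numpy as np


def simulate_sparse_poisson(n_samples, n_features, density, seed=0):
    """Simulate (X, y) where each entry of X is nonzero with probability `density`."""
    rng = np.random.default_rng(seed)
    mask = rng.random((n_samples, n_features)) < density  # sparsity pattern of X
    X = np.where(mask, rng.standard_normal((n_samples, n_features)), 0.0)
    beta = rng.standard_normal(n_features) / np.sqrt(n_features)
    y = rng.poisson(np.exp(X @ beta))  # log link: mu = exp(X beta)
    return X, y


for density in (0.01, 0.1, 0.5):
    X, y = simulate_sparse_poisson(1000, 50, density)
    print(density, np.mean(X != 0))  # achieved fraction of nonzero entries
```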

How to use this benchmark

This benchmark can be run using the following commands:


   $ pip install -U benchopt
   $ git clone https://github.com/wassimmazouz/benchmark_glm
   $ cd benchmark_glm
   $ benchopt run .

Options can be passed to benchopt run to restrict the benchmark to some solvers or datasets, e.g.:


   $ benchopt run . -s sklearn -d bcancer --max-runs 10 --n-repetitions 10

Use benchopt run -h for more details about these options, or visit https://benchopt.github.io/api.html.
