feat: Display the model coefficients for linear models #1339

sylvaincom · 2025-02-17T16:14:47Z

Introduces a new accessor on EstimatorReport, feature_importance, which currently has one method, coefficients, only available for estimators with a coef_ attribute (such as linear models).

Demo:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skore import EstimatorReport

# Fit a linear model on a synthetic regression problem.
X, y = make_regression(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
estimator = LinearRegression().fit(X_train, y_train)

# Wrap the fitted estimator in a report and list the available tools.
report = EstimatorReport(estimator, X_test=X_test, y_test=y_test)
report.help()
╭─────────────────── Tools to diagnose estimator LinearRegression ───────────────────╮
│ EstimatorReport                                                                      │
│ ├── .metrics                                                                         │
│ │   ├── .accuracy(...)         (↗︎)     - Compute the accuracy score.                 │
│ │   ├── .brier_score(...)      (↘︎)     - Compute the Brier score.                    │
│ │   ├── .log_loss(...)         (↘︎)     - Compute the log loss.                       │
│ │   ├── .precision(...)        (↗︎)     - Compute the precision score.                │
│ │   ├── .precision_recall(...)         - Plot the precision-recall curve.            │
│ │   ├── .recall(...)           (↗︎)     - Compute the recall score.                   │
│ │   ├── .roc(...)                      - Plot the ROC curve.                         │
│ │   ├── .roc_auc(...)          (↗︎)     - Compute the ROC AUC score.                  │
│ │   ├── .custom_metric(...)            - Compute a custom metric.                    │
│ │   └── .report_metrics(...)           - Report a set of metrics for our estimator.  │
│ ├── .feature_importance                                                              │
│ │   └── .coefficients(...)             - Report the coefficients of a regression     │
│ │       estimator.                                                                   │
│ ├── .cache_predictions(...)            - Cache estimator's predictions.              │
│ ├── .clear_cache(...)                  - Clear the cache.                            │
│ └── Attributes                                                                       │
│     ├── .X_test                                                                      │
│     ├── .y_test                                                                      │
│     ├── .estimator_                                                                  │
│     └── .estimator_name_                                                             │
│                                                                                      │
│                                                                                      │
│ Legend:                                                                              │
│ (↗︎) higher is better (↘︎) lower is better                                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯

report.feature_importance.coefficients()
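
For readers without skore installed, the kind of table this call is meant to produce can be approximated with scikit-learn and pandas alone. This is a minimal sketch, not skore's implementation: the row labels ("Intercept", "Feature #i") and the column name "Coefficient" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
estimator = LinearRegression().fit(X_train, y_train)

# Stack the intercept on top of the per-feature coefficients.
values = np.concatenate([np.atleast_1d(estimator.intercept_), estimator.coef_])

# Row labels and column name are illustrative assumptions, not skore's output.
index = ["Intercept"] + [f"Feature #{i}" for i in range(estimator.coef_.shape[0])]
table = pd.DataFrame(values, index=index, columns=["Coefficient"])

print(table.head())
```

Keeping the intercept in the same table as the coefficients makes it easy to sort or filter everything in one place.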

If the estimator is not a regressor (e.g. a LogisticRegression), the accessor's help lists no methods:

report.feature_importance.help()
╭─ Available feature importance methods ─╮
│ report.feature_importance              │
╰────────────────────────────────────────╯

Close #1320

Co-authored-by: @auguste-probabl [email protected]

github-actions · 2025-02-17T17:40:21Z

Coverage Report for backend

File	Stmts	Miss	Cover	Missing
venv/lib/python3.12/site-packages/skore
__init__.py	15	0	100%
__main__.py	8	8	0%	3–19
_config.py	28	0	100%
exceptions.py	4	4	0%	4–23
venv/lib/python3.12/site-packages/skore/persistence
__init__.py	0	0	100%
venv/lib/python3.12/site-packages/skore/persistence/item
__init__.py	56	3	93%	96–99
altair_chart_item.py	19	1	91%	14
item.py	22	1	95%	86
matplotlib_figure_item.py	36	1	95%	19
media_item.py	22	0	100%
numpy_array_item.py	27	1	94%	16
pandas_dataframe_item.py	29	1	94%	14
pandas_series_item.py	29	1	94%	14
pickle_item.py	22	0	100%
pillow_image_item.py	25	1	93%	15
plotly_figure_item.py	20	1	92%	14
polars_dataframe_item.py	27	1	94%	14
polars_series_item.py	22	1	92%	14
primitive_item.py	23	2	91%	13–15
sklearn_base_estimator_item.py	29	1	94%	15
skrub_table_report_item.py	10	1	86%	11
venv/lib/python3.12/site-packages/skore/persistence/repository
__init__.py	2	0	100%
item_repository.py	59	5	91%	15–16, 202–203, 226
venv/lib/python3.12/site-packages/skore/persistence/storage
__init__.py	4	0	100%
abstract_storage.py	22	0	100%
disk_cache_storage.py	33	1	95%	44
in_memory_storage.py	20	0	100%
venv/lib/python3.12/site-packages/skore/project
__init__.py	3	0	100%
_open.py	5	0	100%
project.py	81	1	99%	284
venv/lib/python3.12/site-packages/skore/sklearn
__init__.py	6	0	100%
_base.py	162	13	92%	43, 115, 118, 171–180, 192–>197, 212, 215–216
find_ml_task.py	61	0	99%	136–>144
types.py	13	2	85%	33, 61
venv/lib/python3.12/site-packages/skore/sklearn/_comparison
__init__.py	5	0	100%
metrics_accessor.py	164	2	97%	165, 166–>168, 1218
precision_recall_curve_display.py	73	1	97%	196–>199, 304
prediction_error_display.py	67	10	78%	97, 154–>exit, 209, 214–218, 227, 231, 236–238
report.py	64	1	96%	16, 251–>254
roc_curve_display.py	69	1	96%	204–>213, 213–>216, 308
venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation
__init__.py	5	0	100%
metrics_accessor.py	170	0	99%	142–>144, 144–>146
report.py	105	1	98%	22
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
__init__.py	7	0	100%
feature_importance_accessor.py	39	0	100%
metrics_accessor.py	325	11	95%	166–175, 203–>212, 211, 241, 252–>254, 282, 309–313, 328, 351, 363, 364–>366
report.py	127	1	97%	22, 229–>235, 237–>239
venv/lib/python3.12/site-packages/skore/sklearn/_plot
__init__.py	4	0	100%
precision_recall_curve.py	129	1	98%	240–>257, 329
prediction_error.py	102	1	98%	173, 189–>192
roc_curve.py	143	0	100%
style.py	14	0	100%
utils.py	99	5	94%	31, 55–57, 61
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	36	2	94%	16–17
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	1	90%	79
high_class_imbalance_warning.py	18	0	100%
random_state_unset_warning.py	12	1	88%	15
shuffle_true_warning.py	10	1	83%	46
stratify_is_set_warning.py	12	1	88%	15
time_based_column_warning.py	23	2	86%	17, 73
train_test_split_warning.py	5	1	80%	21
venv/lib/python3.12/site-packages/skore/utils
__init__.py	6	0	100%
_accessor.py	17	0	100%
_environment.py	27	27	0%	1–51
_index.py	5	0	100%
_logger.py	22	22	0%	3–38
_parallel.py	38	3	88%	23–33, 124
_patch.py	13	5	53%	21–37
_progress_bar.py	34	0	100%
_show_versions.py	33	0	100%
TOTAL	2991	152	94%

Tests	Skipped	Failures	Errors	Time
636	3 💤	0 ❌	0 🔥	48.235s ⏱️

github-actions · 2025-02-17T17:45:51Z

Documentation preview @ 9f2734d

skore/src/skore/sklearn/_estimator/metrics_accessor.py

skore/src/skore/utils/_accessor.py

skore/src/skore/sklearn/_estimator/metrics_accessor.py

skore/tests/unit/sklearn/test_estimator.py

glemaitre

We should anticipate how hard it is to do the plotting with only pandas. That will help us decide whether or not we should have a display.
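
To give a rough sense of the pandas-only option, here is a minimal sketch of a horizontal bar chart drawn directly from a Series. The coefficient values and labels are made up for illustration, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

# Made-up coefficients, for illustration only.
coefficients = pd.Series(
    {
        "Intercept": 0.5,
        "Feature #0": 2.0,
        "Feature #1": -3.5,
        "Feature #2": 0.1,
        "Feature #3": -0.8,
    },
    name="Coefficient",
)

# barh draws the first row at the bottom, so sorting by ascending
# absolute value puts the largest effects at the top of the chart.
order = coefficients.abs().sort_values().index
ax = coefficients.loc[order].plot.barh()
ax.set_xlabel("Coefficient")
ax.axvline(0, color="black", linewidth=0.8)  # reference line at zero
```

The basic chart takes a handful of lines; anything beyond that (multiple classes, styling, reuse across reports) is where a dedicated display object might start to pay off.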

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py

sylvaincom · 2025-02-21T14:57:27Z

We have a toy example for example-driven dev at examples/model_evaluation/plot_feature_importance.py

MarieSacksick

I checked /examples/model_evaluation/plot_feature_importance.py and from this, it seems nice to me!
I didn't check the code or try several different models.

examples/model_evaluation/plot_feature_importance.py

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py

sylvaincom · 2025-02-21T18:08:29Z

@thomass-dev do you know why the documentation preview does not include the new example? 🙏 Is it because the example was added after the documentation preview was generated?

MarieSacksick

Only one comment, otherwise it's good for me!

sphinx/conf.py

examples/example_driven_dev/plot_feature_importance.py

glemaitre

I just started by reviewing the example.

Now I'm thinking that .model_weights() should return a display. The plan would be for this display to have a .plot() method that makes the right plot for the user, as well as a .frame property that returns the current dataframe.

In addition, I like the current direction for the pandas style and I think that we can make it the default HTML repr of the display. One issue with the style is that it does not return a real dataframe that you can manipulate. That's why I think we can use it in the repr: the repr is not an object that the user works with afterwards, and we still have .frame in case someone wants to access the real dataframe.
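
The proposal above could look roughly like the following. This is a hypothetical sketch, not skore's API: the class name CoefficientsDisplay is invented here, and returning a copy from .frame is a design choice of this sketch rather than something stated in the review.

```python
import pandas as pd


class CoefficientsDisplay:
    """Hypothetical display wrapping a coefficient table (not skore's API)."""

    def __init__(self, frame: pd.DataFrame):
        self._frame = frame

    @property
    def frame(self) -> pd.DataFrame:
        # Hand back a real dataframe (a copy) that callers can manipulate.
        return self._frame.copy()

    def plot(self):
        # Delegate to pandas plotting (matplotlib is needed at call time).
        return self._frame.plot.barh()

    def _repr_html_(self) -> str:
        # The styled output is only used for rendering in notebooks
        # (pandas Styler needs jinja2 at call time).
        return self._frame.style.format(precision=3).to_html()


display = CoefficientsDisplay(
    pd.DataFrame(
        {"Coefficient": [0.5, 2.0, -3.5]},
        index=["Intercept", "Feature #0", "Feature #1"],
    )
)
print(display.frame)
```

With this split, the Styler never leaks into user code: notebooks pick up `_repr_html_` automatically, while `.frame` stays a plain DataFrame.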

examples/example_driven_dev/plot_feature_importance.py

glemaitre

We need to extend the test and check what happens when passing a classifier. This suggests that we might want to validate that the shapes of coef_ and intercept_ are ones we know how to handle, and raise an error otherwise.
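
One way to read this suggestion is a small validation step run before building the table. The sketch below is hypothetical: the function name check_coef_shapes and the exact set of accepted shapes are assumptions, and the stand-in objects only mimic the coef_ and intercept_ attributes of fitted estimators.

```python
from types import SimpleNamespace

import numpy as np


def check_coef_shapes(estimator):
    """Hypothetical validator: accept only shapes we know how to tabulate."""
    coef = np.asarray(estimator.coef_)
    intercept = np.asarray(estimator.intercept_)
    if coef.ndim not in (1, 2):
        raise ValueError(f"Unsupported coef_ shape: {coef.shape}")
    if intercept.ndim > 1:
        raise ValueError(f"Unsupported intercept_ shape: {intercept.shape}")
    n_outputs = 1 if coef.ndim == 1 else coef.shape[0]
    # A single intercept value is accepted for any number of outputs.
    if intercept.size not in (1, n_outputs):
        raise ValueError(
            f"intercept_ has {intercept.size} value(s) "
            f"but coef_ describes {n_outputs} output(s)"
        )
    return coef.shape, intercept.shape


# Stand-ins that only mimic the relevant attributes of fitted estimators.
regressor_like = SimpleNamespace(coef_=np.zeros(4), intercept_=0.0)
multiclass_like = SimpleNamespace(coef_=np.zeros((3, 4)), intercept_=np.zeros(3))
unsupported = SimpleNamespace(coef_=np.zeros((2, 3, 4)), intercept_=np.zeros(2))

print(check_coef_shapes(regressor_like))
print(check_coef_shapes(multiclass_like))
try:
    check_coef_shapes(unsupported)
except ValueError as exc:
    print("rejected:", exc)
```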

skore/src/skore/utils/_accessor.py

skore/tests/unit/sklearn/test_estimator.py

auguste-probabl · 2025-02-25T09:38:29Z

> We need to extend the test and check what happens when passing a classifier. This suggests that we might want to validate that the shapes of coef_ and intercept_ are ones we know how to handle, and raise an error otherwise.

We have a test that passes a LogisticRegression. Our strategy for dealing with the non-standardized shapes of coef_ and intercept_ has been to use np.atleast_2d, which seems to work pretty well.
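
As a quick illustration of why np.atleast_2d helps here, the sketch below applies it to placeholder arrays with the shapes that linear models commonly expose. The arrays are zeros standing in for real coefficients.

```python
import numpy as np

# Placeholder arrays with shapes commonly seen on fitted linear models.
single_output_coef = np.zeros(4)  # e.g. a regressor fitted on a 1-D target
binary_classifier_coef = np.zeros((1, 4))  # e.g. a binary classifier
multi_output_coef = np.zeros((3, 4))  # e.g. multi-output or multiclass

for coef in (single_output_coef, binary_classifier_coef, multi_output_coef):
    # After this call every array is 2-D: (n_outputs, n_features).
    print(coef.shape, "->", np.atleast_2d(coef).shape)

# A scalar intercept is promoted to a (1, 1) array.
print(np.atleast_2d(0.5).shape)
```

Once everything is 2-D, downstream code can treat the single-output case as a multi-output case with one row.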

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py

glemaitre · 2025-02-25T10:29:21Z

> We have a test that passes a LogisticRegression. Our strategy for dealing with the non-standardized shapes of coef_ and intercept_ has been to use np.atleast_2d, which seems to work pretty well.

Your test does not check for multiclass. As mentioned in #1339 (comment), I think that the transpose in this case will do the trick.
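
A minimal sketch of the multiclass case, assuming the goal is one row per feature and one column per class. The row and column labels are illustrative assumptions, and this is not the code from the PR.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_classes=3, random_state=42
)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Here coef_ has shape (n_classes, n_features) and intercept_ (n_classes,).
coef = np.atleast_2d(clf.coef_)
intercept = np.atleast_2d(clf.intercept_)

# Transpose so that rows are features (plus the intercept) and columns are classes.
values = np.concatenate([intercept, coef.T], axis=0)

# Labels are illustrative assumptions.
index = ["Intercept"] + [f"Feature #{i}" for i in range(coef.shape[1])]
columns = [f"Class #{c}" for c in clf.classes_]
table = pd.DataFrame(values, index=index, columns=columns)

print(table)
```

The same two lines (atleast_2d followed by a transpose) also cover a single-output regressor, which ends up as a table with one column.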

skore/tests/unit/sklearn/test_estimator.py

sylvaincom linked an issue Feb 17, 2025 that may be closed by this pull request

Feat(EstimatorReport): Display the feature weights for linear models #1320

Closed

github-actions bot assigned sylvaincom Feb 17, 2025

auguste-probabl self-assigned this Feb 17, 2025

glemaitre reviewed Feb 18, 2025

skore/tests/unit/sklearn/test_estimator.py (outdated, resolved)

glemaitre reviewed Feb 18, 2025

auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch 2 times, most recently from ac49914 to 8eae80b on February 20, 2025 15:48

auguste-probabl marked this pull request as ready for review February 20, 2025 15:50

auguste-probabl requested a review from glemaitre February 20, 2025 15:50

sylvaincom commented Feb 20, 2025

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py (outdated, resolved)

sylvaincom marked this pull request as draft February 21, 2025 14:30

sylvaincom marked this pull request as ready for review February 21, 2025 14:56

MarieSacksick reviewed Feb 21, 2025

examples/model_evaluation/plot_feature_importance.py (outdated, resolved)

sylvaincom commented Feb 21, 2025

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py (resolved)

auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch 2 times, most recently from 3296ad9 to b74e296 on February 24, 2025 10:46

MarieSacksick requested changes Feb 24, 2025

sphinx/conf.py (outdated, resolved)

glemaitre reviewed Feb 24, 2025

examples/example_driven_dev/plot_feature_importance.py (outdated, resolved)

glemaitre reviewed Feb 25, 2025

skore/src/skore/utils/_accessor.py (outdated, resolved)

skore/tests/unit/sklearn/test_estimator.py (outdated, resolved)

skore/tests/unit/sklearn/test_estimator.py (outdated, resolved)

glemaitre reviewed Feb 25, 2025

skore/src/skore/sklearn/_estimator/feature_importance_accessor.py (outdated, resolved)

glemaitre reviewed Feb 25, 2025

skore/tests/unit/sklearn/test_estimator.py (outdated, resolved)

glemaitre reviewed Feb 25, 2025

skore/tests/unit/sklearn/test_estimator.py (outdated, resolved)

sylvaincom and others added 24 commits March 4, 2025 15:24

example: explain slicing of pipeline

dd9a735

Update examples/model_evaluation/plot_feature_importance.py

e59cf6a

Co-authored-by: Guillaume Lemaitre <[email protected]>

example: refine calculation of n_features_in

1a9d435

Merge branch 'main' into 1320-featestimatorreport-display-the-feature…

44f2dc7

…-weights-for-linear-models

example: refine sort by abs values

068304a

Merge branch '1320-featestimatorreport-display-the-feature-weights-fo…

5309dbb

…r-linear-models' of https://github.com/probabl-ai/skore into 1320-featestimatorreport-display-the-feature-weights-for-linear-models

example: minor iter

4be3c64

Update examples/model_evaluation/plot_feature_importance.py

8b1bcae

Co-authored-by: Guillaume Lemaitre <[email protected]>

example: use RidgeCV

7a3556a

example: catch warnings of the grid search

6216924

example: minor iter

d0067cd

example: apply spline on all features

b9c080e

example: refining comments and explanations

e7a3323

example: refine, add a conclusion

1e67b3e

add test for _check_has_coef

b0e773c

example: adding some random seeds

34289d0

Merge branch '1320-featestimatorreport-display-the-feature-weights-fo…

3a2680b

…r-linear-models' of https://github.com/probabl-ai/skore into 1320-featestimatorreport-display-the-feature-weights-for-linear-models

example: move it to end to end section

f6fdeed

Update plot_feature_importance.py

98b70ed

Co-authored-by: Guillaume Lemaitre <[email protected]>

Merge branch 'main' into 1320-featestimatorreport-display-the-feature…

5ee4996

…-weights-for-linear-models

example: add report names in comparator

30cc53d

example: refine dict for comparator

b516a7f

example: minor iter

36ea706

Merge branch 'main' into 1320-featestimatorreport-display-the-feature…

69eb4f4

…-weights-for-linear-models

auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch from 3ebfa52 to 69eb4f4 on March 4, 2025 14:25

Merge branch 'main' into 1320-featestimatorreport-display-the-feature…

9f2734d

…-weights-for-linear-models

glemaitre merged commit 5432a40 into main Mar 4, 2025
19 checks passed

glemaitre deleted the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch March 4, 2025 15:57

This was referenced Mar 5, 2025

enh: For the inspection of linear models, add statistical tests to the coefficients #1386

Open

docs: For feature importance, add MDI and PI in the example on California housing dataset #1390

Draft
