feat: Display the model coefficients for linear models #1339

Contributor

@sylvaincom sylvaincom commented Feb 17, 2025

Introduces a new accessor on EstimatorReport, feature_importance, which currently has one method, coefficients, only available for estimators with a coef_ attribute (such as linear models).

Demo:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from skore import EstimatorReport

# Fit a linear model on a synthetic regression problem
X, y = make_regression(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
estimator = LinearRegression().fit(X_train, y_train)

# Wrap the fitted estimator in a report and list the available tools
report = EstimatorReport(estimator, X_test=X_test, y_test=y_test)
report.help()
╭─────────────────── Tools to diagnose estimator LinearRegression ───────────────────╮
│ EstimatorReport                                                                      │
│ ├── .metrics                                                                         │
│ │   ├── .accuracy(...)         (↗︎)     - Compute the accuracy score.                 │
│ │   ├── .brier_score(...)      (↘︎)     - Compute the Brier score.                    │
│ │   ├── .log_loss(...)         (↘︎)     - Compute the log loss.                       │
│ │   ├── .precision(...)        (↗︎)     - Compute the precision score.                │
│ │   ├── .precision_recall(...)         - Plot the precision-recall curve.            │
│ │   ├── .recall(...)           (↗︎)     - Compute the recall score.                   │
│ │   ├── .roc(...)                      - Plot the ROC curve.                         │
│ │   ├── .roc_auc(...)          (↗︎)     - Compute the ROC AUC score.                  │
│ │   ├── .custom_metric(...)            - Compute a custom metric.                    │
│ │   └── .report_metrics(...)           - Report a set of metrics for our estimator.  │
│ ├── .feature_importance                                                              │
│ │   └── .coefficients(...)             - Report the coefficients of a regression     │
│ │       estimator.                                                                   │
│ ├── .cache_predictions(...)            - Cache estimator's predictions.              │
│ ├── .clear_cache(...)                  - Clear the cache.                            │
│ └── Attributes                                                                       │
│     ├── .X_test                                                                      │
│     ├── .y_test                                                                      │
│     ├── .estimator_                                                                  │
│     └── .estimator_name_                                                             │
│                                                                                      │
│                                                                                      │
│ Legend:                                                                              │
│ (↗︎) higher is better (↘︎) lower is better                                             │
╰──────────────────────────────────────────────────────────────────────────────────────╯

report.feature_importance.coefficients()

Note what happens if the estimator is not a regressor (e.g. a LogisticRegression):

report.feature_importance.help()
╭─ Available feature importance methods ─╮
│ report.feature_importance              │
╰────────────────────────────────────────╯
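The empty tree above is what the accessor shows when `coefficients` is not exposed for the estimator. As a rough illustration of gating a method on the presence of `coef_`, here is a minimal sketch. The helper `available_feature_importance_methods` and the two stand-in classes are hypothetical and are not skore's implementation; the sketch also does not reproduce whatever extra check hides the method for a LogisticRegression in the demo above.

```python
class FakeLinearModel:
    """Stand-in for a fitted linear model; exposes coef_ like scikit-learn does."""

    def __init__(self):
        self.coef_ = [0.5, -1.2]
        self.intercept_ = 0.1


class FakeTree:
    """Stand-in for a fitted estimator that has no coef_ attribute."""


def available_feature_importance_methods(estimator):
    """Hypothetical helper: list the methods the accessor would expose."""
    methods = []
    # Assumption: availability depends only on the coef_ attribute.
    if hasattr(estimator, "coef_"):
        methods.append("coefficients")
    return methods


linear_methods = available_feature_importance_methods(FakeLinearModel())
tree_methods = available_feature_importance_methods(FakeTree())
print(linear_methods)  # ['coefficients']
print(tree_methods)    # []
```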

Close #1320

Co-authored-by: @auguste-probabl [email protected]

Contributor

github-actions bot commented Feb 17, 2025

Coverage

Coverage Report for backend
| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| venv/lib/python3.12/site-packages/skore | | | | |
| __init__.py | 15 | 0 | 100% | |
| __main__.py | 8 | 8 | 0% | 3–19 |
| _config.py | 28 | 0 | 100% | |
| exceptions.py | 4 | 4 | 0% | 4–23 |
| venv/lib/python3.12/site-packages/skore/persistence | | | | |
| __init__.py | 0 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/persistence/item | | | | |
| __init__.py | 56 | 3 | 93% | 96–99 |
| altair_chart_item.py | 19 | 1 | 91% | 14 |
| item.py | 22 | 1 | 95% | 86 |
| matplotlib_figure_item.py | 36 | 1 | 95% | 19 |
| media_item.py | 22 | 0 | 100% | |
| numpy_array_item.py | 27 | 1 | 94% | 16 |
| pandas_dataframe_item.py | 29 | 1 | 94% | 14 |
| pandas_series_item.py | 29 | 1 | 94% | 14 |
| pickle_item.py | 22 | 0 | 100% | |
| pillow_image_item.py | 25 | 1 | 93% | 15 |
| plotly_figure_item.py | 20 | 1 | 92% | 14 |
| polars_dataframe_item.py | 27 | 1 | 94% | 14 |
| polars_series_item.py | 22 | 1 | 92% | 14 |
| primitive_item.py | 23 | 2 | 91% | 13–15 |
| sklearn_base_estimator_item.py | 29 | 1 | 94% | 15 |
| skrub_table_report_item.py | 10 | 1 | 86% | 11 |
| venv/lib/python3.12/site-packages/skore/persistence/repository | | | | |
| __init__.py | 2 | 0 | 100% | |
| item_repository.py | 59 | 5 | 91% | 15–16, 202–203, 226 |
| venv/lib/python3.12/site-packages/skore/persistence/storage | | | | |
| __init__.py | 4 | 0 | 100% | |
| abstract_storage.py | 22 | 0 | 100% | |
| disk_cache_storage.py | 33 | 1 | 95% | 44 |
| in_memory_storage.py | 20 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/project | | | | |
| __init__.py | 3 | 0 | 100% | |
| _open.py | 5 | 0 | 100% | |
| project.py | 81 | 1 | 99% | 284 |
| venv/lib/python3.12/site-packages/skore/sklearn | | | | |
| __init__.py | 6 | 0 | 100% | |
| _base.py | 162 | 13 | 92% | 43, 115, 118, 171–180, 192–>197, 212, 215–216 |
| find_ml_task.py | 61 | 0 | 99% | 136–>144 |
| types.py | 13 | 2 | 85% | 33, 61 |
| venv/lib/python3.12/site-packages/skore/sklearn/_comparison | | | | |
| __init__.py | 5 | 0 | 100% | |
| metrics_accessor.py | 164 | 2 | 97% | 165, 166–>168, 1218 |
| precision_recall_curve_display.py | 73 | 1 | 97% | 196–>199, 304 |
| prediction_error_display.py | 67 | 10 | 78% | 97, 154–>exit, 209, 214–218, 227, 231, 236–238 |
| report.py | 64 | 1 | 96% | 16, 251–>254 |
| roc_curve_display.py | 69 | 1 | 96% | 204–>213, 213–>216, 308 |
| venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation | | | | |
| __init__.py | 5 | 0 | 100% | |
| metrics_accessor.py | 170 | 0 | 99% | 142–>144, 144–>146 |
| report.py | 105 | 1 | 98% | 22 |
| venv/lib/python3.12/site-packages/skore/sklearn/_estimator | | | | |
| __init__.py | 7 | 0 | 100% | |
| feature_importance_accessor.py | 39 | 0 | 100% | |
| metrics_accessor.py | 325 | 11 | 95% | 166–175, 203–>212, 211, 241, 252–>254, 282, 309–313, 328, 351, 363, 364–>366 |
| report.py | 127 | 1 | 97% | 22, 229–>235, 237–>239 |
| venv/lib/python3.12/site-packages/skore/sklearn/_plot | | | | |
| __init__.py | 4 | 0 | 100% | |
| precision_recall_curve.py | 129 | 1 | 98% | 240–>257, 329 |
| prediction_error.py | 102 | 1 | 98% | 173, 189–>192 |
| roc_curve.py | 143 | 0 | 100% | |
| style.py | 14 | 0 | 100% | |
| utils.py | 99 | 5 | 94% | 31, 55–57, 61 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split | | | | |
| __init__.py | 0 | 0 | 100% | |
| train_test_split.py | 36 | 2 | 94% | 16–17 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning | | | | |
| __init__.py | 8 | 0 | 100% | |
| high_class_imbalance_too_few_examples_warning.py | 17 | 1 | 90% | 79 |
| high_class_imbalance_warning.py | 18 | 0 | 100% | |
| random_state_unset_warning.py | 12 | 1 | 88% | 15 |
| shuffle_true_warning.py | 10 | 1 | 83% | 46 |
| stratify_is_set_warning.py | 12 | 1 | 88% | 15 |
| time_based_column_warning.py | 23 | 2 | 86% | 17, 73 |
| train_test_split_warning.py | 5 | 1 | 80% | 21 |
| venv/lib/python3.12/site-packages/skore/utils | | | | |
| __init__.py | 6 | 0 | 100% | |
| _accessor.py | 17 | 0 | 100% | |
| _environment.py | 27 | 27 | 0% | 1–51 |
| _index.py | 5 | 0 | 100% | |
| _logger.py | 22 | 22 | 0% | 3–38 |
| _parallel.py | 38 | 3 | 88% | 23–33, 124 |
| _patch.py | 13 | 5 | 53% | 21–37 |
| _progress_bar.py | 34 | 0 | 100% | |
| _show_versions.py | 33 | 0 | 100% | |
| TOTAL | 2991 | 152 | 94% | |

Tests Skipped Failures Errors Time
636 3 💤 0 ❌ 0 🔥 48.235s ⏱️

Contributor

github-actions bot commented Feb 17, 2025

Documentation preview @ 9f2734d

Member

@glemaitre glemaitre left a comment

We should anticipate how hard it is to do the plotting with only pandas, so that we can decide whether or not we need a display.

@auguste-probabl auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch 2 times, most recently from ac49914 to 8eae80b Compare February 20, 2025 15:48
@auguste-probabl auguste-probabl marked this pull request as ready for review February 20, 2025 15:50
@sylvaincom sylvaincom marked this pull request as draft February 21, 2025 14:30
@sylvaincom sylvaincom marked this pull request as ready for review February 21, 2025 14:56
@sylvaincom
Contributor Author

We have a toy example for example-driven development at examples/model_evaluation/plot_feature_importance.py

Contributor

@MarieSacksick MarieSacksick left a comment

I checked /examples/model_evaluation/plot_feature_importance.py, and based on that it looks good to me!
I did not review the code or try it with several different models.

@sylvaincom
Contributor Author

@thomass-dev do you know why the documentation preview does not include the new example? 🙏 Is it because the example was added after the documentation preview was generated?

@auguste-probabl auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch 2 times, most recently from 3296ad9 to b74e296 Compare February 24, 2025 10:46
Contributor

@MarieSacksick MarieSacksick left a comment

Only one comment, otherwise it's good for me!

Member

@glemaitre glemaitre left a comment

I just started by reviewing the example.

Now I'm thinking that .model_weights() should return a display. The plan would be for this display to have a .plot() method that produces the right plot for the user, as well as a .frame property that returns the current dataframe.

In addition, I like the current direction for the pandas style, and I think we can make it the default HTML repr of the display. One issue with the style is that it does not return a real dataframe that you can manipulate. That's why I think we can use it in the repr: the repr is not an object the user works with afterwards, and we still have .frame in case someone wants to access the real dataframe.
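To make the proposal concrete, here is a minimal sketch of such a display. The class name `CoefficientsDisplay`, its constructor, and the styling choices are assumptions for illustration only and may differ from what the PR ends up with. `plot()` relies on pandas plotting (which needs matplotlib) and `_repr_html_` relies on the pandas Styler (which needs jinja2).

```python
import pandas as pd


class CoefficientsDisplay:
    """Hypothetical display wrapping a table of model coefficients."""

    def __init__(self, coefficients: pd.DataFrame):
        self._coefficients = coefficients

    @property
    def frame(self) -> pd.DataFrame:
        """Return a plain dataframe that the user can manipulate freely."""
        return self._coefficients.copy()

    def plot(self, **kwargs):
        """Draw a horizontal bar plot of the coefficients (needs matplotlib)."""
        return self._coefficients.plot.barh(**kwargs)

    def _repr_html_(self) -> str:
        """Styled HTML used by notebooks; the Styler never leaves this method."""
        styler = self._coefficients.style.format("{:.3f}").bar(
            align="mid", color=["#d65f5f", "#5fba7d"]
        )
        return styler.to_html()


display = CoefficientsDisplay(
    pd.DataFrame(
        {"Coefficient": [0.1, 0.5, -1.2]},
        index=["Intercept", "Feature #0", "Feature #1"],
    )
)
frame = display.frame  # a real DataFrame, independent of the styled repr
```

With this split, the styled object is only ever rendered, while `.frame` remains the entry point for anyone who wants to filter, sort, or export the values.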

Member

@glemaitre glemaitre left a comment

We need to extend the tests and check what happens when passing a classifier. It tells me that we might want to validate that the shapes of coef_ and intercept_ are ones we know how to handle, and raise an error otherwise.
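A minimal sketch of what such a validation could look like, assuming the only supported layouts are a 1-D or 2-D `coef_` with a scalar or 1-D `intercept_`. The function name `check_linear_shapes` and the exact rules are hypothetical, not what the PR implements.

```python
import numpy as np


def check_linear_shapes(coef, intercept):
    """Hypothetical validator: return n_outputs or raise on unknown shapes."""
    coef = np.asarray(coef)
    intercept = np.asarray(intercept)

    # coef_ is (n_features,) for single-output regressors and
    # (n_outputs, n_features) for classifiers or multi-output regressors.
    if coef.ndim not in (1, 2):
        raise ValueError(f"Unsupported coef_ shape: {coef.shape}")
    if intercept.ndim > 1:
        raise ValueError(f"Unsupported intercept_ shape: {intercept.shape}")

    n_outputs = 1 if coef.ndim == 1 else coef.shape[0]
    # Assumption: a single intercept value may be shared by all outputs.
    if intercept.size not in (1, n_outputs):
        raise ValueError(
            f"intercept_ has {intercept.size} values for {n_outputs} outputs"
        )
    return n_outputs


n_regression = check_linear_shapes(np.zeros(4), 0.0)             # scalar intercept
n_multiclass = check_linear_shapes(np.zeros((3, 4)), np.zeros(3))  # one per class
```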

Contributor

auguste-probabl commented Feb 25, 2025

We need to extend the tests and check what happens when passing a classifier. It tells me that we might want to validate that the shapes of coef_ and intercept_ are ones we know how to handle, and raise an error otherwise.

We have a test that passes a LogisticRegression. Our strategy for dealing with the non-standardized shapes of coef_ and intercept_ has been to use np.atleast_2d, which seems to work pretty well.

@glemaitre
Member

We have a test that passes a LogisticRegression. Our strategy for dealing with the non-standardized shapes of coef_ and intercept_ has been to use np.atleast_2d, which seems to work pretty well.

Your test does not check for multiclass. As mentioned in #1339 (comment), I think that the transpose in this case will do the trick.
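For reference, one reading of the `np.atleast_2d` plus transpose approach is sketched below with plain NumPy arrays standing in for `coef_` and `intercept_`. The function `coefficient_table` and the row layout (intercept first, then one row per feature, one column per output or class) are assumptions for illustration, not the code under review.

```python
import numpy as np


def coefficient_table(coef, intercept):
    """Hypothetical helper: stack intercept and coefficients into a 2-D table."""
    coef = np.atleast_2d(coef)            # (n_outputs, n_features)
    intercept = np.atleast_2d(intercept)  # (1, 1) or (1, n_outputs)
    # Share a single intercept value across outputs if needed.
    intercept = np.broadcast_to(intercept, (1, coef.shape[0]))
    # Transpose so that rows are features and columns are outputs/classes.
    return np.concatenate([intercept, coef.T], axis=0)


# Single-output regressor layout: coef_ is 1-D, intercept_ is a scalar.
regression = coefficient_table(np.array([1.0, 2.0]), 0.5)

# Multiclass classifier layout: coef_ is (n_classes, n_features).
multiclass = coefficient_table(
    np.arange(6.0).reshape(3, 2), np.array([10.0, 20.0, 30.0])
)

print(regression.shape)  # (3, 1)
print(multiclass.shape)  # (3, 3)
```

In the multiclass case the first row holds the three intercepts and each following row holds one feature's coefficient for every class, which is the case a test with more than two classes would exercise.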

sylvaincom and others added 24 commits March 4, 2025 15:24
…r-linear-models' of https://github.com/probabl-ai/skore into 1320-featestimatorreport-display-the-feature-weights-for-linear-models
…r-linear-models' of https://github.com/probabl-ai/skore into 1320-featestimatorreport-display-the-feature-weights-for-linear-models
Co-authored-by: Guillaume Lemaitre <[email protected]>
@auguste-probabl auguste-probabl force-pushed the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch from 3ebfa52 to 69eb4f4 Compare March 4, 2025 14:25
@glemaitre glemaitre merged commit 5432a40 into main Mar 4, 2025
19 checks passed
@glemaitre glemaitre deleted the 1320-featestimatorreport-display-the-feature-weights-for-linear-models branch March 4, 2025 15:57
Development

Successfully merging this pull request may close these issues.

Feat(EstimatorReport): Display the feature weights for linear models
6 participants