Feat(EstimatorReport): Display the feature weights for linear models #1320

Closed · Tracked by #1314 · Fixed by #1339
MarieSacksick opened this issue Feb 14, 2025 · 10 comments
Labels: enhancement (New feature or request)
Milestone: skore 0.8

MarieSacksick (Contributor) commented Feb 14, 2025

Is your feature request related to a problem? Please describe.

As a Data Scientist, to explain my model and understand the problem I'm trying to solve, I need to check the feature importance by looking at the model's weights. This should be available for linear models only.

Describe the solution you'd like

df = report.feature_importance.model_weights() # renders a dataframe
display = report.feature_importance.plot.model_weights() # renders a display
display.plot()
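
For illustration only (not skore's actual implementation), a minimal sketch of the kind of dataframe such an accessor could return, assuming the fitted estimator exposes coef_ and intercept_:

import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_features=4, random_state=0)
estimator = Ridge().fit(X, y)

# Expose the fitted weights as a labelled dataframe, one row per feature.
feature_names = getattr(
    estimator, "feature_names_in_", [f"x{i}" for i in range(X.shape[1])]
)
weights = pd.DataFrame({"coefficient": estimator.coef_}, index=feature_names)
weights.loc["intercept"] = estimator.intercept_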

Describe alternatives you've considered, if relevant

Later, if the report object ends up with too many accessors, we could group them under a single feature importance accessor and add a parameter to choose which type of feature importance to display.

Additional context

part of epic #1314

MarieSacksick added the enhancement and needs-triage labels on Feb 14, 2025
MarieSacksick added this to the skore 0.8 milestone on Feb 14, 2025
auguste-probabl (Contributor)

report.feature_importance.model_weights() # renders a dataframe
report.feature_importance.model_weights() # renders a display

There is one method for both use cases; I guess that's a typo?

Currently you write report.metrics.[...]; should we make a separate feature_importance accessor, or write report.metrics.feature_importance.model_weights?

MarieSacksick removed the needs-triage label on Feb 17, 2025
MarieSacksick (Contributor, Author)

Thanks for catching the typo; it was everywhere, and it's fixed now.
Yes, it should be a separate accessor.

sylvaincom (Contributor)

By "linear models" (https://scikit-learn.org/stable/modules/linear_model.html), which models do you mean exactly?

FYI, for a very first iteration, we dealt with LinearRegression, Ridge, and Lasso.

MarieSacksick (Contributor, Author)

So cool 🤩
I was thinking about all these models: https://scikit-learn.org/stable/api/sklearn.linear_model.html.
Is there a private method in scikit-learn to recognize them?
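
For illustration, two heuristics that could be used to recognize a linear model; neither is an official scikit-learn API, just a sketch:

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

def looks_like_linear_model(estimator):
    # Heuristic 1: the class is defined inside sklearn.linear_model.
    in_linear_module = type(estimator).__module__.startswith("sklearn.linear_model")
    # Heuristic 2: a fitted linear model exposes coef_ (note that e.g. LinearSVC
    # also has coef_, so this is broader than sklearn.linear_model).
    has_coef = hasattr(estimator, "coef_")
    return in_linear_module or has_coef

looks_like_linear_model(Lasso())                  # True
looks_like_linear_model(RandomForestRegressor())  # False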

sylvaincom (Contributor)

https://scikit-learn.org/stable/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html#sphx-glr-auto-examples-inspection-plot-linear-model-coefficient-interpretation-py

We need to add a check to see whether the data is normalized; otherwise it makes no sense to interpret the coefficients.

If the input estimator is a pipeline that contains a scaler, we are fine.
If the input estimator is a bare estimator (no pipeline), or if the pipeline has no scaler, then we must check whether the input data is scaled.

@glemaitre do you know if there is an internal helper in scikit-learn to check whether the input data is normalized?
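
For illustration, a rough sketch of such a check (the list of scaler classes and the tolerances are arbitrary choices, not a scikit-learn API):

import warnings

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

SCALERS = (StandardScaler, RobustScaler, MinMaxScaler)

def warn_if_not_scaled(estimator, X):
    # A pipeline that already contains a scaler is fine.
    if isinstance(estimator, Pipeline) and any(
        isinstance(step, SCALERS) for _, step in estimator.steps
    ):
        return
    # Otherwise, check whether the data roughly looks standardized.
    X = np.asarray(X, dtype=float)
    if not (np.allclose(X.mean(axis=0), 0.0, atol=0.1)
            and np.allclose(X.std(axis=0), 1.0, atol=0.1)):
        warnings.warn(
            "The input data does not appear to be scaled; interpreting raw "
            "coefficients as feature importance may be misleading."
        )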

sylvaincom (Contributor) commented Feb 21, 2025

It would also be nice to have this kind of pandas styling:

[image: screenshot of a dataframe with the pandas styling referred to above]

Related to #1351. WDYT?

MarieSacksick (Contributor, Author)

We need to add a check to see whether the data is normalized; otherwise it makes no sense to interpret the coefficients.

Do we? Or should this just be a warning, once we add warnings?

It would also be nice to have this kind of pandas styling:

With blue and orange, but yes, as long as it's not much code and it doesn't change the nature of the dataframe (I wouldn't use format though; that's something people should be able to choose for themselves).

sylvaincom (Contributor) commented Feb 21, 2025

Do we? Or should this just be a warning, once we add warnings?

IMO a strong yes: we can't ship a feature_importance accessor that a data scientist cannot use to interpret what is going on. We must at least raise a warning saying that the data is not scaled, and hence that interpreting the coefficients does not make sense.

With blue and orange, but yes, as long as it's not much code and it doesn't change the nature of the dataframe (I wouldn't use format though; that's something people should be able to choose for themselves).

I like red and green because the underlying data is "ordinal" / "ordered": green conveys positive and red conveys negative.

Using pandas styling does change the nature of the object: it is no longer a pandas DataFrame but a pandas Styler, hence the API suggested in #1351.

In the Styler, limiting the number of decimals makes sense IMO: too many decimals carry no statistical significance, and the last decimals of 2.07464663737764 are just noise; scikit-learn's classification_report also trims excess decimals.
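
For illustration, a sketch of this kind of styling; the colormap, the three-decimal cut-off, and the example feature names and values are arbitrary choices (only the 2.07464663737764 coefficient comes from the comment above):

import pandas as pd

# Hypothetical feature weights, just to show the styling.
weights = pd.DataFrame(
    {"coefficient": [2.07464663737764, -0.53, 0.0041]},
    index=["feature_a", "feature_b", "feature_c"],
)

styled = (
    weights.style
    .format("{:.3f}")  # drop the statistically meaningless trailing decimals
    .background_gradient(cmap="coolwarm", vmin=-2.5, vmax=2.5)  # diverging palette
)
# `styled` is a pandas Styler (the gradient needs matplotlib), no longer a
# DataFrame, which is why a dedicated API is suggested in #1351.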

MarieSacksick (Contributor, Author)

About the warnings

OK, then for now, since warnings aren't implemented yet and will be done in the next iteration, let's not add this check.

About the styling

  • About the colors: following what we discussed this morning (the meaning of "red" is not the same worldwide, and we want to be colorblind-friendly), let's not use red/green but rather orange/blue.
  • I like the API suggested in enh(ComparisonReport): Add pandas styling to the benchmark of metrics (#1351). We will do it, but to keep the milestone as is and avoid tunnelling or scope drift, let's not do it in this PR; let's do it later.

sylvaincom (Contributor)

Many thanks, agreed
