Feat(EstimatorReport): Display the feature weights for linear models #1320

Closed · Tracked by #1314 · Fixed by #1339
MarieSacksick opened this issue Feb 14, 2025 · 10 comments
Labels: enhancement (New feature or request)
Milestone: skore 0.8

MarieSacksick (Contributor) commented Feb 14, 2025

Is your feature request related to a problem? Please describe.

As a Data Scientist, to explain my model and understand the problem I'm trying to solve, I need to check the feature importance by looking at the model's weights. This should be available for linear models only.

Describe the solution you'd like

df = report.feature_importance.model_weights() # renders a dataframe
display = report.feature_importance.plot.model_weights() # renders a display
display.plot()
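
For illustration only (not skore's actual implementation), a minimal sketch of the kind of dataframe such an accessor could return, assuming the fitted estimator exposes coef_ and intercept_:

import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_features=4, random_state=0)
estimator = Ridge().fit(X, y)

# Expose the fitted weights as a labelled dataframe, one row per feature.
feature_names = getattr(
    estimator, "feature_names_in_", [f"x{i}" for i in range(X.shape[1])]
)
weights = pd.DataFrame({"coefficient": estimator.coef_}, index=feature_names)
weights.loc["intercept"] = estimator.intercept_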

Describe alternatives you've considered, if relevant

Later, if the report object ends up with too many accessors, we could group them under a single feature importance accessor and add a parameter to choose which type of feature importance to display.

Additional context

part of epic #1314

MarieSacksick added the enhancement and needs-triage labels on Feb 14, 2025
MarieSacksick added this to the skore 0.8 milestone on Feb 14, 2025
auguste-probabl (Contributor)

report.feature_importance.model_weights() # renders a dataframe
report.feature_importance.model_weights() # renders a display

There is one method for both use cases; I guess that's a typo?

Currently you write report.metrics.[...]; should we make a separate feature_importance accessor, or write report.metrics.feature_importance.model_weights?

MarieSacksick removed the needs-triage label on Feb 17, 2025
MarieSacksick (Contributor, Author)

Thanks for catching the typo; it was everywhere, and it's fixed now.
Yes, it should be a separate accessor.

sylvaincom (Contributor)

By "linear models" (https://scikit-learn.org/stable/modules/linear_model.html), which models do you mean exactly?

FYI, for a very first iteration, we dealt with LinearRegression, Ridge, and Lasso.

MarieSacksick (Contributor, Author)

So cool 🤩
I was thinking about all these models: https://scikit-learn.org/stable/api/sklearn.linear_model.html.
Is there a private method in scikit-learn to recognize them?
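
For illustration, two heuristics that could be used to recognize a linear model; neither is an official scikit-learn API, just a sketch:

from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso

def looks_like_linear_model(estimator):
    # Heuristic 1: the class is defined inside sklearn.linear_model.
    in_linear_module = type(estimator).__module__.startswith("sklearn.linear_model")
    # Heuristic 2: a fitted linear model exposes coef_ (note that e.g. LinearSVC
    # also has coef_, so this is broader than sklearn.linear_model).
    has_coef = hasattr(estimator, "coef_")
    return in_linear_module or has_coef

looks_like_linear_model(Lasso())                  # True
looks_like_linear_model(RandomForestRegressor())  # False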

sylvaincom (Contributor)

https://scikit-learn.org/stable/auto_examples/inspection/plot_linear_model_coefficient_interpretation.html#sphx-glr-auto-examples-inspection-plot-linear-model-coefficient-interpretation-py

We need to add a check to see whether the data is normalized; otherwise it makes no sense to interpret the coefficients.

If the input estimator is a pipeline that contains a scaler, we are fine.
If the input estimator is a bare estimator (no pipeline), or if the pipeline has no scaler, then we must check whether the input data is scaled.

@glemaitre do you know if there is an internal helper in scikit-learn to check whether the input data is normalized?
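
For illustration, a rough sketch of such a check (the list of scaler classes and the tolerances are arbitrary choices, not a scikit-learn API):

import warnings

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

SCALERS = (StandardScaler, RobustScaler, MinMaxScaler)

def warn_if_not_scaled(estimator, X):
    # A pipeline that already contains a scaler is fine.
    if isinstance(estimator, Pipeline) and any(
        isinstance(step, SCALERS) for _, step in estimator.steps
    ):
        return
    # Otherwise, check whether the data roughly looks standardized.
    X = np.asarray(X, dtype=float)
    if not (np.allclose(X.mean(axis=0), 0.0, atol=0.1)
            and np.allclose(X.std(axis=0), 1.0, atol=0.1)):
        warnings.warn(
            "The input data does not appear to be scaled; interpreting raw "
            "coefficients as feature importance may be misleading."
        )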

sylvaincom (Contributor) commented Feb 21, 2025

It would also be nice to have this kind of pandas styling:

[image: screenshot of a dataframe with the pandas styling referred to above]

Related to #1351. WDYT?

MarieSacksick (Contributor, Author)

We need to add a check to see whether the data is normalized; otherwise it makes no sense to interpret the coefficients.

Do we? Or should this just be a warning, once we add warnings?

It would also be nice to have this kind of pandas styling:

With blue and orange, but yes, as long as it's not much code and it doesn't change the nature of the dataframe (I wouldn't use format though; that's something people should be able to choose for themselves).

sylvaincom (Contributor) commented Feb 21, 2025

Do we? Or should this just be a warning, once we add warnings?

IMO a strong yes: we can't ship a feature_importance accessor that a data scientist cannot use to interpret what is going on. We must at least raise a warning saying that the data is not scaled, and hence that interpreting the coefficients does not make sense.

With blue and orange, but yes, as long as it's not much code and it doesn't change the nature of the dataframe (I wouldn't use format though; that's something people should be able to choose for themselves).

I like red and green because the underlying data is "ordinal" / "ordered": green conveys positive and red conveys negative.

Using pandas styling does change the nature of the object: it is no longer a pandas DataFrame but a pandas Styler, hence the API suggested in #1351.

In the Styler, limiting the number of decimals makes sense IMO: too many decimals carry no statistical significance, and the last decimals of 2.07464663737764 are just noise; scikit-learn's classification_report also trims excess decimals.
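
For illustration, a sketch of this kind of styling; the colormap, the three-decimal cut-off, and the example feature names and values are arbitrary choices (only the 2.07464663737764 coefficient comes from the comment above):

import pandas as pd

# Hypothetical feature weights, just to show the styling.
weights = pd.DataFrame(
    {"coefficient": [2.07464663737764, -0.53, 0.0041]},
    index=["feature_a", "feature_b", "feature_c"],
)

styled = (
    weights.style
    .format("{:.3f}")  # drop the statistically meaningless trailing decimals
    .background_gradient(cmap="coolwarm", vmin=-2.5, vmax=2.5)  # diverging palette
)
# `styled` is a pandas Styler (the gradient needs matplotlib), no longer a
# DataFrame, which is why a dedicated API is suggested in #1351.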

MarieSacksick (Contributor, Author)

About the warnings

OK, then for now, since warnings aren't implemented yet and will be done in the next iteration, let's not add this check.

About the styling

  • About the colors: following what we discussed this morning (the meaning of "red" is not the same worldwide, and we want to be colorblind-friendly), let's not use red/green but rather orange/blue.
  • I like the API suggested in enh(ComparisonReport): Add pandas styling to the benchmark of metrics (#1351). We will do it, but to keep the milestone as is and avoid tunnelling or scope drift, let's not do it in this PR; let's do it later.

sylvaincom (Contributor)

Many thanks, agreed
