
Feat(EstimatorReport): Display the feature permutation importance #1319

Open · Tracked by #1314
MarieSacksick opened this issue Feb 14, 2025 · 9 comments · May be fixed by #1365
Labels: enhancement (New feature or request)

@MarieSacksick (Contributor) commented Feb 14, 2025

Is your feature request related to a problem? Please describe.

As a Data Scientist, to explain my model and understand the problem I'm trying to solve, I need to check feature importance via a permutation method. This should be available for any kind of model.

Describe the solution you'd like

df = report.feature_importance.feature_permutation(scoring="a scoring method")  # renders a dataframe
display = report.feature_importance.plot.feature_permutation(scoring="a scoring method")  # renders a display
display.plot()
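
For context, a minimal sketch of what such an accessor might delegate to under the hood, assuming the report exposes a fitted estimator and a test split; the attribute names estimator_, X_test, and y_test are assumptions, not the confirmed skore API:

# Hypothetical internals: wrap scikit-learn's permutation_importance and
# return the raw per-repeat importances as a dataframe.
import pandas as pd
from sklearn.inspection import permutation_importance

def feature_permutation(report, scoring="r2"):
    result = permutation_importance(
        report.estimator_, report.X_test, report.y_test, scoring=scoring
    )
    # result.importances has shape (n_features, n_repeats);
    # index by feature name, assuming X_test is a dataframe.
    return pd.DataFrame(result.importances, index=report.X_test.columns)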

Describe alternatives you've considered, if relevant

Later, if the report object accumulates too many accessors, we could group the feature-importance methods together and add a parameter selecting which type of feature importance to display.

Additional context

Part of the epic #1314.

@MarieSacksick MarieSacksick added enhancement New feature or request needs-triage This has been recently submitted and needs attention labels Feb 14, 2025
@MarieSacksick MarieSacksick added this to the skore 0.8 milestone Feb 14, 2025
@auguste-probabl (Contributor)
What is the difference with #1323?

@MarieSacksick (Contributor, Author)
I forgot to change the title of #1323, thanks!

@MarieSacksick MarieSacksick removed the needs-triage This has been recently submitted and needs attention label Feb 17, 2025
@auguste-probabl (Contributor)
Which data should be used to compute the permutation importance? Should we accept arguments like data_source="test"/"train"/"X_y"?

@MarieSacksick (Contributor, Author)
By default, test; but yes, adding this data_source parameter would be perfect!
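
A hedged sketch of the call shapes this parameter could allow, mirroring the data_source convention used by the report's other accessors; the external-data spelling with X and y keyword arguments is an assumption:

report.feature_importance.feature_permutation(data_source="test")   # default
report.feature_importance.feature_permutation(data_source="train")
report.feature_importance.feature_permutation(data_source="X_y", X=X_new, y=y_new)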

@auguste-probabl (Contributor)
permutation_importance accepts a scoring parameter, which can be a list of metrics. In that case the result looks like this:

{
    'r2': {
        'importances_mean': array([1.72819206, 0.07236024, 0.0503269 ]),
        'importances_std': array([0.37087406, 0.00928599, 0.00382661]),
        'importances': array([[1.99305438, 1.27627584, 1.41891128, 1.66448695, 2.28823186],
                              [0.06935003, 0.09033894, 0.07040989, 0.06812194, 0.06358039],
                              [0.04762681, 0.055217  , 0.04907298, 0.04538669, 0.05433103]])
    },
    'neg_root_mean_squared_error': {
        'importances_mean': array([139.1296646 ,  28.57762622,  23.86150889]),
        'importances_std': array([ 14.93272773,   1.77354259,   0.90637781]),
        'importances': array([[150.26937093, 120.24946953, 126.79102571, 137.32546909, 161.01298776],
                              [ 28.03071877,  31.99251603,  28.24409869,  27.78141644,  26.83938118],
                              [ 23.22932701,  25.01193447,  23.57936469,  22.6764554 ,  24.81046286]])
    }
}
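
For reference, a sketch of the kind of call that yields a multi-metric result like the one above; the estimator and data names are placeholders:

from sklearn.inspection import permutation_importance

result = permutation_importance(
    estimator,          # any fitted estimator
    X_test, y_test,     # held-out data
    scoring=["r2", "neg_root_mean_squared_error"],
    n_repeats=5,        # the default, shown for clarity
)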

Can you give an example of what you'd expect the dataframe to look like?

@MarieSacksick (Contributor, Author)
We can output something similar to what we have in the ComparisonReport or in the CrossValidationReport: several rows for several scores, with the features as columns. It's not very pretty; I would expect at most 5 scorings and at least 10 features, which would make it more logical to put the long list on the row index, but this way it stays consistent.

[image: example of the expected dataframe layout]

@auguste-probabl (Contributor)
Here is what I currently have:

Repeat                                   Repeat #0   Repeat #1   Repeat #2   Repeat #3   Repeat #4
Metric                      Feature
r2                          Feature #0    1.993054    1.276276    1.418911    1.664487    2.288232
                            Feature #1    0.069350    0.090339    0.070410    0.068122    0.063580
                            Feature #2    0.047627    0.055217    0.049073    0.045387    0.054331
neg_root_mean_squared_error Feature #0  150.269371  120.249470  126.791026  137.325469  161.012988
                            Feature #1   28.030719   31.992516   28.244099   27.781416   26.839381
                            Feature #2   23.229327   25.011934   23.579365   22.676455   24.810463
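
A sketch of how such a frame could be assembled from the multi-metric result dict shown earlier; the level and label names are illustrative:

import pandas as pd

frames = {
    metric: pd.DataFrame(
        bunch["importances"],
        index=[f"Feature #{i}" for i in range(bunch["importances"].shape[0])],
        columns=[f"Repeat #{i}" for i in range(bunch["importances"].shape[1])],
    )
    for metric, bunch in result.items()
}
# Concatenating a dict of frames yields the (Metric, Feature) MultiIndex.
df = pd.concat(frames, names=["Metric", "Feature"])
df.columns.name = "Repeat"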

@auguste-probabl (Contributor)
permutation_importance accepts a random_state parameter, which is None by default (so calling the function returns a different result every time).
Right now the plan is to cache calls, so this behaviour is inconvenient. Should we:

  • Impose a random_state?
  • Only cache if random_state is given?
  • Stop caching?

@MarieSacksick (Contributor, Author)
Good point!
I'd like to keep caching because it's a nice feature, particularly for artefacts that require a lot of computing time, and feature importance can be one of them. I don't like imposing a random state (I find it unfriendly), nor deciding on one ourselves if the user doesn't provide one (I find it surprising).
So I'd go for your second option!
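
A minimal sketch of that second option, assuming a dict-based cache on the report; the _cache attribute, key scheme, and helper name are hypothetical:

def feature_permutation(self, scoring=None, random_state=None):
    key = ("feature_permutation", str(scoring), random_state)
    if random_state is not None and key in self._cache:
        return self._cache[key]
    result = self._compute_feature_permutation(scoring, random_state)
    if random_state is not None:
        # Only cache deterministic results; with random_state=None every
        # call legitimately returns a fresh permutation.
        self._cache[key] = result
    return result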
