Improved sampling efficiency and conversion of coalitions to strings #426

LHBO · 2024-12-13T12:02:36Z

In this PR, we improve the sampling efficiency and change the representation of the sampled coalitions from a list of integer vectors to a vector of strings, where each string is a coalition with the groups/features separated by a whitespace. For example, the coalition S = {1, 5, 10} is stored as "1 5 10".
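The string representation described above can be sketched as a round trip in R. This is a minimal illustration only: the helper names `coalition_to_string()` and `string_to_coalition()` are hypothetical and are not functions from shapr.

```r
# Hypothetical helpers for illustration; these names are NOT part of shapr.

# Collapse an integer vector into a whitespace-separated string.
coalition_to_string <- function(coalition) {
  paste(sort(coalition), collapse = " ")
}

# Split a whitespace-separated string back into an integer vector.
string_to_coalition <- function(coalition_str) {
  as.integer(strsplit(coalition_str, " ", fixed = TRUE)[[1]])
}

S <- c(1L, 5L, 10L)
S_str <- coalition_to_string(S)        # string form of the coalition
S_back <- string_to_coalition(S_str)   # integer form, as the approaches expect
```

Sorting before collapsing means that the same coalition always maps to the same string, so string equality can stand in for set equality.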

There are two main reasons for these changes:

The previous sampling strategy was inefficient, especially when the number of sampled coalitions increased and the number of unique coalitions approached 2^M, where M is the number of features. Previously, in each iteration of a while loop, we sampled the remaining number of coalitions, i.e., the difference between the desired and the current number of coalitions. We added them to the list of sampled coalitions and counted the unique elements in this list in a naive, brute-force way, which was time-consuming.
In the new version, we sample an excess number of coalitions, as this is an inexpensive step, then determine the unique coalitions and when each was sampled for the first time, and remove the redundant coalitions sampled after the desired number of unique coalitions was reached.
To speed up the sampling, we also do all the sampling and comparing on a string basis, as comparing and finding unique strings is faster than for integer vectors. At the end, we convert the strings to a list of integer vectors, which the approaches rely on.
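The oversample-then-truncate idea above can be sketched in plain R as follows. This is not the code from the PR: the function names, the `oversample` argument, and the uniform draw of coalition sizes are assumptions made for illustration (shapr weights coalition sizes differently and does part of this work in Rcpp).

```r
# Illustrative sketch only: names are hypothetical, and coalition sizes are
# drawn uniformly here, which is a simplification.
draw_coalition_string <- function(M) {
  size <- sample.int(M - 1L, 1L)                     # size in 1, ..., M - 1
  paste(sort(sample.int(M, size)), collapse = " ")   # whitespace-separated
}

sample_unique_coalitions <- function(M, n_target, oversample = 10L) {
  stopifnot(M >= 2, n_target >= 1, n_target <= 2^M - 2)
  draws <- character(0)
  # One pass is usually enough; the loop only repeats if the excess draws
  # contained too few unique coalitions.
  while (length(unique(draws)) < n_target) {
    new_draws <- vapply(
      seq_len(oversample * n_target),
      function(i) draw_coalition_string(M),
      character(1)
    )
    draws <- c(draws, new_draws)
  }
  # Draw index at which each unique coalition appeared for the first time.
  first_seen <- which(!duplicated(draws))
  # Discard everything sampled after the n_target-th unique coalition.
  draws <- draws[seq_len(first_seen[n_target])]
  coalitions_str <- unique(draws)
  list(
    coalitions_str = coalitions_str,
    # How often each kept coalition was sampled.
    n_sampled = tabulate(match(draws, coalitions_str), nbins = n_target),
    # Convert back to integer vectors only at the very end.
    coalitions = lapply(strsplit(coalitions_str, " ", fixed = TRUE), as.integer)
  )
}

set.seed(123)
res <- sample_unique_coalitions(M = 5, n_target = 10)
```

All duplicate detection happens on the character vector via `duplicated()`, `unique()`, and `match()`; the integer vectors are only built once, after the final set of coalitions is known.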

Also fixed a bug in forecast so that the coalitions sampled in the previous iteration are remembered in the current iteration. Previously, this was not done, and we resampled the coalitions in each iteration.


…e when they are NA. Got some issues when I ran

explain(
  testing = TRUE,
  model = model_lm_numeric,
  x_explain = x_explain_numeric,
  x_train = x_train_numeric,
  approach = "gaussian",
  phi0 = p0,
  asymmetric = FALSE,
  causal_ordering = list(1, 2, 3),
  confounding = c(TRUE, TRUE, FALSE),
  group = list("A" = c("Solar.R"), B = c("Wind", "Temp"), C = c("Month", "Day")),
  n_MC_samples = 5, # Just for speed
  verbose = c("basic", "progress", "convergence", "vS_details", "shapley"),
  iterative = TRUE,
  iterative_args = list(initial_n_coalitions = 4, convergence_tol = NULL, fixed_n_coalitions_per_iter = 1)
)

explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  asymmetric = TRUE,
  causal_ordering = causal_ordering_group,
  confounding = confounding,
  paired_shap_sampling = FALSE,
  n_MC_samples = 1000,
  group = group_list,
  iterative = TRUE
)

where the initial_n_coalitions was larger than the max_n_coalitions.


LHBO and others added 30 commits December 3, 2024 17:12

Started working on the change from integers to string. A lot of small hurdles that we did not think about. Created new cpp function that returns the sampled coalitions as a character vector. Need to check how / if it works with adaptive and asymmetric.

f94a8ee

Change to whitespace separated strings in Rcpp

c17ff3c

Ensure even number of coalitions when paired_shap_sampling is TRUE

94d5f60

Update next iteration parameters

f5ea67c

Rcpp

5d74a3a

Updates to main shapley setup which now introduces strings as the default version for sampling coalitions

3491954

Started to simplify code

168aee4

Removed coalitions_tmp as it is no longer needed as `coalitions_str` is always a part of X.

a13fb96

Wrong name in dt_valid_causal_coalitions.

8f76e8a

Change from comma separated to whitespace separated coalitions in the exact data table.

7dde5a4

Rewritten the sampling procedure for asymmetric Shapley value to the same form as the symmetric Shapley values and refactored the code to only have a single while loop. Having an if-else inside it will not be expensive, as this while loop will often only do one/a few iterations.

19a7cca

Add reference to data.table

fc1d705

Remove causal ordering stuff from forecast.

2dbbd48

Removed old code

c69f43a

lintr + styler

38e9b21

Change max_n_coalitions due to model rank deficiency.

884b258

Bug in explain_forecast() (#425)

db81ed7

Started working on the change from integers to string. A lot of small hurdles that we did not think about. Created new cpp function that returns the sampled coalitions as a character vector. Need to check how / if it works with adaptive and asymmetric.

f97058f

Change to whitespace separated strings in Rcpp

62fc8f0

Ensure even number of coalitions when paired_shap_sampling is TRUE

db6462e

Update next iteration parameters

d5bc8ed

Rcpp

ed8b1e3

Updates to main shapley setup which now introduces strings as the default version for sampling coalitions

8ea705c

Started to simplify code

6f62690

Removed coalitions_tmp as it is no longer needed as `coalitions_str` is always a part of X.

16d76ed

Wrong name in dt_valid_causal_coalitions.

4f97041

Change from comma separated to whitespace separated coalitions in the exact data table.

c97ba7b

Rewritten the sampling procedure for asymmetric Shapley value to the same form as the symmetric Shapley values and refactored the code to only have a single while loop. Having an if-else inside it will not be expensive, as this while loop will often only do one/a few iterations.

3b96f3e

LHBO added 25 commits December 17, 2024 13:30

Add reference to data.table

3feb670

Remove causal ordering stuff from forecast.

d063b26

Removed old code

89a35fa

lintr + styler

6766c6e

Change max_n_coalitions due to model rank deficiency.

78c48cd

Merge branch 'Lars/String_coalitions' of github.com:NorskRegnesentral/shapr into Lars/String_coalitions

42f4e74

Merge branch 'Lars/String_coalitions' of github.com:NorskRegnesentral/shapr into Lars/String_coalitions

a334a4f

Merge branch 'Lars/String_coalitions' of github.com:NorskRegnesentral/shapr into Lars/String_coalitions

b2cb3be

Delete commented code

6af3c1e

Updated manuals with the new parameters and manuals for new functions

e0ec963

Add documentation to sample_coalition_table

51aff62

Moved documentation

ae6be2c

Updated regular-setup test files

472e815

regular-output

333d113

Forecast setup

19724a8

Asym-caus-output

8be683c

forecast output

bb7dc17

iterative output

90f659e

Add comment

7f22f6c

regression

dbc6bee

Added bugfix to forecast such that we remember the sampled coalitions from the previous iteration in the current iteration. Previously, this was not done and we resampled coalitions for each iter.

2a487a9

Added global variables to zzz

d1172f7

update forecast test rds files

0a05c4d

Updated documentation

aa73f75

Fix warning with argument

098c7de

LHBO marked this pull request as ready for review December 19, 2024 11:46

martinju self-requested a review December 19, 2024 12:02

martinju approved these changes Dec 19, 2024


martinju merged commit f89ead4 into master Dec 19, 2024
7 checks passed

martinju deleted the Lars/String_coalitions branch December 19, 2024 12:05


LHBO commented Dec 13, 2024 • edited by martinju
