Improved sampling efficiency and conversion of coalitions to strings #426

Merged (57 commits) on Dec 19, 2024

@LHBO (Collaborator) commented Dec 13, 2024

In this PR, we improve the sampling efficiency and switch from storing the sampled coalitions as a list of integer vectors to a vector of strings, where each string is a coalition with its groups/features separated by a single space. For example, the coalition S = {1, 5, 10} is stored as "1 5 10".
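
A minimal sketch of the two conversions, assuming hypothetical helper names `coalition_to_string()` and `string_to_coalition()` (they are not functions from the package) and assuming the indices are sorted so that each coalition has exactly one string form:

```r
# Hypothetical helpers for illustration only; not the package's internal API.

# Collapse an integer vector into a space-separated string.
# Sorting is an assumption made here so that equal coalitions give equal strings.
coalition_to_string <- function(coalition) {
  paste(sort(coalition), collapse = " ")
}

# Split a space-separated string back into an integer vector.
string_to_coalition <- function(coalition_string) {
  as.integer(strsplit(coalition_string, " ", fixed = TRUE)[[1]])
}

coalitions <- list(c(1L, 5L, 10L), c(2L, 3L), 7L)

# List of integer vectors -> character vector of coalition strings.
coalition_strings <- vapply(coalitions, coalition_to_string, character(1))

# Character vector of coalition strings -> list of integer vectors.
coalitions_back <- lapply(coalition_strings, string_to_coalition)
```

With this representation, duplicates can be detected with ordinary vectorised string functions such as `unique()` and `duplicated()` instead of comparing integer vectors element by element.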

There are two main reasons for these changes:

  1. The previous sampling strategy was inefficient, especially as the number of sampled coalitions increased and the number of unique coalitions approached 2^M, where M is the number of features. Previously, in each iteration of a while loop, we sampled as many coalitions as were still missing to reach the desired number of coalitions. We appended them to the list of sampled coalitions and counted the unique elements in this list by brute force, which was time-consuming.
    In the new version, we sample an excess number of coalitions, as this is an inexpensive step, then determine the unique coalitions and when each was first sampled, and remove the redundant coalitions sampled after the desired number of unique coalitions had been obtained.
  2. To speed up the sampling, we also do all the sampling and comparison on strings, as finding unique strings is faster than finding unique integer vectors. At the end, we convert the strings back to a list of integer vectors, which the approaches rely on.
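
The oversample-then-trim idea in point 1 can be sketched in plain R as follows. This is an illustration under stated assumptions, not the implementation in this PR: `sample_unique_coalitions()` and `oversample_factor` are hypothetical names, and drawing a uniform coalition size followed by a uniform subset is only a placeholder for the actual sampling distribution.

```r
# Hypothetical sketch; names and the sampling distribution are assumptions.
sample_unique_coalitions <- function(n_features, n_unique, oversample_factor = 2) {
  # Only non-empty, proper subsets are sampled here, so at most 2^M - 2 exist.
  stopifnot(
    n_features >= 2,
    n_unique >= 1,
    n_unique <= 2^n_features - 2,
    oversample_factor >= 1
  )
  n_draw <- ceiling(oversample_factor * n_unique)
  sampled <- character(0)

  repeat {
    # Draw an excess batch of coalitions and store each one as a string.
    sizes <- sample.int(n_features - 1, n_draw, replace = TRUE)
    new_strings <- vapply(
      sizes,
      function(k) paste(sort(sample.int(n_features, k)), collapse = " "),
      character(1)
    )
    sampled <- c(sampled, new_strings)

    # TRUE at the position where each distinct coalition first appears.
    first_seen <- !duplicated(sampled)
    if (sum(first_seen) >= n_unique) break
  }

  # Position at which the n_unique-th distinct coalition was first sampled;
  # every draw after that position is redundant and is dropped.
  cutoff <- which(first_seen)[n_unique]
  sampled <- sampled[seq_len(cutoff)]

  list(
    unique_coalitions = unique(sampled),  # in order of first appearance
    sampled_coalitions = sampled          # retained draws, repeats included
  )
}

set.seed(123)
res <- sample_unique_coalitions(n_features = 5, n_unique = 10)

# Convert back to a list of integer vectors only at the very end.
coalition_list <- lapply(strsplit(res$unique_coalitions, " ", fixed = TRUE), as.integer)
```

The loop usually runs once or a few times, because each pass draws more coalitions than are needed and the uniqueness check is a single vectorised call on a character vector.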

Also fixed a bug in forecast so that the coalitions sampled in the previous iteration are remembered in the current iteration. Previously, they were not, and the coalitions were resampled in every iteration.

LHBO and others added 30 commits December 3, 2024 17:12
…l hurdles that we did not think about.

Created new cpp function that returns the sampled coalitions as a character vector.

Need to check how / if it works with adaptive and asymmetric.
…same form as the symmetric Shapley values and refactored the code to only have a single while loop. Having an if-else inside it will not be expensive, as this while loop will often only do one/a few iterations.
…e when they are NA. Got some issues when I ran

explain(
  testing = TRUE,
  model = model_lm_numeric,
  x_explain = x_explain_numeric,
  x_train = x_train_numeric,
  approach = "gaussian",
  phi0 = p0,
  asymmetric = FALSE,
  causal_ordering = list(1, 2, 3),
  confounding = c(TRUE, TRUE, FALSE),
  group = list("A" = c("Solar.R"), B = c("Wind", "Temp"), C = c("Month", "Day")),
  n_MC_samples = 5, # Just for speed
  verbose = c("basic", "progress", "convergence", "vS_details", "shapley"),
  iterative = TRUE,
  iterative_args = list(initial_n_coalitions = 4, convergence_tol = NULL, fixed_n_coalitions_per_iter = 1)
)

explain(
  model = model,
  x_train = x_train,
  x_explain = x_explain,
  approach = "gaussian",
  phi0 = phi0,
  asymmetric = TRUE,
  causal_ordering = causal_ordering_group,
  confounding = confounding,
  paired_shap_sampling = FALSE,
  n_MC_samples = 1000,
  group = group_list,
  iterative = TRUE
)

where initial_n_coalitions was larger than max_n_coalitions.
@LHBO LHBO marked this pull request as ready for review December 19, 2024 11:46
@martinju martinju self-requested a review December 19, 2024 12:02
@martinju martinju merged commit f89ead4 into master Dec 19, 2024
7 checks passed
@martinju martinju deleted the Lars/String_coalitions branch December 19, 2024 12:05